SWEA-Dataveg: A vegetation database for sub-Saharan Africa

SWEA-Dataveg is a vegetation-plot database collecting observations mainly in sub-Saharan Africa but also open to the rest of the African continent. To date, this database contains more than 5,500 plot observations provided by 47 sources (projects, monographs, and articles). While the database is stored in PostgreSQL (including the PostGIS extension), the R-package “vegtable” implements a suitable exchange format. In this article we assess the current content of SWEA-Dataveg and introduce its history and future as a repository of data for syntaxonomic assessments and macroecological research.


Introduction
In sub-Saharan Africa as elsewhere, documenting and classifying vegetation has become an urgent task to enable the proper assessment of endangered ecosystems (Jansen et al. 2016). With an increasing number of research projects dealing with vegetation ecology in the region, there is a vast amount of information of high scientific value that could be made accessible to the wider research community. At the same time, knowledge accumulated in past research programs can also provide the basis for constructive research into vegetation history, biogeography and conservation. Database structures such as vegetation-plot databases may serve as important repositories for data curation and ensure research repeatability and meta-analysis in the context of macroecological and biogeographical studies (Dengler et al. 2011; Bruelheide et al. 2019).
The database SWEA-Dataveg (Alvarez et al. 2012b) was initiated as a repository for ongoing projects in East Africa, specifically for the SWEA project (Agricultural use and vulnerability of small wetlands in East Africa). At its genesis the database was focusing on the collection of data from wetland ecosystems in Kenya and Tanzania (see Alvarez et al. 2012a). Through follow-up projects and collaboration activities with the ETH-Zürich (Switzerland) and the East African Herbarium (Kenya), the database was expanded to all vegetation formations and included data from additional African countries.
This report briefly displays the current status of the vegetation-plot database SWEA-Dataveg (GIVD AF-00-006) and its applications in the research of vegetation ecology and biogeography in sub-Saharan Africa.
Copyright Miguel Alvarez et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

LONG DATABASE REPORT ECOINFORMATICS
International Association for Vegetation Science (IAVS)

History
The idea of establishing a vegetation-plot database started during a visit to the 8th Meeting on Vegetation Databases, held at the University of Greifswald, Germany, in 2009. The project was officially launched in 2010 and the first report was published in 2012 with a small collection of 206 plots originally stored in a Microsoft-Access database (Alvarez et al. 2012b). Since then, this database has been affiliated with research activities at the University of Bonn, Germany, in collaboration with diverse academic and research institutions in Eastern Africa.
In 2015, and in the context of a collaborative activity between the SWEA-Project and the ETH-Zürich, Switzerland, SWEA-Dataveg migrated to the software Turboveg (Hennekens and Schaminée 2001) and the first trials for data exchange and processing using R-images and R-scripts were carried out. At that time, export of Turboveg to R was completed using the package "vegdata" (Jansen and Dengler 2010).
After the first releases of the packages "taxlist" and "vegtable" on CRAN in 2017 (see Alvarez and Luebert 2018), the database migrated again, this time to PostgreSQL including the PostGIS extension for handling the location of plots in a Geographical Information System (GIS). During this development, SWEA-Dataveg became larger and more complex, and partially interlinked with the database "sudamerica" (formerly CL-Dataveg, GIVD SA-CL-001; Alvarez et al. 2012c).
Scope: Relevés in small wetlands of Kenya and Tanzania collected during the sampling activities of the SWEA project, including semi-natural vegetation (unused or lightly used fields), fallows, grasslands and weed communities in crops. Currently the GlobE wetlands project is continuing data collection with a similar scope as SWEA. Additional information from other projects and published relevés from East Africa are considered. Though this database is currently not freely available, a release for free use after the end of the GlobE wetlands project is considered.
Abstract: SWEA (agricultural use and vulnerability of small wetlands in East Africa) is a multidisciplinary project whose task is to evaluate the effects of land use on the ecological and socio-economic functions of small wetlands in Kenya and Tanzania. To make the collected data available for further studies, we stored them in SWEA-Dataveg, a database implemented in Microsoft Access (mdb format). Because this project deals not only with vegetation science but also with geography, soil science, hydrology and socio-economics, the database also contains information related to these research fields. Additionally, some functional traits of the plant species occurring in the relevés are included in the species list. The sampling areas are concentrated in four localities, two of them in Kenya (Karatina and Rumuruti) and two in Tanzania (Malinda and Lukozi). Within the project, the vegetation ecology group deals with the classification of the vegetation according to species composition, the correlation of plant communities with environmental factors and land uses, and the survey of potential indicator species for the loss of wetland resilience. Once storage is complete, we are considering an adaptation of SWEA-Dataveg to the Turboveg format as well as its extension to further projects (e.g. SWEA phase II) and relevés collected from publications.
The current version of SWEA-Dataveg is stored in a PostgreSQL database, including the PostGIS extension for geo-referenced information. Plot observations are organized in a table called "header" and linked to several tables analogous to the popup tables of Turboveg (Hennekens and Schaminée 2001). A taxonomic list is also integrated into this database, following the structure used by the R-package "taxlist" (Alvarez and Luebert 2018). Data export is preferably written in SQL and the result assigned to a "vegtable" object in R (see https://github.com/kamapu/vegtable). Further processing and assessment can be done either in R or by exporting to any spreadsheet application. Additionally, export to the software Juice (Tichý 2002) is carried out by a function called "write_juice()".
All plots included in the database are geo-referenced. A logical variable called "validation_coordinates" indicates whether these coordinates were provided by the authors as coordinate values or in a detailed map ("true"), or whether they were inferred from the description of the locality ("false"). Observations have been undertaken in 12 countries with 2,804 plots (51%) sampled in Kenya, 986 (18%) in the Democratic Republic of the Congo, 467 (8%) in Ethiopia, and 425 (8%) in Tanzania. The rest of the plots were collected in Uganda, Togo, Rwanda, South Africa, Burundi, Congo-Brazzaville, Benin, and Zambia (see Figure 1).
SWEA-Dataveg attempts to collect as much of the information originally published with plot observations as possible. Besides information on plot size, recording dates and locations (coordinates and descriptions of localities), additional data on slope inclination, exposition, elevation, total vegetation cover, soil physical and chemical properties and remarks, if provided by the sources, are digitized and stored. Of all observations, 79% are stored with a sampling date, 64% with coordinates, 58% with information on plot size, and 21% with information on soil physical or chemical properties (Figure 2). Furthermore, the original page and table numbers, as well as the assignment to a specific plant community, are also documented.
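The relation between the "header" table and its linked lookup tables can be illustrated with a minimal sketch. It uses an in-memory SQLite database as a stand-in for PostgreSQL/PostGIS. Only the table name "header" and the variable "validation_coordinates" come from the description above; the table "country_codes", all other column names, and all rows are assumptions made for illustration and do not reproduce the actual schema or content of SWEA-Dataveg.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL/PostGIS; all rows are invented.
con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical lookup table, analogous to a Turboveg popup table.
    CREATE TABLE country_codes (
        country_code TEXT PRIMARY KEY,
        country_name TEXT NOT NULL
    );
    -- "header" and "validation_coordinates" are named in the text;
    -- the remaining columns are assumptions.
    CREATE TABLE header (
        releve_id INTEGER PRIMARY KEY,
        country_code TEXT REFERENCES country_codes (country_code),
        plot_size REAL,                  -- NULL if not reported by the source
        validation_coordinates INTEGER   -- 1 = given by authors, 0 = inferred
    );
    INSERT INTO country_codes VALUES ('KE', 'Kenya'), ('TZ', 'Tanzania');
    INSERT INTO header VALUES
        (1, 'KE', 25.0, 1),
        (2, 'KE', NULL, 0),
        (3, 'KE', 100.0, 1),
        (4, 'TZ', 25.0, 0);
""")

# Count plots per country and how many carry author-provided coordinates.
per_country = con.execute("""
    SELECT c.country_name,
           COUNT(*) AS plots,
           SUM(h.validation_coordinates) AS validated
    FROM header AS h
    JOIN country_codes AS c USING (country_code)
    GROUP BY c.country_name
    ORDER BY plots DESC
""").fetchall()
con.close()

for name, plots, validated in per_country:
    print(f"{name}: {plots} plots, {validated} with author-provided coordinates")
```

A query of this kind would be one way to derive per-country summaries such as those reported above; the export to a "vegtable" object in R is not reproduced here.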
The associated taxonomic list is supported by five sources referred to as taxon views (see Alvarez and Luebert 2018). This module contains information on taxonomic ranks, parent-child relationships (e.g. indication of the parent genus for a species) and taxon attributes (e.g. life forms, chorology and functional traits). The latter information is usually collected from secondary references, including on-line databases, and complements specific project objectives.
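The idea of ranks and parent-child relationships can be sketched with a small, self-contained example. The class "Taxon" and the function "lineage" are hypothetical names introduced here for illustration; they do not reproduce the object structure of the R-package "taxlist", and taxon views and taxon attributes are left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Taxon:
    name: str
    rank: str
    parent: Optional[str] = None  # name of the parent taxon, None at the top


# A three-level example: species -> genus -> family.
taxa: Dict[str, Taxon] = {
    t.name: t
    for t in [
        Taxon("Fabaceae", "family"),
        Taxon("Prosopis", "genus", parent="Fabaceae"),
        Taxon("Prosopis juliflora", "species", parent="Prosopis"),
    ]
}


def lineage(name: str, taxa: Dict[str, Taxon]) -> List[Tuple[str, str]]:
    """Walk the parent links from a taxon up to the highest stored rank."""
    chain = []
    current: Optional[str] = name
    while current is not None:
        taxon = taxa[current]
        chain.append((taxon.rank, taxon.name))
        current = taxon.parent
    return chain


print(lineage("Prosopis juliflora", taxa))
```

Storing only the direct parent of each taxon keeps the list compact, while higher ranks (e.g. the family of a species) can still be recovered by following the links.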

Additional features
All data sources are backed by a private soft copy of the relevant publication to enable cross-validation of the fidelity of the data stored in the database. Digitization procedures strive to reproduce the data as published in the original source.
Projects attempting to derive critical assessments of classifications in the context of the Braun-Blanquet approach (e.g. Alvarez 2017) are also catered for with a collection of syntaxonomic nomenclatures and Cocktail algorithms stored as "expert systems" (see Landucci et al. 2015).
Beyond these features, the development of the R-packages "taxlist" and "vegtable" (Alvarez and Luebert 2018) is strongly dependent on the assessment of data contained in SWEA-Dataveg, and both packages are used as the main mechanisms for data sharing and publication. The implementation of R-scripts in the assessment of data ensures the repeatability of statistics, while the current efforts to integrate R Markdown in some functions make it possible to produce automatic updates of summaries such as lists of data per syntaxon and publication or checklists of plant species.
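A scripted summary of this kind can be sketched as follows. The example is written in Python rather than R and does not use "vegtable" or R Markdown. The records, source keys and community names are invented placeholders, and the function "summarise" is a hypothetical helper.

```python
from collections import Counter

# Invented records: (plot id, data source key, assigned plant community).
plots = [
    (1, "source_a", "Community X"),
    (2, "source_a", "Community X"),
    (3, "source_a", "Community Y"),
    (4, "source_b", "Community Y"),
    (5, "source_b", None),  # no community assigned in the source
]


def summarise(plots):
    """Count plots per (source, community) and return sorted report lines."""
    counts = Counter(
        (source, community or "unassigned") for _, source, community in plots
    )
    return [
        f"{source} | {community} | {n}"
        for (source, community), n in sorted(counts.items())
    ]


report = summarise(plots)
print("\n".join(report))
```

Because the report is generated from the data by a script, re-running it after each database update yields a current summary without manual editing.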
At present, data is accessible only after special agreements with the custodian. While data stored from ongoing projects are highly restricted at least during the lifespan of the respective projects, we expect to be able to make data freely available from already published works. The preferred format for exchange is an R-Image including a vegtable object (Alvarez and Luebert 2018). Further alternative formats are Juice tables, SQL dump files for freeware relational database systems (e.g. PostgreSQL, MySQL, LibreOffice Base), and spreadsheets in xlsx, odt and csv formats. In all of these cases, the content of the requested files requires correspondence with the custodian.

Resulting publications
From its origins, SWEA-Dataveg focused on a preliminary classification of wetland vegetation in East Africa (Alvarez et al. 2012a). This work was followed by a classification of aquatic and semi-aquatic vegetation using observations collected in 2012 in Kenya and Tanzania and addressing the Braun-Blanquet approach (Alvarez 2017).
For the Kenyan subset, a model describing plant biodiversity and spatial conservation prioritization was developed, including a pool of bioclimatic, macroecological and economic factors as explanatory variables (Scherer et al. 2017a, b). This work inferred the locations in the country that are most suitable for the expansion of protected areas in order to meet national targets for biodiversity conservation and estimated the funding required to achieve this.
SWEA-Dataveg also supported the design of ecological assessment and monitoring methods, such as an adaptation of the WET-Health approach by Beuel et al. (2016), and the use of physiognomic properties of the vegetation for the estimation of the biological integrity completed by Behn et al. (2018). In both works, the information included in the database was used for the calibration of regression models and the evaluation of outcomes.
Ongoing projects are dealing with distribution models of invasive species in Eastern Africa, in particular on Prosopis juliflora (Sw.) DC. (Alvarez et al. 2019) and Parthenium hysterophorus L. (no publications to date).
In addition to inclusion in the Global Index of Vegetation-Plot Databases (Dengler et al. 2011; Alvarez et al. 2012b), this database also contributed to the sPlot initiative (Bruelheide et al. 2019).

The way forward
The implementation of a multiple-taxon views approach, for instance considering discrepancies among different projects involved in the African Plant Database (https://www.ville-ge.ch/musinfo/bd/cjb/africa/recherche.php) and some regional floras (e.g. Flora of Tropical East Africa, Beentje et al. 1952-2012; Flora of Ethiopia and Eritrea, Hedberg et al. 1989-2009), will make this database more versatile. This will also allow it to expand areas of coverage and to integrate other databases under the same database model, such as the database "sudamerica" (Alvarez et al. 2012c).
We also seek to integrate an electronic document library, which is at present housed in a separate database formatted as a BibTeX file and linked to the respective data sources as well as to taxonomic and syntaxonomic authorities.