Plant communities, synusiae and the arithmetic of a sustainable classification
FORUM PAPER
International Association for Vegetation Science (IAVS)

We propose an equation to evaluate the efficiency of a classification as a function of the effort required and the population size of data collectors. The formula postulates a "classification efficiency coefficient", which relates not only to the complexity of the object to be classified, but also to data availability and representativeness. When applied to the classification of phytocoenoses, the equation suggests that a classification system based on vascular plants offers the best compromise between sampling effort, resolution power and data availability. We discuss the possibility of basing a vegetation classification on plot records of all macroscopic photoautotrophic organisms co-occurring in the vertical projection of a given ground area, as recently suggested by some authors. We argue that the inclusion of cryptogams in the description of phytocoenoses dominated by vascular plants should rely on a synusial approach, conceived as complementary to the traditional Braun-Blanquet approach. Syntaxonomic reference: Mucina et al. (2016).


Introduction
Classification is one of the most fundamental and characteristic activities of the human mind and underlies all forms of science (Crowson 1970). Two broad areas of the philosophy of science impinge upon classification. The first, ontology, is concerned with the recognition, conceptualization and formalization of the objects to be classified. The second, epistemology, is concerned with how we acquire knowledge and justify hypotheses about these things and their relationships (Wiley and Lieberman 2011).
The classification of biotic communities (or biocoenoses) is based on the observation that the distribution of living organisms in their environment is not entirely random. In most terrestrial ecosystems, vascular plants are the most visible and accessible part of biocoenoses, which include not only the primary producers (photoautotrophic organisms of any kind) but also consumers, detritivores, decomposers and symbiotic microbial communities; we have become more aware of the latter in recent times as a result of sequencing techniques (Vandenkoornhuyse et al. 2015; Shi et al. 2016).
In the case of vascular plants, seed dispersal may be somewhat random, but germination and seedling establishment are regulated by environmental constraints, and plants come to organize themselves into communities in which relationships of coexistence regulate the species distribution in space (patterns and frequency), in time (phenology and turnover), and in many other aspects of plant life (Braun-Blanquet 1964). Further relationships are established with the soil microbiota and the local fauna. The repeated co-occurrence of a similar set of plant species can build a regular distribution pattern, which enables a formal classification of vegetation (Whittaker 1978; De Cáceres et al. 2018). Assuming that non-vascular plants can be important structural elements of vegetation, Berg et al. (2020) suggest that a consistent vegetation classification system should be based as much as possible on plot records of all macroscopic photoautotrophic organisms co-occurring in the vertical projection of a given ground area.
In principle, this proposal rests on a reasonable assumption. However, sampling 'all-inclusive' phytocoenoses requires considerably more effort than recording vascular plants only, and, for the sake of classification, Berg et al. (2020) accept that non-vascular plants could be omitted in vegetation types in which they play a subordinate role (e.g. mesic grasslands, ruderal communities) without a significant impact on the classification results. In the end, Berg et al. (2020) recommend recording at least the terricolous cryptogam layers, particularly in plant communities in which cryptogams constitute a sizeable part of the local biodiversity or biomass.
The aim of this paper is to propose a mathematical formulation for classification efficiency and to discuss some practical and epistemological consequences when applying the recommendations of Berg et al. (2020).

The arithmetic of a sustainable classification
Before reasoning on the methodological consequences of recording the terricolous cryptogam layer when sampling vegetation plots, let us try to state the problem (i.e. vegetation classification) in strict arithmetical terms. Let us consider the formula:

$$ i = \frac{P(n,v)}{C(n,v)} \qquad (1) $$

in which $C(n,v)$ indicates the complexity of the object to be classified, and $P(n,v)$ indicates the fraction of that complexity detected through data sampling. In operational terms, $C(n,v)$ depends on two main quantities: (a) the total number of species $n$ occurring in a given area, corresponding to the local species pool targeted by the vegetation classification (always a subset of the local biota), and (b) the number of vegetation units $v$ that can be distinguished in a given area and in a given time interval. Note that, at this stage, $C(n,v)$ is only generically defined as a function of $n$ and $v$.
This generic definition of $C(n,v)$ is rather vague; however, if we assume that the vegetation $v$ of a given area consists of discrete units (community types) formed by different assemblages of the $n$ target species occurring in the same area (species pool), then we can associate with each species $S_k$ (with $k = 1, 2, \ldots, n$) a simplified version of the phi coefficient (Tichý and Chytrý 2006), $C_k$, defined as:

$$ C_k = 1 - \frac{v_k}{v} $$

in which $v_k$ is the number of vegetation units in which $S_k$ is present in a given area, and $v$ is the total number of vegetation units in the same area. Since $1 \le v_k \le v$, it follows that $0 \le C_k < 1$. In particular, $C_k = 0$ if $v_k = v$, i.e. if the species $S_k$ occurs in all the vegetation units; its contribution to differentiating the vegetation units is therefore null. The value $C_k = 1$ (which would require $v_k = 0$) is excluded because we assume that $S_k$ is part of the species pool of the given area and, as such, must be present in at least one of its vegetation units.
Given the definition of $C_k$, we can define the complexity coefficient $C(n,v)$ as the average of all $C_k$ values:

$$ C(n,v) = \frac{1}{n} \sum_{k=1}^{n} C_k $$

Its range is therefore also $0 \le C(n,v) < 1$. In particular, $C(n,v) = 0$ in the case of entirely homogeneous vegetation in the given area. The complexity coefficient $C(n,v)$ is thus conceptually similar to a measure of beta diversity.
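The two definitions of $C_k$ and $C(n,v)$ can be checked on a toy species pool. The following is a minimal Python sketch, assuming $C_k = 1 - v_k/v$ (a reading consistent with the stated bounds $0 \le C_k < 1$) and $C(n,v)$ as the plain mean of the $C_k$ values. The function names and the four-species, three-unit pool are hypothetical and serve only to show the arithmetic.

```python
from __future__ import annotations

from fractions import Fraction


def species_complexity(v_k: int, v: int) -> Fraction:
    """C_k = 1 - v_k / v for a species found in v_k of the v vegetation units."""
    if not 1 <= v_k <= v:
        # A species of the pool must occur in at least one, and at most all, units.
        raise ValueError("v_k must satisfy 1 <= v_k <= v")
    return 1 - Fraction(v_k, v)


def complexity(presence: dict[str, set[str]]) -> Fraction:
    """C(n, v): mean of C_k over the n species of the pool.

    `presence` maps each species to the set of vegetation units it occurs in.
    Assumption: every unit contains at least one pool species, so v is the
    number of distinct units found across all species.
    """
    units = set().union(*presence.values())
    v = len(units)
    n = len(presence)
    total = sum(
        (species_complexity(len(found), v) for found in presence.values()),
        Fraction(0),
    )
    return total / n


# Hypothetical pool: 4 species distributed over 3 vegetation units.
pool = {
    "species_A": {"unit_1", "unit_2", "unit_3"},  # in every unit: C_k = 0
    "species_B": {"unit_1"},                      # C_k = 2/3
    "species_C": {"unit_1", "unit_2"},            # C_k = 1/3
    "species_D": {"unit_3"},                      # C_k = 2/3
}
print(complexity(pool))  # 5/12
```

A pool in which every species occurs in a single shared unit returns 0, the homogeneous case described in the text. Exact fractions are used only to keep the arithmetic easy to follow.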
If we define vegetation as a sum of vegetation units or community types, ideally the number of plots should be large enough to record each vegetation unit at least once (purposive sampling design). $P(n,v)$ indicates the detected fraction of complexity, defined by the quantity of data available, i.e. the extent to which the sampled plots and the recorded species serve the classification. The function $P(n,v)$ represents the 'added value' of a given sampling effort (in other words, the 'added value' produced by the classification in question). Again, if we assume that the vegetation $v$ of a given area consists of discrete units (community types) formed by different assemblages of the $n$ species of the local species pool, we can write for $P(n,v)$ a heuristic expression containing: a) the ratio $n_{\mathrm{eff}}/n$, in which $n_{\mathrm{eff}}$ is the number of species recorded during the sampling effort; b) the ratio $v_{\mathrm{eff}}/v$, in which $v_{\mathrm{eff}}$ is the number of vegetation units identified.
We can impose the condition that, for $n_{\mathrm{eff}} = n$ and $v_{\mathrm{eff}} = v$, the ratio (1) is equal to 1. If so, we can write:

$$ P(n,v) = C(n,v) \left[ r_n \frac{n_{\mathrm{eff}}}{n} + r_v \frac{v_{\mathrm{eff}}}{v} \right] $$

in which $r_n$ and $r_v$ are weighting factors subject to the constraint $r_n + r_v = 1$. They can be used to weight differently the species and the vegetation units identified by the sampling effort. If the vegetation scientists involved are highly skilled in identifying any species of the species pool on which they intend to base their classification, one can simply set $r_n = r_v = 1/2$. Should it be decided to base the vegetation classification on all macroscopic photoautotrophic organisms co-occurring in the vertical projection of a given ground area, the condition $r_n = r_v$ will occur very rarely. Through the number of species $n_{\mathrm{eff}}$ and the number of vegetation units $v_{\mathrm{eff}}$ identified, $P(n,v)$ therefore depends on the number of plots of a given size sampled in a given time interval in the given area.
The coefficient $i$ measures the impact (effectiveness) of a classification effort. It is directly proportional to $P(n,v)$ and inversely proportional to $C(n,v)$. In practice, it indicates whether the classification in question 'works' (given its aims and protocols) at the price of a greater or lesser sampling effort. More precisely, it is a coefficient of effectiveness of the plots sampled in a given area. In summary, $i$ can be defined as the 'classification efficiency coefficient'.
In particular, $i = 1$ when $P(n,v) = C(n,v)$, which happens if $n_{\mathrm{eff}} = n$ and $v_{\mathrm{eff}} = v$, that is, when the information on the species pool and the vegetation units of a given area obtained by sampling is complete. Conversely, $i$ tends to 0 for $n_{\mathrm{eff}} \ll n$ and $v_{\mathrm{eff}} \ll v$; in this case, sampling is essentially ineffective. It should be noted that $i$ is also taken to be 0 when $C(n,v) = 0$ (where the ratio $P(n,v)/C(n,v)$ is undefined), corresponding to the limit case in which the vegetation of the given area is entirely homogeneous and consists of a single vegetation unit.
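Under the additional assumption that $P(n,v)$ takes the weighted form $C(n,v)\,[r_n\,n_{\mathrm{eff}}/n + r_v\,v_{\mathrm{eff}}/v]$ with $r_n + r_v = 1$ (one expression that satisfies the stated condition $i = 1$ for complete sampling), the coefficient $i = P(n,v)/C(n,v)$ and its limit cases can be sketched as follows. The function names and input values are hypothetical; the homogeneous case $C(n,v) = 0$ returns 0, following the text.

```python
from fractions import Fraction


def detected_fraction(c: Fraction, n_eff: int, n: int, v_eff: int, v: int,
                      r_n: Fraction = Fraction(1, 2)) -> Fraction:
    """P(n, v) = C(n, v) * (r_n * n_eff/n + r_v * v_eff/v), with r_v = 1 - r_n."""
    if not (0 <= r_n <= 1 and 0 <= n_eff <= n and 0 <= v_eff <= v):
        raise ValueError("inputs outside their admissible ranges")
    r_v = 1 - r_n  # enforces the constraint r_n + r_v = 1
    return c * (r_n * Fraction(n_eff, n) + r_v * Fraction(v_eff, v))


def efficiency(p: Fraction, c: Fraction) -> Fraction:
    """Classification efficiency coefficient i = P(n, v) / C(n, v)."""
    if c == 0:
        # Homogeneous vegetation: nothing to differentiate, i is taken as 0.
        return Fraction(0)
    return p / c


c = Fraction(5, 12)  # hypothetical complexity of the target area

# Complete sampling: all 4 species and all 3 units detected, so i = 1.
print(efficiency(detected_fraction(c, 4, 4, 3, 3), c))  # 1

# Partial sampling with equal weights: 3 of 4 species, 2 of 3 units.
print(efficiency(detected_fraction(c, 3, 4, 2, 3), c))  # 17/24
```

Passing a different `r_n` weights species detection and unit detection unequally, which the text suggests may be needed when all macroscopic photoautotrophs are recorded.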
The 'classification efficiency coefficient' is strongly influenced by the 'cost' of each single plot, given that the effort needed to identify the species recorded during a survey can vary and is not necessarily homogeneous with respect to the general purposes of a given classification approach.
Just as any classification effort, materialiter acceptus (i.e. taken in its material sense), can be associated with a certain level of efficiency in identifying the descriptors of the object to be classified, every single vegetation plot can be associated with a cost, corresponding to a fraction of the utility produced by the classification as a whole (precisely, the fraction that manages to classify that plot). Additionally, we can write that:

$$ P(n,v) \propto F \cdot r $$

in which $F$ is the population of vegetation scientists and $r$ is the average number of plot records produced per capita, so that:

$$ i \propto \frac{F \cdot r}{C(n,v)} $$

Therefore, if we disregard the theoretical possibilities offered by machine-based approaches, such as remote sensing, spectral fingerprinting, bulk collection by robots and subsequent metabarcoding, the efficiency (and sustainability) of a vegetation classification is inversely proportional to the complexity of the classification target and directly proportional to the size of the population of vegetation scientists multiplied by the average number of plot records produced per capita.
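Because its proportionality constant is unknown, the relation $i \propto F \cdot r / C(n,v)$ is best used to compare scenarios rather than to compute absolute values. The sketch below is a hypothetical illustration that assumes the same constant applies to both scenarios; it compares a vascular-plant-only protocol with a more demanding all-inclusive one. All numbers are invented for the example and are not survey data.

```python
from fractions import Fraction


def relative_efficiency(f_a: int, r_a: Fraction, c_a: Fraction,
                        f_b: int, r_b: Fraction, c_b: Fraction) -> Fraction:
    """Ratio i_B / i_A under i proportional to F * r / C, with a shared constant."""
    i_a = f_a * r_a / c_a  # unnormalised efficiency of scenario A
    i_b = f_b * r_b / c_b  # unnormalised efficiency of scenario B
    return i_b / i_a


# Hypothetical scenario A: 100 collectors, 50 plots each (vascular plants only).
# Hypothetical scenario B: 20 collectors skilled in cryptogams, 30 plots each.
# The same complexity C is assumed for both, to isolate the effect of F * r.
c = Fraction(5, 12)
print(relative_efficiency(100, Fraction(50), c, 20, Fraction(30), c))  # 3/25
```

With these invented figures, scenario B reaches 12% of the efficiency of scenario A; a larger complexity for B would lower the ratio further, since $i$ is inversely proportional to $C(n,v)$.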
The above equations are valid from the global to the local scale, the only limitation being the availability of (skilled) vegetation scientists and of species identification tools for the target territory. Both aspects are, of course, particularly important given the well-known, enormous regional differences in data availability and resource expenditure.

The phytosociological classification approach
The phytosociological approach to vegetation classification is based on operational units which have a very practical goal, that is to give a reasonably precise name and conceptualization to plant communities which appear, to some extent, discrete to the eyes of phytosociologists (Dengler 2003; Biondi 2011; Pott 2011).
In principle, the traditional Braun-Blanquet system is based on all photoautotrophic taxa. However, different weights are attributed to the vegetation layers in the classification and, apart from a few exceptions, the bulk of data underlying the phytosociological system focuses on vascular plant species only. Between the two possible extremes, i.e. a taxon-free, physiognomic vegetation classification on the one hand and an all-inclusive vegetation classification (i.e. based on all photoautotrophic taxa) on the other, the current phytosociological classification system perhaps offers the best compromise between sampling effort, resolution power and data availability.
In the previous section, we introduced the coefficient i as a generic measure of the effectiveness of a classification effort. However, the variables considered do not fully capture the effectiveness and sustainability of a vegetation classification. There are other, somewhat 'finer' variables that cannot be treated with the same arithmetic simplicity. Nevertheless, it should be stressed that i depends not only on the complexity of the object to be classified, but also on data availability and representativeness. Indeed, the great attention currently paid by vegetation scientists to 'big data' (both in the current debate and in comprehensive synthesis studies) indicates that no proposal for new data acquisition methods can afford to ignore the 'big' part of these data, i.e. the large body of previously recorded data.
Vegetation scientists are relatively few, those dealing with phytosociological vegetation classification are even fewer, and many of the latter are familiar with vascular plants only. Not only is the number of phytosociologists progressively decreasing, but so is the time dedicated to field data collection (Chytrý et al. 2011). Thus, paradoxically, the current phase of comprehensive regional syntheses, semi-supervised validation of syntaxa and expert systems for vegetation classification is largely based on data collected in the last century by phytosociologists who followed the methods and rules of quite different schools (Chiarucci 2007; Guarino et al. 2018). These phytosociologists visually estimated not only plant cover but also plot area, and applied very different sampling criteria, not only with respect to cryptogams but also to vascular plant species. For example, it was common practice to ignore therophytes growing in the interstitial space of perennial grasses and forbs when sampling the Mediterranean perennial dry grasslands ascribed to the class Lygeo sparti-Stipetea tenacissimae (Rivas-Martínez 1978). Although this might be considered unacceptable by most Central European phytosociologists, it probably has not had a significant impact on the classification itself (Marcenò et al. 2019).
Given the variables involved in our arithmetical definition of a "sustainable" vegetation classification, we will now turn our attention to some practical and epistemological consequences of basing the phytosociological system "as much as possible" on holocoenoses (i.e. on plot records of all macroscopic photoautotrophic organisms co-occurring in the vertical projection of a given ground area).

Completeness versus sampling effort
If we accept the eminently practical purpose of the phytosociological vegetation classification, we should ask ourselves what advantages or disadvantages a more complete, but more time-demanding, sampling approach would have.
As we have seen, the classification itself should be evaluated on the basis of its efficiency (corresponding to what we defined as i), but also of the skills and size of the population of vegetation scientists who collect the data and produce the classification itself.
If, for the sake of completeness in data collection and classification, one wanted to extend the investigation to the whole autotrophic component of the local biota, the classification would be based on species that differ from each other biologically, physiologically, metabolically and in size. This raises many questions about the optimal sampling period, the extra time required for plot sampling, and the availability of data collectors skilled enough to record all macroscopic photoautotrophic organisms occurring in the plot.
The recently revised version of the International Code of Phytosociological Nomenclature (henceforth: ICPN; Theurillat et al. 2021) remains rather vague on how complete a floristic inventory should be to define a syntaxon. Indeed, there is no golden rule to assess whether the recorded components of a phytocoenosis are sufficient for a phytosociological classification. We believe that in all plant communities in which cryptogams constitute a major part of the biodiversity and/or biomass, their role has already been acknowledged by the traditional phytosociological classification. This is the case for Oxycocco-Sphagnetea, Montio-Cardaminetea, Loiseleurio-Vaccinietea, Adiantetea, Polypodietea and other vegetation classes. However, even in these cases, the added value of recording the complete cryptogam layer for assigning vegetation plots to higher-rank syntaxa remains largely undemonstrated. For example, do we really need to identify every single moss species to assign a plot from a dripping stone wall covered by Adiantum capillus-veneris to the class Adiantetea?
In any classification system, it is a clear advantage to maintain as much as possible the nomenclatural stability and the conceptual delimitation of the classified objects. Should the practice of recording all macroscopic photoautotrophic organisms in vegetation plots become a stringent rule of the phytosociological classification, there is a serious risk of rejecting many syntaxa as nomina dubia because "only vascular plants have been recorded, but also the species of the moss layer would be needed for proper classification" (Berg et al. 2020, Appendix S1). This could be the case for Festuco-Brometea, Elyno-Seslerietea, Sedo-Scleranthetea, Koelerio-Corynephoretea canescentis and other well-established syntaxa characterized by a rich cryptogam layer.
The decision of whether the species of the moss (or lichen) layer are needed for a "proper" classification is further complicated because some vegetation units change their "properties" depending on the substrate they are found on. For instance, let's consider the vegetation ascribed to the class Polypodietea. Patches of this bryo-pteridophytic vegetation can colonize a boulder in the forest understorey, the bark of ancient trees in the same forest, but also hundreds of square meters of vertical cliffs in fresh and shady gorges and even man-made stonewalls. Should it be considered a synusia (or merocoenosis) when in the forest and a holocoenosis when it occurs on vertical cliffs?

The synusial solution
Berg et al. (2020) state that recording epiphytes in forest relevés is "not needed for the majority of purposes, including classification". However, the example of Polypodietea is directly related to the level of organization targeted by the classification and thus to the perception of which 'plant communities' should appear discrete to the eyes of phytosociologists. In the synusial approach to vegetation classification (Barkman 1973; Gillet and Julve 2018), the concrete plant community is perceived and recorded at the 'organismic scale', i.e. the scale of the plant organisms, which depends on their size. Indeed, the organismic scale of epilithic, epixylic, saproxylic and even epigeic cryptogam synusiae differs fundamentally from that of vascular plants. Moreover, a phytocoenosis can be described as an assemblage of structural and functional components, based on vegetation layers, patch mosaics and phenological phases (Gillet et al. 1991).
Synusial phytosociologists argue that, from an ecological point of view, the sampling grain (observational scale) for floristic plot records should be logically related to the organismic scale, which is always the case for plant synusiae, while for phytocoenoses the choice of the plot size is usually based on the largest plants (e.g., trees in forests).
Breaking with current practice in both Braun-Blanquetian and synusial phytosociology, Berg et al. (2020) recommend that in a plot record of a phytocoenosis "all species should be considered that occur in the vertical projection above a certain ground area", which cannot be less than 1 m² because "any plant assemblage recorded on less than 1 m² would automatically be considered as a synusia". Using the same plot size for trees and epiphytic mosses in a forest will appear extremely challenging and unworkable to any bryosociologist. Sampling epiphytic vegetation is incompatible with the delimitation of a large ground area since species assemblages are different and fragmented among tree species, height above ground level, diameter and inclination of trunks and branches, etc. It requires working on small bark surfaces to record each synusia (Berg et al. 2016) and often fragmenting the relevé into several non-contiguous subplots (Gillet and Julve 2018).
Hoping to record all plant species, including cryptogams, in a relatively large area is, in most cases, a pious wish. As a matter of fact, a good knowledge of cryptogams is quite rare among vegetation scientists. As a result, only some well-known and easily identifiable species will be recorded. Berg et al. (2020) rightly argue that it is inconsistent to apply the same classification and nomenclature concept (syntaxon) to phytocoenoses and synusiae. They recommend including all vascular plants and macroscopic cryptogam species, organized in layers and strata, in the description and comparison of phytocoenoses for classification purposes (implicitly in the framework of the Braun-Blanquet approach), and developing another, independent classification system for synusiae. This is reminiscent of an old debate among phytosociologists, which divided "phytocoenologists" and "synusiologists" at the beginning of the last century. At the end of the 6th International Botanical Congress in Amsterdam in 1935 (reported in Cain 1936; Du Rietz 1936; Pavillard 1936; Lippmaa 1939), the leaders of the different schools in phytosociology agreed to propose a joint resolution stating that phytocoenoses must be classified as 'associations' (Braun-Blanquet's approach) and synusiae as 'unions' (Lippmaa's approach). More than 80 years later, it is interesting to see this old debate reappear and a similar solution proposed (Berg et al. 2020). In the meantime, however, phytosociologists of the Braun-Blanquet school have refined their concept of the syntaxon so that more and more quasi-synusial vegetation units have been described, as shown by the recent European synthesis (Mucina et al. 2016), including many syntaxa made up only of bryophytes or lichens. This is precisely the ideological drift criticized by Berg et al. (2020), who advocate a strict separation of phytocoenotic and synusial classifications, albeit both are based on species composition.
However, there are many examples of monosynusial phytocoenoses (or at least recorded as such), e.g. in mesic grasslands, which would imply classifying them in redundant syntaxa and 'merotaxa' (e.g., Cynosurion and Cynosurulion). This was one of the arguments put forward by the integrated synusial approach for a unified synusial conception of the syntaxon instead of two independent classification systems, while avoiding the inconsistency of mixing phytocoenotic and synusial syntaxa in the same system, which is a real problem from both a basic and applied point of view (Gillet and Julve 2018).

Epistemological considerations
The older view of science, dating back to Bacon, is based on the idea that external events or 'facts' can be observed in a neutral way and classified to build scientific theories by induction and deduction. This view was definitively superseded by Immanuel Kant and his successors. According to Karl Popper (1959), "The empirical basis of objective science has thus nothing 'absolute' about it. Science does not rest upon solid bedrock. The bold structure of its theories rises, as it were, above a swamp. It is like a building erected on piles. The piles are driven down from above into the swamp, but not down to any natural or 'given' base; and if we stop driving the piles deeper, it is not because we have reached firm ground. We simply stop when we are satisfied that the piles are firm enough to carry the structure, at least for the time being." (The Logic of Scientific Discovery, 5: 30).
Therefore, there cannot be an objective classification of reality; rather, each researcher interprets reality starting from ideas, categories and mental schemes that are tested and possibly corrected (Popper's famous "falsification") as errors are detected and new tools become available (Tüxen and Kawamura 1975). Collecting species occurrence data on macroscopic cryptogams could contribute, to some extent, useful information for the classification of vegetation dominated by vascular plants, but would probably not be enough to falsify the 'classical' phytosociological approach to vegetation classification, nor to justify the unconditional adoption of the concept of holocoenosis (sensu Berg et al. 2020). To use Popper's metaphor, a classification based on vascular plants may well be firm enough to carry the phytosociological structure, with no need to drive the piles deeper.
One could argue that Popper would have said that phytosociology has no piles at all, that it is not falsifiable and therefore not a good science. The methods of phytosociology are suited not to "Erklärung" (explanation) but only to "Verstehen" (understanding). This is because whatever classification one makes will never be a faithful representation of nature, only of the mind of the researcher, directed by the aims and goals of the classification. The paradigm of phytosociologists is that of the "Spurensucher" (i.e. trace-tracker); Ginzburg (1988) used the term "Indizienwissenschaft" (evidential science) for this kind of science.
In the second half of the 20th century, philosophical debate was occupied by the rehabilitation of Aristotle's "practical philosophy" (to which the "Indizienwissenschaften" belong), pursued above all by the German and Anglo-American schools of Hans-Georg Gadamer, Hannah Arendt and Bernard Williams. Aristotle was the first to outline, as the object of a specific form of knowledge, exactly that praxis with which classifications of any kind are primarily concerned. A fundamental criterion for marking the domain of praxis is that "the principle of actions is in the agent" (Metaphysics VI, section 1025b, transl. by Hugh Tredennick). Any classification belongs to the domain of practical knowledge and, unlike theoretical knowledge, it is not primarily meant to satisfy theoretical speculation (in the Aristotelian sense). Instead, it is used to satisfy eidetic and poietic needs. In other words, a classification should have a practical goal and must be useful to the "agents", i.e. those who use it. As pointed out by Berg et al. (2020), a phytosociological classification is used "for better communication, especially in applied fields like forestry, landscape planning, vegetation mapping, or nature conservation". We cannot expect the "agents" in all these applied fields to be familiar with the identification of non-vascular plants.

Conclusions
For the sole purpose of a taxon-based vegetation classification, the most important thing is to collect enough data on species co-occurrences. This means that collecting many species co-occurrences, even if not particularly complete or accurate, is more useful than collecting few extremely accurate and comprehensive ones.
Science will produce useful and essential knowledge only when it classifies objects and makes predictions based on statistically significant datasets, analysed according to adequate protocols. Field data collection should exercise the art of the feasible more than the art of the possible: the adoption of a sampling protocol aimed at recording all macroscopic photoautotrophic organisms co-occurring in the plot would require time and a whole series of tests to assess its pros and cons against a sampling approach that favours higher plot numbers over plot completeness.
However, recording co-occurrences of all macroscopic photoautotrophic organisms is not only a matter of effort but also of strategy and conventional rules: if enough projects followed the "comprehensive" sampling approach, sooner or later we would have enough data to better assess the added value of including cryptogams in the classification of phytocoenoses.
In the field, it is a good idea to make an effort to collect the best possible data, given time, logistical and resource constraints. The recording of non-vascular taxa can add important value to studies on the drivers of α-diversity, as well as to the species-area curves used to study fine-grain β-diversity (Löbel et al. 2016; Biurrun et al. 2021; Dembicz et al. 2021). However, it is unlikely that the recording of non-vascular taxa will significantly impact the general paradigms of the current phytosociological classification.

Author contributions
R.G. conceived the idea and led the writing of this paper, with substantial contributions from both other authors.