Forum Paper
Corresponding author: Jürgen Dengler (dr.juergen.dengler@gmail.com). Academic editor: John Hunter
© 2024 Jürgen Dengler.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Dengler J (2024) Determinants of citation impact. Vegetation Classification and Survey 5: 169-177. https://doi.org/10.3897/VCS.126956
This article aims to quantitatively assess how different formal aspects – beyond the relevance and quality of a study – influence how often a scientific paper is cited. As a case study, I retrieved all publications co-authored by myself from the Scopus database, of which 174 could be used for regression modelling. The citation impact was quantified as Field-Weighted Citation Impact (FWCI), which is the citation number normalised by year, subject area and article type. I examined 13 easily accessible numeric and binary predictor variables, including the Source Normalized Impact per Paper (SNIP), open access, special feature, number of authors, length of article and title, as well as formal aspects of the title. In the minimal adequate model, these formal aspects explained 50.2% of the variance in FWCI, with the SNIP alone explaining only 26.8%. Other strong positive predictors were title brevity, article length, special feature and the use of a colon in the title. By contrast, open access and the formulation of titles as factual statements did not have a significant effect. For authors who wish to make their articles more impactful, the main recommendations are to shorten the title and to refrain from formulating it as a factual statement, which only makes it longer.
Abbreviations: FWCI = Field-Weighted Citation Impact; JIF = Journal Impact Factor; OA = open access; SNIP = Source Normalized Impact per Paper; VCS = Vegetation Classification and Survey.
Keywords: article impact, article title, bibliometrics, citation rate, Field-Weighted Citation Impact (FWCI), normalised citation rate, open access, research assessment, Scopus database, special feature, vegetation ecology, Web of Science
Authors of scientific papers normally want to achieve impact with their publications, and likewise editors of scientific journals want the published articles to be as impactful as possible. Therefore, the big question is “what makes a paper successful?” Obviously, the scientific impact of a paper depends first of all on its content, such as the relevance of the topic, state-of-the-art analytical techniques and well-founded conclusions. Secondly, one would think that the writing style and the appeal of the figures play a role. Both are doubtlessly true; it is hard to give generic advice on the first point, while the second is nicely addressed in various textbooks on scientific writing (
However, there is a third group of factors that should not be underestimated. These are formal aspects, such as the choice of the journal and of the language, the style of the title and the length of the article. Authors and editors alike invest considerable effort here. Yet, there is a lack of empirical studies that test which measures might be effective and to what degree they contribute to the success of a paper. My long-standing impression as co-author and editor is that this field is dominated by either ignorance or strong beliefs, but hardly by empirical facts. To fill this gap, I conducted a quantitative study on how different “formal aspects” influence the citation impact of articles.
For this study I used all papers (co-)authored by me and available in the Scopus database (https://www.scopus.com) on 1 May 2024. This process allowed me to discuss individual papers without exposing other researchers in an undue manner. Moreover, using the papers of a single author reduces variation resulting from different skills of different authors and from different subject fields in which they work. Of course, the list of co-authors and thus their skills as well as the detailed subject fields still vary, but the latter appear to represent a typical set for vegetation ecologists who publish in the journals of the International Association for Vegetation Science (IAVS).
Data extraction yielded 189 entries, of which four were duplicates, six were from 2024 (i.e. with very limited chance to garner citations, and indeed four were without citations so far) and two were from pre-2003 resulting from unsystematic databasing at that time (one conference abstract, one book review). These 12 entries were excluded, leaving 177 observations to be used in the modelling (Suppl. material
I used the Field-Weighted Citation Impact (FWCI) as of 1 May 2024, provided by the Scopus database, as the measure of scientific impact (dependent variable). The FWCI normalises the citations a paper receives in the year of publication and the three following years against the average of all papers of the same year, subject area and article type (e.g. “Article”, “Review”). Thus, a FWCI of 1 means that an article was cited as often as the average of all articles in the group; a FWCI of 2 means that it received twice as many citations, etc. Unlike raw citation counts, which strongly depend on the time elapsed since publication, FWCI values are directly comparable between articles published in different years, between reviews and research articles, or between different disciplines. Another advantage of the FWCI is that Scopus also provides an analogous measure at the journal level, the Source Normalized Impact per Paper (SNIP), which is essentially the average of the FWCI values of all articles of a journal in the respective period. For one article from 2022 and two articles from 2023 that had a FWCI of 0, I instead inserted half of the minimum of all other FWCI values of the respective year (0.05 and 0.30, respectively) to allow modelling (see below). For readers who are more familiar with Journal Impact Factors (JIFs) from the Web of Science, I calculated the relationship between the two metrics for the year 2022 for those 46 journals that were also included in the Web of Science, using linear regression after log-transformation of both variables to meet the assumptions of linear models: log10(JIF.2022) = 0.42 + 1.32 log10(SNIP.2022). This means that a SNIP of 1 corresponds to a JIF of 2.6 and a SNIP of 2 to a JIF of 6.6.
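For illustration, this fitted relationship can be applied directly; the following is a minimal sketch in R (the function name is mine, not part of the study):

```r
# Approximate a journal's JIF from its SNIP using the fitted relationship
# log10(JIF.2022) = 0.42 + 1.32 * log10(SNIP.2022)
snip_to_jif <- function(snip) 10^(0.42 + 1.32 * log10(snip))

snip_to_jif(c(1, 2))  # c. 2.6 and 6.6, as stated above
```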
As predictor variables, I used formal and quantitative features of the journal, the article, its title and its authors for which there is a plausible relationship to citation impact and which could be derived from the data provided by Scopus or easily extracted from the PDFs (Table
All statistical modelling was done in R version 4.2.2 (
Variables used in the regression modelling of the 177 articles and some further citation metrics, their value distribution and their handling in the modelling.
| Variable | Mean | Min | Max | Modelling |
|---|---|---|---|---|
| Dependent variable | | | | |
| FWCI 2024.5 | 2.94 | 0.05 | 32.05 | log10 |
| Independent variables (numeric) | | | | |
| SNIP 2022 | 1.15 | 0.03 | 11.59 | log10 |
| Year | 2017 | 2003 | 2023 | |
| Pages | 16.50 | 1 | 262 | log10 |
| Authors | 25.72 | 1 | 601 | log10 |
| Title characters | 94.49 | 14 | 209 | excluded because of high correlation with Title words |
| Title words | 12.62 | 1 | 31 | |
| Independent variables (binary) | | | | |
| Book chapter | Yes = 3 | | | modelled separately |
| Open access | Yes = 101 | | | |
| Special feature | Yes = 63 | | | |
| English | Yes = 170 | | | |
| Title with statement* | Yes = 14 | | | |
| Title with word play** | Yes = 6 | | | |
| Title with “?” | Yes = 4 | | | |
| Title with “:” | Yes = 47 | | | |
| Title with dash | Yes = 27 | | | |
| Further citation metrics (not used in the modelling) | | | | |
| Citations | 49.02 | 0 | 1025 | |
| delta (FWCI vs. SNIP) | 1.68 | -7.89 | 30.66 | |
| log-ratio (FWCI vs. SNIP) | 0.18 | -1.26 | 1.46 | |
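The modelling workflow can be sketched roughly as follows; this is a minimal illustration in R, where the input file, the column names and the use of step() for backward elimination are my assumptions rather than the exact procedure of the study:

```r
# Sketch of the regression workflow, assuming one row per journal article and
# the transformations listed in Table 1 (column names are illustrative)
dat <- read.csv("articles.csv")  # hypothetical export of the Scopus-based dataset

global_model <- lm(
  log10(FWCI) ~ log10(SNIP) + I(Year - 2003) + log10(Pages) + log10(Authors) +
    Title_words + Open_access + Special_feature + English + Title_statement +
    Title_wordplay + Title_question + Title_colon + Title_dash,
  data = dat
)

# One possible way to reduce the global model to a minimal adequate model;
# the study's exact selection procedure may differ
minimal_model <- step(global_model, direction = "backward", trace = FALSE)
summary(minimal_model)
```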
Minimal adequate model to explain the log10-transformed Field-Weighted Citation Impact (FWCI). The estimates for the predictors in the multiple and simple linear regressions as well as the associated R2adj. values are given. n.s. = non-significant.
| Variable | Estimate (multiple regression) | t value | p-value | R2adj. | Estimate (simple regression) | R2adj. (simple regression) |
|---|---|---|---|---|---|---|
| (Intercept) | 0.329 | 2.009 | 0.046 | 0.502 | | |
| log10(SNIP 2022) | 0.780 | 7.001 | <0.001 | | 0.800 | 0.268 |
| Special feature | 0.158 | 2.444 | 0.016 | | n.s. | |
| Year - 2003 | -0.030 | -4.620 | <0.001 | | n.s. | |
| log10(Pages) | 0.324 | 2.819 | 0.005 | | 0.307 | 0.020 |
| log10(Authors) | 0.282 | 4.067 | <0.001 | | 0.416 | 0.172 |
| Title words | -0.038 | -5.692 | <0.001 | | -0.038 | 0.105 |
| Title with “:” | 0.169 | 2.566 | 0.011 | | n.s. | |
The log-transformed FWCI was significantly higher in book chapters than in journal articles (p = 0.017; R2adj. = 0.026). The estimate (0.748) suggests that on average my book chapters are cited 5.6 times more often than my journal articles. In the multiple regression for journal articles only, among the 13 predictor variables in the global model, seven remained as significant terms in the minimal adequate model (Table
The most influential variable (i.e. the one with the highest absolute t-value) in the multiple regression was the log-transformed SNIP. The estimate suggests that with each doubling of the SNIP, the FWCI increases on average by 43%. However, in a simple regression SNIP explained only 26.8% of the overall variance in FWCI. Conversely, the minimal adequate model leaving out SNIP explained 31.5% of the variance (not shown).
The number of title words had the second-strongest influence in the minimal adequate model. The estimate suggests that each additional word decreases the FWCI by 8.4%, while each word removed increases it by 9.1%. The log-transformed number of authors was also highly significant in the minimal adequate model and was the second-most influential variable among the simple regressions (17.2% explained variance in FWCI). According to the estimate in the minimal adequate model, each doubling of the number of authors would lead to a 13.9% higher FWCI. The year of publication had a highly significant negative effect on the FWCI in the multiple regression, with an estimated decrease of 6.7% per year, whereas in the simple regression it was not significant. The log-transformed number of pages was significant, with an estimated increase of the FWCI by 16.1% for each doubling of the page number. The presence of a colon (“:”) in the title had a significant positive effect on the FWCI (+48%), as did publication in a special feature/special collection (+44%).
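The percentages for the untransformed and binary predictors follow from back-transforming the log10-scale estimates in Table 2: a change of one unit multiplies the predicted FWCI by 10^estimate. A small R sketch (the function name is mine):

```r
# Percentage change in predicted FWCI per one-unit change of an untransformed
# (or 0/1) predictor, given its estimate on the log10(FWCI) scale
pct_effect <- function(estimate) (10^estimate - 1) * 100

pct_effect(-0.038)  # one additional title word: about -8.4%
pct_effect(-0.030)  # one additional publication year: about -6.7%
pct_effect(0.158)   # special feature: about +44%
pct_effect(0.169)   # colon in the title: about +48%
```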
By contrast, the variables open access (yes vs. no), language of the article (English vs. German) as well as the use of factual statements, questions, word plays or dashes in the title had no significant influence on the FWCI in the multiple regression model and thus were not included in the minimal adequate model.
Among the tested variables, SNIP was the strongest predictor both in the multiple regression and among the simple regressions. It is self-evident that there must be a positive relationship between the FWCI of the articles and the SNIP values of the journals, as the latter essentially are the averaged FWCI values of the included articles. That articles in journals with higher SNIP are cited more can be explained by three mechanisms that act together: (1) authors tend to submit their better manuscripts to the better journals; (2) higher-ranked journals likely have more experienced editors and reviewers, who can do more to improve a manuscript than those in lower-ranked journals; and (3) publications in higher-ranked journals likely attract more readers, as a high SNIP/JIF suggests high quality to many readers. Given all these obvious links, it is somewhat astonishing that SNIP explained only a little more than one quarter of the variance in FWCI and thus less than the other formal aspects combined. This is mainly driven by the fact that the citation rates of different articles in the same journal vary dramatically (Figure
Variation of FWCI values of articles in journals represented by at least five articles in the sample. The height of boxplots is proportional to the number of articles included in the sample. Note that the x-axis has a log-scaling. The length of the box-whisker plots indicates that except for Journal of Biogeography, the most-cited article in the sample performs at least 10 times better than the least cited one, while the difference was as big as 185 times in the case of Vegetation Classification and Survey.
The top-5 over- and underperforming papers in the analysed portfolio of 174 journal articles compared to the average citation rates of the respective journals. The ranking was done by absolute differences (delta), while additionally the relative differences are given as ratios and log-ratios. Note that some articles are underperforming relative to the average of the journal in which they were published, but still are overperforming relative to all articles in the subject area and year (i.e. have a FWCI > 1).
| Authors | Year | Title | Publication venue | Citations | FWCI 2024.5 | SNIP 2022 | delta (FWCI vs. SNIP) | ratio (FWCI vs. SNIP) | log-ratio (FWCI vs. SNIP) |
|---|---|---|---|---|---|---|---|---|---|
| Mucina et al. | | Vegetation of Europe: hierarchical floristic classification system of vascular plant, bryophyte, lichen, and algal communities | Applied Vegetation Science | 1025 | 32.05 | 1.389 | 30.66 | 23.07 | 1.363 |
| Tichý et al. | | Ellenberg-type indicator values for European vascular plant species | Journal of Vegetation Science | 35 | 22.82 | 0.901 | 21.92 | 25.33 | 1.404 |
| Dengler et al. | | Ecological Indicator Values for Europe (EIVE) 1.0 | Vegetation Classification and Survey | 22 | 18.45 | 0.647 | 17.80 | 28.52 | 1.455 |
| Bruelheide et al. | | Global trait–environment relationships of plant communities | Nature Ecology and Evolution | 394 | 20.30 | 3.989 | 16.31 | 5.09 | 0.707 |
| Wilson et al. | | Plant species richness: The world records | Journal of Vegetation Science | 609 | 17.19 | 0.901 | 16.29 | 19.08 | 1.281 |
| […] | | | | | | | | | |
| Klotz et al. | | Plasticity of plant silicon and nitrogen concentrations in response to water regimes varies across temperate grassland species | Functional Ecology | 1 | 0.26 | 1.645 | -1.39 | 0.16 | -0.801 |
| Laughlin et al. | | Rooting depth and xylem vulnerability are independent woody plant traits jointly selected by aridity, seasonality, and water table depth | New Phytologist | 1 | 0.62 | 2.490 | -1.87 | 0.25 | -0.604 |
| Vetter et al. | | Invader presence disrupts the stabilizing effect of species richness in plant community recovery after drought | Global Change Biology | 18 | 1.04 | 3.007 | -1.97 | 0.35 | -0.461 |
| Jandt et al. | | ReSurveyGermany: Vegetation-plot time-series over the past hundred years in Germany | Scientific Data | 5 | 0.58 | 2.887 | -2.31 | 0.20 | -0.697 |
| Jandt et al. | | More losses than gains during one century of plant biodiversity change in Germany | Nature | 27 | 3.70 | 11.591 | -7.89 | 0.32 | -0.496 |
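The over- and underperformance metrics in Table 3 follow directly from the article-level FWCI and the journal-level SNIP; a minimal R sketch, continuing the hypothetical data frame from the modelling sketch above (column names are assumptions):

```r
# Derive the performance metrics of Table 3 from the article FWCI and the
# SNIP of its journal, then rank by the absolute difference (delta)
dat$delta     <- dat$FWCI - dat$SNIP
dat$ratio     <- dat$FWCI / dat$SNIP
dat$log_ratio <- log10(dat$ratio)

head(dat[order(-dat$delta), ], 5)  # top-5 overperformers
head(dat[order(dat$delta), ], 5)   # top-5 underperformers
```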
Interestingly, the second-most influential predictor was the title length, with articles being cited much more on average when the title is shorter. It is not directly intuitive why title brevity is so influential. Likely, the main reason is that a short title is normally achieved by removing as many unnecessary words as possible. As people find articles mainly via search engines, the title essentially should be a sequence of probable keywords for which people might search (“search engine optimisation”). The top-ranked journal Nature apparently is fully aware of the importance of short titles, as its author guidelines strictly forbid any title longer than 75 characters, including spaces (which typically corresponds to 7 to 11 words).
By contrast, the two other numeric indicators, number of authors and number of pages, had a positive effect on citation rates. The particularly strong effect of the number of authors (third-strongest predictor) can be explained by a set of non-exclusive mechanisms. First, a higher number of authors is typically related to larger datasets that allow more comprehensive analyses. Second, if more authors with their experiences are involved in paper preparation, this will likely lead to a higher manuscript quality. Last, a higher number of authors also means that more people (the authors and their networks) are aware of the paper and thus likely to cite it. It is less obvious why a greater length of the paper is also beneficial. Most likely, a greater length allows the incorporation of more subtopics, meaning that the paper contains relevant information for a wider range of other studies.
Among the different binary article typologies, only book chapters vs. journal articles and special features vs. regular articles had a positive effect, but not open access or English language. The unexpectedly much higher citation rate of book chapters compared to journal articles can probably be attributed to the narrow selection of books currently covered by Scopus. In my case, these are two “encyclopedias” that provide authoritative mini-reviews on the current state of knowledge across a wide range of topics and thus are relevant as background information for many studies. If the coverage of books in Scopus were as wide as that of journals, this citation advantage would probably disappear. The citation advantage of articles in special features is not a big surprise. Being part of a special feature automatically increases visibility, as there is usually an editorial that highlights the relevance of each included paper, plus often some additional “advertising” activities. Moreover, the editors of a special feature are specialised in its narrow topic and thus might be able to contribute more to the improvement of the submitted manuscripts than editors in regular journals, who must handle manuscripts from a much wider range of topics. Surprisingly, publishing OA did not bring any benefit in terms of citation rates. Naively, one would imagine that OA increases the visibility of articles and thus the chance of being cited – and previously there have been some studies that showed such a positive effect (
Among the other characteristics of article titles beyond their length, only the presence of a colon (“:”) had a significant positive effect, while using a dash or a word play or phrasing the title as a question or factual statement had no significant effect – even though many authors seem to believe that doing so is beneficial. In fact, using questions or statements even has an implicit negative effect on citation rates, as reformulating a “conventional” title as a question or statement requires additional words, while the number of words has a strong negative effect on citation rates. By contrast, the use of colons and dashes allows the same information to be conveyed in a title with fewer words, e.g. “Dry grasslands of Southern Europe: Syntaxonomy, management and conservation” instead of “Dry grasslands of Southern Europe with a focus on syntaxonomy, management and conservation”. Therefore, it is logical that using a colon or dash to separate a subtitle from a title is beneficial for citation rates via the strong effect on title brevity. However, it remains unclear why the colon has an additional strong positive effect while the dash – despite almost identical usage – does not.
Last but not least, there was the surprising result that my citation impact per article highly significantly decreased over the years in the multiple regression model, while the simple regression suggested no change over time. This is unexpected, as one should assume that in this 20-year period, I should have gained experience and now be able to write articles with higher impact than before. Perhaps I did, but it may be that other scientists improved even faster, and this then is reflected in a decrease in mean FWCI per paper – since FWCI values are normalised to the average in the respective research field and year. However, the absence of a change in the bivariate regression points in another direction: I may have improved various things over time, such as targeting higher-impact journals, shorter titles or more co-authors, but these improvements were accounted for already by the other predictors in the model.
The regression model developed in Table
This estimate helps to explain how different simple choices under my influence as author would have altered the outcome. Originally, I had thought of the title “What makes a paper successful?” but abandoned it when I realised that questions do not improve citation rates but lead to longer titles (in this case: +1 word). The prediction for this title would be a FWCI of 0.462, i.e. an 8% lower citation rate. If I had chosen to follow the trend of stating the main findings in the title, e.g. “Title brevity and article length increase the citation rates of articles”, the predicted FWCI would be 0.275, i.e. 45% lower than for the chosen solution. On the other hand, if I had found three more co-authors or expanded the paper with more content to 18 pages, it would likely be cited more (+48% and +25%, respectively).
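These relative changes can be checked against the estimates in Table 2, assuming this article itself (a single author, nine printed pages) as the baseline; a rough sketch in R (the function names are mine):

```r
# Multiplicative shift of the predicted FWCI when one predictor changes,
# all other predictors held constant (estimates from Table 2)
shift_unit  <- function(b, delta)    10^(b * delta)   # untransformed predictors
shift_log10 <- function(b, old, new) (new / old)^b    # log10-transformed predictors

shift_unit(-0.038, 1)      # one extra title word (question title): ~0.92, i.e. ~8% lower
shift_log10(0.282, 1, 4)   # three additional co-authors (1 -> 4):  ~1.48, i.e. ~+48%
shift_log10(0.324, 9, 18)  # doubling the length to 18 pages:       ~1.25, i.e. ~+25%
```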
Evidently, the strongest limitation of this study is the small sample size of < 200 articles. Thus, this study cannot (and is not intended to) replace a comprehensive analysis with a much broader dataset. However, since the sample covers a relatively wide range of > 50 journals relevant to vegetation ecologists, the findings can still claim some generality. This is particularly true when focussing on the two strongest predictors (those with the lowest p- and highest R2-values) after the journal impact (SNIP), i.e. number of authors and number of title words. Indeed, the same two variables turned out to be highly influential in the same direction in an unpublished study conducted by Meelis Pärtel some time ago on all the articles published in Journal of Vegetation Science and Applied Vegetation Science over several years.
Also, the metric of citation impact used here, the FWCI, while chosen for its obvious advantages over metrics such as the mere citation count, still has limitations. The Scopus website points out that the FWCI of an article is less meaningful when its calculation is based on averaging a small group of articles, where a single high-impact article could have undue effects. However, this is not the case in the subject areas studied here, each of which is populated by numerous journals that together publish >> 1000 articles per year. Moreover, the subject area classification by Scopus (ASJC = All Science Journal Classification), like any typology, has arbitrary elements. However, these are to some extent levelled out by the fact that most journals are assigned to multiple subject areas; Vegetation Classification and Survey, for example, to 1110 (“Plant Science”), 1101 (“Agricultural and Biological Sciences (miscellaneous)”) and 1105 (“Ecology, Evolution, Behavior and Systematics”). Evidently, assignment to other subject areas would have led to slightly different FWCI values. In the current study, however, this potential bias was counteracted by the fact that the journal SNIP is based on exactly the same subject areas as the FWCI of an article.
This study underlines that trying to get a certain paper accepted in the journal with the highest possible SNIP or JIF will, if successful, on average lead to higher citation rates, which is in agreement with common sense. However, the study also makes clear that the average impact of the journal determines only slightly more than one quarter of the impact of an article, while the latter should be the focus of authors. This means that it could be more efficient for authors to work on the other formal aspects addressed here, which together have more influence on the article impact than the level of the journal has. For example, instead of trying to publish in a journal with a twice as high journal impact (measured as SNIP), they could shorten their title in a meaningful way by 62%, which would probably cost only a small fraction of the time. Likewise, authors should question the current fashion of formulating the main results in the title as a factual statement, as I could show that this by itself is not beneficial for the impact but leads to a much longer title, resulting in a lower impact (e.g. in the example of the previous section: –45%).
Most editors would probably agree that their job is to select those articles that match the journal not only topic-wise but also impact-wise, i.e. to avoid articles that will be cited much less than the journal average. This study suggests that editors are not very good at this selection, as the variation of article impact within individual journals is extreme (see Suppl. material
I hope that this Forum contribution can raise the awareness among editors that currently they are often not doing a particularly good service to their journals in deciding which manuscripts to accept or reject, at least not from the perspective of scientific impact. I believe that editors could and should be trained much better to forecast the potential scientific impact of submitted manuscripts – which evidently concerns not only the 31.5% of variance explained just by the formal issues discussed here, but also the 49.8% (probably mostly content-related) not addressed here. This refers both to avoiding the rejection of potential high-impact papers and to avoiding the acceptance of papers that will likely be much less attractive than average articles in that journal. For example, the article by
Another simple issue that journals could ask themselves is whether the strict upper thresholds for article length defined in many author guidelines are still appropriate, given that longer papers receive significantly more citations once all other aspects are taken into consideration. Page limits made sense in the past, when articles were still printed on paper and journal issues sent by mail, i.e. when each additional page came with substantial additional costs; in times of electronic publishing, when a few more pages cost hardly anything, such limits do not appear wise. Of course, editors should only accept longer articles when the additional pages are justified by the content.
This study calls into question several widespread practices of science funders and universities.
In many countries, researchers are strongly pushed to publish their results in “high-rank” journals, often defined as the first and second JIF quartiles in the Web of Science database. I consider this practice clearly unethical. First, it removes the decision on what is valuable science from scientists and puts it into the hands of a commercial enterprise (Clarivate) and its arbitrary and non-transparent decisions as to which journals to include in its database at all.
Second and perhaps more importantly, the variation of citation rates within most (if not all) of the journals is so extreme that it is arbitrary and unfair to assess the impact of an article by the average impact of all articles in that journal. Why should the Nature article by Jandt et al. (
Thirdly, this study calls into question one of the major motivations for the OA movement: to make scientific results better accessible (
Among the national science funders who did and still do push OA publishing massively is the Swiss National Science Foundation (SNSF), which recently started to admit that there are some negative side effects. In consequence of that, they stopped paying OA fees for articles in special features (
It should be highlighted that this whole study only became possible because the Scopus database provides a matching pair of normalised citation indices, both for journals (SNIP) and for individual articles (FWCI). The normalisation makes possible studies across subject areas with different citation practices and across years (with different numbers of articles, e.g. the publication peak in the COVID-19 years:
I would like to emphasize that authors, reviewers, editors and science funders should primarily aim for high-quality science. However, I have shown here that the impact of a specific paper is not only defined by its scientific qualities, but to a non-negligible part also by simple formal aspects. As an author, it is worth being aware of these mechanisms and taking advantage of them to make your own high-quality papers as impactful as they can be. Likewise, reviewers and editors could use this empirical knowledge to give better advice to their authors. I thus hope that this contribution opens a wider discussion on the relevance of formal aspects for the scientific impact of articles. Evidently, this was just an example study based on a small sample from a single vegetation ecologist. However, the results largely coincide with an unpublished study by Meelis Pärtel, who analysed the publication output of Journal of Vegetation Science and Applied Vegetation Science over several years. Hopefully this Forum Paper will spur much more comprehensive follow-up studies across multiple authors and disciplines to test how general the reported patterns are.
All data used are provided in the Supplementary materials.
I would like to thank Meelis Pärtel who, several years ago when he was the Chair of the Chief Editors of Applied Vegetation Science and Journal of Vegetation Science, conducted a similar study on articles published in these two journals. His study came to similar conclusions as this one, but unfortunately was never published. This motivated me to finally get something citable on the topic. Many thanks to François Gillet and Idoia Biurrun, who made very useful suggestions on a former version of the manuscript. Further, I am grateful to Stephen Bell for the linguistic revision of this article.
Overview of the 177 articles analysed, broken down to publication venue with journal- and article-based citation metrics and the analysed predictor variables (*.xlsx).
Overview of the 54 journals and two book series included in the analysis, with journal-based and article-based metrics and their relationships (*.xlsx).