  • Paolo

Updated: 7 Nov 2020


Consider a biological sample, for example pollen on an insect's body. If you sort the pollen species by abundance, you will inevitably find that some pollen is super-dominant (i.e. very abundant) and other pollen is extremely rare, with a gradient in between. So far so good, but if you calculate any diversity statistic, the chosen metric will be influenced by the very abundant taxa, which will overshadow the rest of the diversity. There are many possible ways to handle or correct for this, but I will not deal with them now. Another side of the problem comes from species that are very rare in the samples. Are these less abundant ones biologically reliable? Say a bee collected two pollen grains from plant #1 and 100 from plant #2: could we confidently say the bee collects heterogeneous resources? Or would it be better to ignore the two grains of plant #1 (did they result from a stochastic event, or from cross-contamination between flowers?)? I would go for excluding the two grains… but the same issue arises with DNA data. In recent times, pollen has been identified with DNA metabarcoding, probably the best way to identify it at species level given how difficult it is to sort pollen species morphologically. The output of this type of analysis is a matrix in which each sequenced pollen species is associated with a number of DNA sequencing “reads” for each sample, a quantitative measure of how much DNA you got from your DNA analysis protocol. Well, again, some species will be very rare, others will not be. What to do? Some suggest removing everything below 1% of the read abundance, while others suggest using as a cutoff the number of reads obtained from sequencing blanks (empty vials put in the machine to estimate possible machine-related contamination).
While the second option seems very reasonable, it ignores the differences between species in how readily their DNA is released and amplified: you could get few reads because a species has very thick pollen walls and hardly released any DNA. If so, should it be excluded because it turned out less abundant than a blank? Maybe not. On the other hand, a fixed 1% threshold ignores how the read counts are distributed in the samples: say you have one species with 98% of the reads and three species with about 0.66% each, should we exclude the three? With such a skewed distribution, using a fixed threshold is very risky. That is why I usually prefer a ROC threshold calculation procedure. It is very simple yet robust and widely liked, but it is hardly ever used with ecological or DNA data.
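
To make the fixed-threshold example above concrete, here is a minimal Python sketch. The species names and read counts are invented for illustration (they simply reproduce the 98% versus roughly 0.66% split), and `filter_fixed` is a hypothetical helper, not a function from any metabarcoding package.

```python
def relative_abundance(reads):
    """Convert raw read counts per taxon into fractions of the sample total."""
    total = sum(reads.values())
    return {taxon: n / total for taxon, n in reads.items()}


def filter_fixed(reads, min_fraction=0.01):
    """Keep only taxa whose share of the sample's reads is at least min_fraction."""
    rel = relative_abundance(reads)
    return {taxon: n for taxon, n in reads.items() if rel[taxon] >= min_fraction}


# Invented sample: one dominant species (98%) and three rare ones (~0.66% each).
sample = {"species_A": 9800, "species_B": 67, "species_C": 67, "species_D": 66}

kept = filter_fixed(sample)
removed = sorted(set(sample) - set(kept))
print(kept)     # only the dominant species survives the 1% cutoff
print(removed)  # the three rare species are all discarded
```

With these numbers the 1% rule keeps a single species and throws away three quarters of the taxa detected in the sample, which is exactly the risk described above.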

The procedure consists of statistically estimating the rates of false and true positives and negatives for each sample by associating read counts with these categories, thus obtaining a sample-specific threshold based on how the reads are actually distributed in your sample (by the way, you can also obtain a cross-sample threshold if you prefer a constant one). I used this approach in both a pollen-based study (see here!) and a microbe-based study (in revision), and the results always make more sense to me than those of the other cleaning approaches mentioned above, which basically clean too much, or of not cleaning at all, where you risk keeping so much diversity that the important information gets diluted in a mess of data. Still, the best solution is probably to try different approaches and decide according to the data you have, because sometimes even ROC cleans “too much”.
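
As a rough illustration of the idea, the sketch below picks a read-count cutoff for one sample by maximising Youden's J (true positive rate minus false positive rate), one common way of choosing a point on a ROC curve. It is not necessarily the exact procedure used in the studies mentioned above. In particular, the post does not say how records are assigned to the "true" and "false" categories, so the `presumed_true` labels here are a hypothetical input, and the read counts are invented.

```python
def roc_threshold(read_counts, presumed_true):
    """Return (cutoff, J): the read-count cutoff maximising Youden's J = TPR - FPR.

    read_counts   : reads per taxon in one sample
    presumed_true : hypothetical provisional labels (True = presumed genuine record)
    A taxon is called "present" when its reads are >= cutoff.
    """
    positives = sum(presumed_true)
    negatives = len(presumed_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one presumed-true and one presumed-false record")

    best_cutoff, best_j = None, -1.0
    # Every observed read count is a candidate cutoff; ties keep the lowest cutoff.
    for cutoff in sorted(set(read_counts)):
        tp = sum(1 for r, t in zip(read_counts, presumed_true) if t and r >= cutoff)
        fp = sum(1 for r, t in zip(read_counts, presumed_true) if not t and r >= cutoff)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Invented sample: four taxa presumed genuine, three presumed spurious.
reads = [9800, 67, 67, 66, 3, 2, 1]
presumed_true = [True, True, True, True, False, False, False]

cutoff, j = roc_threshold(reads, presumed_true)
print(cutoff, j)
```

Because the cutoff follows the read distribution of the sample itself, the three rare but presumed-genuine taxa are retained here, whereas a fixed 1% rule would have removed them.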


The figures show pollen (the first) and a reconstructed DNA molecule (the second); they are taken from Pxfuel and carry a CC licence.

  • Paolo

Updated: 7 Nov 2020

“They are just bees, they are all the same”… nothing could be more wrong. A large number of bee species have been described so far, with large differences in morphology, nesting, foraging and so on. On the other hand, even within a single species, some (reasonably small) degree of variation in morphology, behaviour, ecology and genetics can occur.

The photo is of Andrena praecox, the species this post refers to
The studied bee

This is true not only for bees. Consider humans, for example: different people can have different eye shapes, and the people of an ordinary village could be sorted into different family trees, although they all belong to the same species. In a recent study, some colleagues and I discovered that the populations of a European bee species, quite widespread across the continent and with negligible morphological variation (as far as we know), are actually structured into a western and an eastern genetic lineage. This means that at some point during the history of the bee and of the continent, a common entity ended up separated into two spatially isolated clusters. This isolation allowed some genetic differences between the clusters to settle in the DNA, and nowadays the apparently continuous bee range is actually divided into two adjacent genetic sections. This is quite surprising, because what superficially seemed homogeneous was in fact a heterogeneous structure. Here is the link to the published study this blog post refers to: http://fragmentaentomol.org/index.php/fragmenta/article/view/414. Similar studies have piled up over time, especially in the last 20 years, with case studies from other animals and plants.

Although it is not discussed in the bee study itself, my co-authors and I later speculated that the root cause of the observed genetic structuring of this bee might be connected with the turbulent climatic history of Europe. During the most recent ice ages, the cooling of most of Europe led organisms to seek refuge in southern Europe, where populations remained isolated by the east-west distribution of mountains, which worked as geographical barriers. This likely contributed to what we observed in the bee we studied.

It would be very interesting to know whether other bee species hide similar patterns of variation across their range.

  • Paolo

Updated: 7 Nov 2020


A study that some colleagues and I published in 2019 is currently listed among the top 100 Ecology papers published by Scientific Reports that year (ranked 54th out of 550 based on download rates). This rank is quite remarkable, and the idea for the study came from a visit to an acquaintance. It was 2014 when, on a trip to the UK to attend a conference, Jan (my PhD supervisor at that time) and I stopped at Jeff Ollerton’s lab in Northampton to say hello. We chatted about the PhD I was about to start in the Czech Republic and set up a brainstorming meeting. That meeting gave us an exciting working hypothesis about plant-pollinator interactions. Once back in the Czech Republic, we decided to test how the pollinator guild would react to the sudden loss of the most favoured plant species. We chose to test this with field manipulations rather than computer simulations. In particular, we planned to remove four plant species from natural communities, one species at a time, based on the number of pollinators they had previously received. We discovered a few, but important, aspects of how a pollinator guild depends on the scenario set by the plant assemblage. We found that knocking out the most visited plants caused sudden decreases in pollinator abundance. On top of that, we discovered that the pollinators that did not disappear after plant removal redirected their visits towards particular flowers instead of redistributing equally among the remaining plants. Specifically, our data suggested that flowers that were smaller than the removed ones, or that had more sugar in their nectar, received more visitors.

These findings supported the idea that generalist flowers (i.e. highly visited ones) play a key role in sustaining local pollinator abundance, and that pollinators can find alternative resources, but do so according to specific flower features. It was an exciting study, which helped us understand how pollinators redirect their foraging choices after perturbations and which could also have implications for ecosystem conservation. For the full picture, the study is here: https://www.nature.com/articles/s41598-019-43553-4.


We also explored the impact of these removals on the structure of plant-pollinator interaction networks, but I will not reveal too much about that now (but see here: https://www.biorxiv.org/content/10.1101/2020.01.28.923177v1).
