Eukaryote Origins

In the last five years, there has been tremendous progress in understanding the origin of the eukaryotic cell.

This progress has come on a number of fronts – phylogenetic, metabolic and bioenergetic – as well as from other data that relate directly to evolution. Our latest paper in PNAS is, I feel, the most recent in a line of manuscripts that have tackled this problem from a number of perspectives and have shown overwhelmingly that the best explanation of the data is a chimeric origin of the eukaryotic cell.

The literature is full of confusion – we counted 72 different theories for the origin of the eukaryotic cell that have been published over the last 100 years. Many of these theories are no more than conjectures – no more important than two people having a chat in a pub and wondering about the origins of the eukaryotic cell.

Some, however, are based on data and consist of well thought-out ideas and arguments.

The main hypotheses that have been addressed in multiple publications (though of course simply counting papers is a fallacy known as argumentum ad populum and doesn’t carry much weight) are the autogenous models, the eukaryotes-early models, the virus-contribution models and the fusion models.

The autogenous models suggest that eukaryotes arose from prokaryote cells by gradual, step-wise evolutionary events. In some particular order, the nucleus arose, the mitochondrial endosymbiosis event occurred and all the other organelles arose. However, this kind of evolution was gradual and eukaryotes formed progressively over time. The problem with these kinds of models (including the models that propose that Planctomycetes made a big contribution to eukaryote origins – an idea that has no merit whatsoever) is that there is a huge difference in cell type between prokaryotes and eukaryotes, and lots of intermediate forms, which would have arisen over a long period of time, must all have gone extinct. To my mind at least, the length of time required for such a process would almost certainly have left some intermediate lineages and it hasn’t. In addition, other evidence (see below) militates against this idea so strongly that it can be safely discarded. The classic Tree of Life Hypothesis that is seen in many textbooks has, in my opinion, now been thoroughly falsified.

For the virus-contribution models, we tested (though it is indeed a weak test) whether yeast proteins that are targeted to the nucleus are more often virus-associated than yeast proteins in general. The idea here is that if viruses did indeed contribute a lot to the origin of the eukaryotic cell, then we might still see such an involvement today (the trace of history might be weak but we might still see such a contribution). Our result? 21% of nuclear proteins are clearly homologous with proteins that are found in viruses today. However, 21% of yeast proteins generally are also homologous with proteins seen in viruses today. Therefore, there is no enrichment of virus proteins in the nucleus. As I say, it is not the strongest test, but if anybody has any other ideas, we are happy to listen.
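For readers who want to see the shape of such an enrichment test, here is a minimal sketch in Python. It is not the analysis from our paper: the counts are hypothetical, chosen only so that both proportions come out at 21%, and the one-sided hypergeometric test is my assumption about a reasonable way to ask the question.

```python
from math import comb


def enrichment_p_value(total, total_hits, subset, subset_hits):
    """One-sided hypergeometric test: the probability of seeing at least
    `subset_hits` virus-like proteins in a random draw of `subset` proteins
    from a proteome of `total` proteins, `total_hits` of which are virus-like."""
    upper = min(subset, total_hits)
    favourable = sum(
        comb(total_hits, k) * comb(total - total_hits, subset - k)
        for k in range(subset_hits, upper + 1)
    )
    return favourable / comb(total, subset)


# HYPOTHETICAL counts, not taken from the paper: picked so that
# 21% of all proteins and 21% of nuclear proteins are virus-like.
total_proteins = 6000
virus_like_total = 1260
nuclear_proteins = 1000
virus_like_nuclear = 210

# Fold enrichment: proportion in the nucleus over proportion overall.
fold = (virus_like_nuclear / nuclear_proteins) / (virus_like_total / total_proteins)
p_value = enrichment_p_value(
    total_proteins, virus_like_total, nuclear_proteins, virus_like_nuclear
)

print(f"fold enrichment = {fold:.2f}, one-sided p = {p_value:.3f}")
```

With equal proportions the fold enrichment is 1 and the p-value is nowhere near significance, which is what "no enrichment" means in practice.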

As for the eukaryotes-early idea? OK, let me explain this one. Essentially, it has been suggested that a lot of evolution has been reductive. Genomes are constantly being streamlined and shrunk down where possible. This is a good idea, particularly for parasites, and in general it is thought to be something useful for single-celled organisms in particular. Therefore, the story goes that it is plausible that eukaryotes came first and that prokaryotes came later and are in fact “streamlined” eukaryotes. The section below will explain why I think this does not work as a theory. In my group, we call this the “unbaking the cake model” – a very unparsimonious model that needs to explain a lot more gains and losses than any eukaryotes-late model would have to explain.

Finally, we get to the classes of models that I feel are best supported by the data. These models suggest a chimerism of two different kinds of prokaryote – a eubacterium and an archaebacterium – resulting in a much quicker route to eukaryogenesis and allowing greater amounts of energy to be made in a single cell. This idea is more than 100 years old, dating back to the work of the Russian scientist Konstantin Mereschkowski (a noted racist and eugenicist, by the way). The genomics era has allowed us to carry out analyses of large gene sets, and one of the first analyses was carried out by Christian Esser and co-workers, who noted that the yeast genome was more eubacteria-like than archaebacteria-like, in terms of simple counts of genes that had homology between yeast and prokaryotes. This was a big “WOW” moment for me, because this shouldn’t be the case – the textbooks contain Woese’s three-domains tree and on this tree eukaryotes share a more recent common ancestry with Archaea (archaebacteria). Maybe the three-domains tree is wrong?

Figure 1: Carl Woese’s three-domains hypothesis, which I feel has now been falsified. Neither of the two prokaryotic groups is monophyletic and the eukaryotes should have ancestral lineages coming from within each of these two groups.

So, we started investigating this issue. When we made phylogenetic trees, amalgamated them into supertrees and looked at the signals in these trees, we saw that eukaryotes were generally placed next to prokaryotes in three places – as sister-group to the alpha-proteobacteria, as sister-group to the cyanobacteria and as a group emerging from within the archaebacteria (not as sister-group). This meant that the two prokaryotic groups were not monophyletic; they were paraphyletic, and only the eukaryotes were monophyletic.
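To make the monophyletic/paraphyletic distinction concrete, here is a minimal Python sketch. The tree and taxon names are hypothetical, and the toy topology shows only the archaebacterial side of the story (a single bifurcating tree cannot show eukaryotes emerging from both prokaryotic groups at once). It assumes a rooted tree written as nested tuples, and it treats a group as monophyletic when some clade contains exactly that group and nothing else.

```python
def leaf_set(tree):
    """Return the set of leaf names below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """Yield the leaf set of every clade (subtree) in a rooted tree."""
    yield leaf_set(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, group):
    """A group is monophyletic if some clade holds exactly its members."""
    return frozenset(group) in set(clades(tree))


# HYPOTHETICAL toy tree: eukaryotes emerge from within the archaebacteria.
toy_tree = (
    ("eubacterium_A", "eubacterium_B"),
    ("archaebacterium_A", ("archaebacterium_B", ("eukaryote_A", "eukaryote_B"))),
)

eukaryotes = {"eukaryote_A", "eukaryote_B"}
archaebacteria = {"archaebacterium_A", "archaebacterium_B"}

print("eukaryotes monophyletic:", is_monophyletic(toy_tree, eukaryotes))
print("archaebacteria monophyletic:", is_monophyletic(toy_tree, archaebacteria))
```

In this toy tree, every clade that contains both archaebacteria also contains the eukaryotes, so the archaebacteria on their own do not form a clade.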

However, big claims need big evidence and they need evidence from all kinds of places – not the same evidence being repackaged in different forms. This convergence of independent lines of evidence is known as consilience, a term credited to the 19th-century British philosopher William Whewell.

So, along with James Cotton, we analysed the yeast genome and partitioned the genes into those with homology to archaebacteria and those with homology to eubacteria. We asked whether these groups of genes were doing different things, whether they performed different functions. The answer was a clear yes! Archaebacterial genes are more highly expressed on average, more likely to be informational, more central in interaction networks (betweenness centrality and closeness centrality), more likely to be lethal upon deletion and less likely to duplicate. Interestingly, irrespective of whether a gene is informational or operational, if its history is archaebacterial, then it is twice as likely to be lethal upon deletion. This said to us in no uncertain terms that archaebacterial genes were “more important” and eubacterial genes were “less important”. This is too broad a term, really, but it gets across the idea that there are significant differences between the two.

With David Alvarez-Ponce we moved on to looking at humans – a multicellular organism with lots of experimental data. It turns out that the history of our genes tells us a lot about their roles in humans. Archaebacterial genes evolve more slowly, are more highly and broadly expressed, are more likely to be lethal upon deletion in mice (no, we can’t do deletion experiments on humans) and – what I thought was very interesting – they are less likely to be involved in Mendelian diseases. Why less likely? Because mutations in these “more important” genes result in non-viable offspring. We tend to see Mendelian diseases manifesting more often in genes of eubacterial origin, perhaps because these are genes with lesser effect on human biology.

So, to the paper that went online today. We used gene similarity networks in order to explore eukaryotic origins in a new way. We are no longer looking at a narrow sampling of eukaryote genomes; we now have a broad swathe of them, and this means that we can push back the distance we can look into the past. With this broad sampling we might be able to see what is common to eukaryotes and, therefore, what might be ancestral.

We saw that the eukaryote genome expands a lot and contracts a lot (nothing new). However, this expansion and contraction is heavily biased towards the eubacterial genes – archaebacterial genes duplicate less often and are lost less often in evolution. We have also introduced a new kind of evolutionary character – the extended homology family. The theory goes like this: if eukaryotes are indeed chimeric, then we might expect to see a eukaryotic gene that is connected to a eubacterial gene because of recentness of common ancestry, this eubacterial gene might be connected to an archaebacterial gene, and this archaebacterial gene might be connected to a different eukaryotic gene (in all cases ‘connected’ means ‘showing significant levels of similarity, i.e. homology’). However, in this chain of homology, we need to see at least 70% of the genes overlapping. If we were to see any such characters it would be because (a) eukaryotes are chimeric and (b) we were lucky enough to find genes that evolved at the appropriate evolutionary rate. We found a whole bunch of these kinds of genes.
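The chain described above can be sketched as a small search over a similarity network. This is an illustration only, not the pipeline from the paper: the gene names, the edge list and the function `find_extended_chains` are all hypothetical, an edge simply stands for "significant similarity", and the 70% overlap criterion is deliberately left out because it needs alignment data that this toy example does not have.

```python
from itertools import product


def find_extended_chains(edges, domain):
    """Find chains eukaryote - eubacterium - archaebacterium - eukaryote
    in an undirected similarity network, where the two eukaryotic genes
    at the ends of the chain are different genes."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    chains = []
    for eub, eub_links in neighbours.items():
        if domain[eub] != "eubacteria":
            continue
        # Eukaryotic genes attached to the eubacterial end of the chain.
        euk_left = [g for g in eub_links if domain[g] == "eukaryote"]
        for arch in eub_links:
            if domain[arch] != "archaebacteria":
                continue
            # Eukaryotic genes attached to the archaebacterial end.
            euk_right = [g for g in neighbours[arch] if domain[g] == "eukaryote"]
            for first, last in product(euk_left, euk_right):
                if first != last:
                    chains.append((first, eub, arch, last))
    return sorted(chains)


# HYPOTHETICAL toy network: names and links are made up for illustration.
domain = {
    "euk_gene_1": "eukaryote",
    "euk_gene_2": "eukaryote",
    "eub_gene": "eubacteria",
    "arch_gene": "archaebacteria",
}
edges = [
    ("euk_gene_1", "eub_gene"),
    ("eub_gene", "arch_gene"),
    ("arch_gene", "euk_gene_2"),
]

chains = find_extended_chains(edges, domain)
print(chains)
```

A real analysis would build the edges from an all-against-all similarity search and then filter the chains on overlap; the sketch only shows the network pattern being looked for.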

We cannot fit these network patterns onto any theory other than a chimeric theory for eukaryotic origins. It cannot be done. Unless you allow genes to travel through time... backwards.

So, what is left? Wow again. There is still so much to explain – where did the nucleolus come from? The Golgi apparatus? Lots of other organelles? Mitosis? The more complex cytoskeleton of the eukaryotes (though, thanks to recent work from excellent microbiologists like Jeff Errington, we now know that prokaryotes have complex cytoskeletons too)? The list goes on and there is a lot of work to do; however, we now at least have a framework that we can use to ask these questions.

Other papers that are independent of our group, are interesting to read and provide more evidence supporting this scenario include the Cox et al. paper, which uses heterogeneous phylogenetic models to understand the evolution of the archaebacterial ribosomal proteins and the transcription and replication apparatus, and the Lane and Martin paper on the energetics of genome complexity.
