GCUA: General Codon Usage Analysis
You can find the project on GitHub here: https://github.com/mcinerneylab/GCUA
This program is designed to perform various tasks that are of use for evaluating codon usage in a set of genes. You can get it to do some simple things like calculate the number of observations of a particular codon in a gene. Or you can do the same thing for the combined dataset. You can also look at amino acid usage frequencies (again for each gene or for the dataset as a whole). The program also produces a distance matrix based on the similarity of codon usage in genes.
However, the most important feature of the program is its ability to use multivariate analysis to look at variation in codon usage amongst genes. Although it is popular to say that an organism has a particular codon usage, it is now known that an organism's genes might have more than one codon usage pattern, and it is usually only by using a multivariate analysis method that it is possible to ascertain the kinds of variation contained within the data.
Read in a Fasta-formatted file.
The FastA (or Pearson) format is the simplest method of representing molecular sequence data. On one line is the name of the gene (or organism), preceded by a ‘>’ (without the quotation marks). The sequence data begins on the next line. The next sequence is identified by the next ‘>’.
If your data is in a different format, you can use Readseq by Don Gilbert to reformat the data into FastA format. Readseq is available from the IUBIO archive: http://iubio.bio.indiana.edu/
Briefly, the format looks like this:
>MG0001 DNA polymerase III beta sub (dnaN)
>MG0002 heat shock protein.
The program recognises the > sign and decides that this is the start of a new sequence. It does a limited amount of checking of the sequence format, but if you, the user, do not ensure that the format requirements (which are not very stringent) are met, it could do some funny things. The limit on sequence length at the moment is 15,000 bp and the limit on the number of sequences is 5,000. It is hoped that future development work will mean that there will not be any explicit restriction on sequence number or length.
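As a rough illustration of the format rules described above, here is a minimal Python sketch of a FastA reader. It is not GCUA's own parser; the function name `parse_fasta` and the two records are made up for the example.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FastA-formatted text."""
    records = []
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A '>' starts a new record, so store the previous one first.
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            # Sequence data may be split over several lines.
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# Hypothetical input, invented for this example.
example = """>geneA hypothetical example
ATGAAATTT
GGGTAA
>geneB another made-up record
ATGCCCTAG
"""
records = parse_fasta(example)
for name, sequence in records:
    print(name, len(sequence))
```

Anything that appears before the first > line is silently ignored in this sketch; a stricter reader might prefer to report it.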
Calculate codon usage.
The purpose of this function is to calculate the Number (N) of times a particular codon is observed in a gene or set of genes and also to calculate the Relative Synonymous Codon Usage (RSCU) values for the dataset. RSCU values are the number of times a particular codon is observed, relative to the number of times that the codon would be observed in the absence of any codon usage bias. In the absence of any codon usage bias, the RSCU value would be 1.00. A codon that is used less frequently than expected will have a value of less than 1.00 and vice versa for a codon that is used more frequently than expected.
The program adds up the total number of times that the codons for a particular amino acid are observed (depending on the amino acid, this might be 2, 3, 4 or 6 different codons). It then divides this number by the number of codons for the amino acid (2, 3, 4 or 6). This gives the expected number of times that each codon should be observed. Then, for each codon, the frequency of observation is divided by the expected frequency. Sometimes the observed frequency will be greater than the expected frequency (RSCU value greater than 1.00), and sometimes it will be less (RSCU value less than 1.00).
Codon   N (gene 1)   N (gene 2)   RSCU (gene 1)   RSCU (gene 2)
UUU     4            60           1.33            1.33
UUC     2            30           0.67            0.67
In gene 1, we observe UUU being used 4 times and UUC being used twice. In gene 2 we observe UUU being used 60 times and UUC being used 30 times. Although these figures are vastly different, when we convert them to RSCU values, it is obvious that in both genes the UUU codon is ‘preferred’ over the UUC codon and the level of preference appears to be the same in both genes.
The calculation is as follows:
For each codon - RSCU = (N / total for that AA) x (# of codons for that AA)
Gene 1 UUU - ((4/6)x2) = 1.33
Gene 1 UUC - ((2/6)x2) = 0.67
Gene 2 UUU - ((60/90)x2) = 1.33
Gene 2 UUC - ((30/90)x2) = 0.67
Converting codon usage values to RSCU values has the effect of ‘normalising’ comparisons across genes. It makes the codon usage value independent of amino acid composition of the sequences and identifies when a codon is being used more frequently than expected and when it is being used less frequently than expected.
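The calculation described in this section can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not GCUA's source code; the function names, the `PHE` tuple and the choice to return 0.0 for an amino acid that does not occur are assumptions made for the example.

```python
from collections import Counter


def count_codons(sequence):
    """Count in-frame codons in a coding sequence (DNA T is reported as U)."""
    seq = sequence.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def rscu(counts, family):
    """RSCU value of every codon in one synonymous family."""
    total = sum(counts.get(codon, 0) for codon in family)
    if total == 0:
        # Assumption: report 0.0 when the amino acid is absent from the gene.
        return {codon: 0.0 for codon in family}
    expected = total / len(family)  # count expected with no bias
    return {codon: counts.get(codon, 0) / expected for codon in family}


PHE = ("UUU", "UUC")  # the two phenylalanine codons used in the example

# Artificial genes with the same counts as gene 1 and gene 2 above.
gene1 = count_codons("TTT" * 4 + "TTC" * 2)
gene2 = count_codons("TTT" * 60 + "TTC" * 30)
values1 = rscu(gene1, PHE)
values2 = rscu(gene2, PHE)
print(round(values1["UUU"], 2), round(values1["UUC"], 2))
print(round(values2["UUU"], 2), round(values2["UUC"], 2))
```

Both genes give the same pair of RSCU values even though their raw counts differ by a factor of fifteen, which is the normalising effect described above.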
The Effective Number of Codons (Nc) measure is a way of analysing how biased a gene is in terms of its codon usage. Nc values range from 61 for a gene that tends to use all codons with equal frequency to 20 for a gene that is effectively using only a single codon for each amino acid.
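For readers who want to see where the limits of 20 and 61 come from, here is a small Python sketch of the formula usually quoted from Wright (1990) for the universal genetic code. It is an assumption that GCUA uses exactly this form; the function names are invented, and the sketch takes the average homozygosity of each degeneracy class as given rather than deriving it from a gene.

```python
def homozygosity(counts):
    """Wright's F for one amino acid, from the counts of its synonymous codons."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two codons to estimate F")
    sum_of_squares = sum((c / n) ** 2 for c in counts)
    return (n * sum_of_squares - 1) / (n - 1)


def effective_number_of_codons(f2, f3, f4, f6):
    """Nc from the average F of the 2-, 3-, 4- and 6-fold degenerate classes.

    The universal code has 9 two-fold, 1 three-fold, 5 four-fold and
    3 six-fold amino acids; Met and Trp contribute the constant 2.
    """
    return 2 + 9 / f2 + 1 / f3 + 5 / f4 + 3 / f6


# One codon per amino acid: every F is 1, so Nc is at its minimum.
nc_min = effective_number_of_codons(1, 1, 1, 1)
# Perfectly even usage: F is 1/k for a family of k codons, so Nc is at its maximum.
nc_max = effective_number_of_codons(1 / 2, 1 / 3, 1 / 4, 1 / 6)
print(nc_min, round(nc_max, 2))
```

How to treat amino acids that are rare or missing in a short gene is left out here; that detail matters in practice and is not covered by this sketch.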
There is a relationship between Nc and the base composition of a gene with genes that have more biased base compositions being expected to have lower Nc values. What is usually of interest are Nc values that are lower than might be dictated by the base composition of the gene. This might be taken as evidence that there is some kind of selective pressure on the gene to use a smaller subset of codons. This selective pressure could be translational selection for ‘optimal’ codons. Optimal codons are those that correspond to the major abundance tRNA for that amino acid. In such circumstances, there could be a selective pressure to use a particular codon that corresponds to this tRNA.
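The relationship between Nc and base composition can be made concrete with the approximation given by Wright (1990) for the Nc expected when codon choice is determined by base composition alone. The sketch below assumes that formula, with s standing for the G+C content at synonymous third codon positions; the function name is made up.

```python
def expected_nc(s):
    """Expected Nc when only G+C content at third positions (s) shapes usage."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be a proportion between 0 and 1")
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


# Print the expected curve at a few values of s.
for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(s, round(expected_nc(s), 2))
```

A gene whose observed Nc falls well below the value returned for its own s is a candidate for the kind of additional selective pressure discussed above.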
Calculate Amino Acid Usage
Amino acid usage varies substantially between protein sequences. However, there are some amino acids that are usually used reasonably frequently and some that are used very rarely (Cysteine, for instance).
It may be of interest to know the frequency with which amino acids are used in a protein sequence. The output from GCUA gives the percentage usage of each amino acid in a protein sequence. These data can be imported into a spreadsheet package for further analysis or (as might be preferred) the results can be viewed by eye.
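The percentage calculation is simple enough to sketch directly. The Python below is an illustration only, working from a protein sequence in one-letter code; it says nothing about the layout of GCUA's real output, and the function name and example sequence are invented.

```python
from collections import Counter


def amino_acid_percentages(protein):
    """Percentage usage of each amino acid in a one-letter protein sequence."""
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        return {}
    counts = Counter(residues)
    total = len(residues)
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}


# Hypothetical four-residue peptide, chosen so the numbers are easy to check.
usage = amino_acid_percentages("MKKL")
for aa, percent in usage.items():
    print(aa, percent)
```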
The similarity of amino acid usage is also one of the distances that can be produced by the program. This means that amino acid usage can be visualised on a distance tree. The similarities of amino acid usage can be used as input for either neighbor-joining or some objective function (least-squares or Minimum Evolution), so that a dendrogram can be produced that will give some graphical view of the closeness of genes in terms of their amino acid usage.
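To show what a usage-based distance matrix looks like, here is a minimal Python sketch. The distance measure GCUA uses is not described in this document, so plain Euclidean distance between percentage profiles is assumed purely for illustration; the gene names and profiles are made up.

```python
import math


def usage_distance(a, b):
    """Euclidean distance between two usage profiles (dicts of percentages)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def distance_matrix(profiles):
    """Return (names, matrix) with matrix[i][j] the distance between two genes."""
    names = sorted(profiles)
    matrix = [[usage_distance(profiles[x], profiles[y]) for y in names]
              for x in names]
    return names, matrix


# Hypothetical amino acid usage profiles (percentages) for three genes.
profiles = {
    "gene1": {"A": 50.0, "K": 50.0},
    "gene2": {"A": 50.0, "L": 50.0},
    "gene3": {"A": 50.0, "K": 50.0},
}
names, matrix = distance_matrix(profiles)
for name, row in zip(names, matrix):
    print(name, [round(d, 2) for d in row])
```

A square matrix of this kind is the sort of input that a neighbor-joining or least-squares tree-building program expects.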
The most sophisticated method of looking at amino acid usage is to employ multivariate analysis to identify the trends in amino acid usage among genes. In GCUA there is also an option of feeding amino acid usage data into the multivariate analysis functions. It may be more profitable, however, to use the ADE package, which is much more sophisticated (http://pbil.univ-lyon1.fr).
If you wish to get more sophisticated, you can perform multivariate analyses on the dataset. You might not be familiar with multivariate analyses so here is a little explanation, without much math.
What is the objective of the multivariate analyses? We need to find out what is the biggest source of variation in the dataset.
What does this mean? It means that often there is a subtle (or not so subtle) force affecting codon usage (or amino acid usage) and if we use some appropriate mathematics we might be able to find out what it is.
So, what happens during multivariate analyses? Even though my description will deal with codon usage data the principle is the same for many kinds of data.
Each gene is represented by a set of co-ordinates. These are the codon usage statistics for each codon (in fact we use the RSCU values, which are described earlier in this document). For the universal genetic code, the gene is represented by 59 co-ordinates (one for each of the 59 codons for which there is a synonymous alternative), but this figure varies, depending on the genetic code that is being used. We need to find out whether or not there is some progression in the magnitude of these co-ordinates across genes.
Let's take, for instance, the amino acid Isoleucine. I’m sure that you all know that it is usually encoded by three codons, namely AUU, AUC and AUA. Let's pretend for a minute that a gene is encoded only by three codons, i.e. the ones above. The position of the gene can be plotted in a high-dimensional space by reference to the values for each of these three codons.
What is this high-dimensional space? You can all probably imagine an X and a Y axis (looks like a cross). With a little more imagination, you can probably get as far as a diagram with an X, Y and Z axis (the Z axis is perpendicular to the page). The position of the gene in this space is given by its codon count values. Suppose that these values are 75, 125 and 100 for the three codons respectively. So, on the AUU axis we go out 75 units from the origin, on the AUC axis we move out 125 units from the origin and on the AUA axis (the one perpendicular to the page) we go out 100 units. We now have a point in space, not on any of the axes, that gives the position of the gene. (I could have described all of this in terms of vectors and so on, and if you want the mathematical description, there is some good literature referenced at the end of this document.)
If we take all of the genes for a particular organism and plot them in this high-dimensional space according to their codon usage statistics, we get an effect that looks like a cloud (hope you are still with me). In the absence of any codon usage bias, this ‘cloud’ would look spherical. The points should be dotted around the axes in a random way, with no general direction. However, if there is some factor influencing codon selection on some genes, then the ‘cloud’ will no longer be spherical, but will have some kind of skewness. It will no longer look like a sphere, perhaps more like a rugby ball (American football, straight banana, sausage…get the picture?). This is a rough analogy.
We have pretended that genes are made up of three codons. Well, in fact the analogy is the same when dealing with real data. We are dealing with 59 codons for the universal code, 61 for the Mycoplasma/Spiroplasma code and so on. The axes are all considered to be orthogonal, but my imagination runs out after three axes.
The problem resides in finding an axis on which we can see the most variation (spread of the points). This axis accounts for the greatest amount of variation in codon usage in the dataset. The magnitude of the spread of points on this axis (the size of the eigenvalue) can also give an indication of whether or not it really is an important trend or whether it just happens to be the axis with the biggest spread of points.
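The idea of finding the axis with the greatest spread can be sketched for the three-codon toy case. The Python below uses power iteration on the covariance matrix, which is one standard way to obtain the first principal axis and its eigenvalue; it is not a description of how GCUA does it, and the four ‘genes’ and their values are invented.

```python
def first_principal_axis(rows, iterations=200):
    """Return (axis, eigenvalue) for the direction of greatest spread."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix of the columns (one column per codon).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication pulls v toward the top eigenvector.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # no variation at all in the data
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j]
                     for i in range(d) for j in range(d))
    return v, eigenvalue


# Hypothetical values for AUU, AUC and AUA in four genes: as AUU usage
# rises, the other two codons fall, so the cloud is stretched along one line.
genes = [
    [1.0, 1.0, 1.0],
    [1.2, 0.9, 0.9],
    [1.4, 0.8, 0.8],
    [1.6, 0.7, 0.7],
]
axis, eigenvalue = first_principal_axis(genes)
print([round(x, 3) for x in axis], round(eigenvalue, 3))
```

In this toy dataset all of the variation lies along a single direction, so the first eigenvalue equals the total variance. With real data the first axis only accounts for part of it, and the size of that part is what tells you how important the trend is.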
Fortunately, for us biologists (or me, at least), the mathematics of all of this kind of analysis was worked out by really smart mathematicians. The references at the end of the document give a very detailed account of how all of this is carried out. In particular the book by Michael Greenacre is excellent.
The options in GCUA allow you to perform four kinds of multivariate analyses. Historically, multivariate analyses were carried out on RSCU values. However, the correct analysis is probably either a correspondence analysis of raw codon counts or a principal components analysis of RSCU values (the results are usually quite similar if there are strong trends in the dataset). I have implemented (for good or evil) a number of methods. Three of them can be considered principal components analyses and the other is correspondence analysis.
How do I know when to use one or the other? Principal components analyses (PCA) are designed to analyse quantitative data, whereas correspondence analysis (CA) is designed to analyse frequency data (how often was leucine used?), contingency tables (if there is a choice of answering ‘yes’ or ‘no’ to a question, then if 60% of the answers are ‘yes’, 40% MUST be ‘no’, so there isn’t independence among entries), probabilities, categorical and mixed qualitative data. Have a look at:
http://www.astro.unibas.ch:80/~midas/doc/95nov/vol2/node211.html for some more information.
It is worth having a look at a number of methods to see if anything might appear using one method and not another. Anyway, it is often nice if an effect is discovered by more than one method. See Greenacre’s book (Greenacre 1984) for a good treatment of the subject.
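To give a feel for why correspondence analysis suits count data, here is a Python sketch of its first step only: turning a table of raw counts into standardized residuals and summing their squares to get the total inertia (the chi-square statistic divided by the grand total). A full CA would go on to decompose the residual matrix; that step is omitted. The function names and the tiny tables are invented, and the sketch assumes no row or column of the table is entirely zero.

```python
def standardized_residuals(table):
    """Matrix that correspondence analysis decomposes, from raw counts."""
    n = sum(sum(row) for row in table)
    row_mass = [sum(row) / n for row in table]
    col_mass = [sum(row[j] for row in table) / n for j in range(len(table[0]))]
    return [[(table[i][j] / n - row_mass[i] * col_mass[j])
             / (row_mass[i] * col_mass[j]) ** 0.5
             for j in range(len(col_mass))]
            for i in range(len(row_mass))]


def total_inertia(table):
    """Sum of squared standardized residuals (chi-square / grand total)."""
    return sum(s * s for row in standardized_residuals(table) for s in row)


# Rows are genes, columns are codons (hypothetical counts).
same_profile = [[10, 20], [20, 40]]   # both genes use the codons 1:2
opposite_profile = [[10, 0], [0, 10]]  # each gene uses a different codon
print(round(total_inertia(same_profile), 6))
print(round(total_inertia(opposite_profile), 6))
```

Genes that differ only in length, and not in their pattern of codon use, contribute no inertia, which is why raw counts can be fed to CA without first converting them to RSCU values.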
Codon usage variation
Codon usage variation is a very well known phenomenon and has been studied in a wide diversity of organisms. Virtually every codon has been shown to be preferentially used in some organisms and seldom used in others. The causes of codon usage variation are manifold.
Of course, mutational bias (the tendency displayed by some organisms to have unbalanced base composition) is frequently a contributing factor. Some organisms have extremes of base composition and this can influence the selection of codons. In fact there is a mathematical expression directly relating the extent of mutational bias with the effective number of codons used by a gene (Wright 1990).
It has been shown that the pattern of codon usage in the highly expressed genes of Escherichia coli and Saccharomyces cerevisiae correlates very strongly with the known abundances of the iso-accepting transfer-RNAs (tRNAs) (Ikemura 1981; Bennetzen and Hall 1982; Ikemura 1982; Sharp and Cowe 1991). The advantage of this system (translational selection) is self-evident: using a codon for which there is an abundant cognate tRNA can speed up the process of mRNA translation.
There is often a mutation-selection balance in operation that shapes the overall frequency with which each codon is used (for review, see Sharp et al. 1993). The selective difference between using a codon that will expedite translation and one that will not is thought to be very small. Selection for a ‘preferred’ codon will therefore only be effective in a situation where random genetic drift is small. This means that translational selection will only be observed in species where the effective population size is large enough for the effects of random genetic drift to be subdued (Muto and Osawa, 1987). Even for organisms with large effective population sizes, not all genes acquire a rich supply of ‘preferred’ codons. A gene whose product is required in large amounts, or quickly at particular times, can exert a greater amount of selective pressure than a gene whose product is not required in large amounts. This situation is seen in many prokaryotes and in yeast, where highly expressed genes use a small subset of the 59 available synonymously degenerate codons whereas lowly expressed genes use codons in a less restricted fashion (Sharp et al. 1986; Sharp and Devine 1989; Lloyd and Sharp 1991; Lloyd and Sharp 1993).
The story is much different with some large eukaryotes. These organisms have small effective population sizes. We know from population genetics that in species with small effective population sizes only traits that confer really strong selective advantages will become fixed in the population. Variants with small selective advantages find it more difficult to survive in populations that are subject to little extinctions and lots of genetic drift. And this is the fate that preferred codons have to suffer in big hairy animals (and their relatives). So, the codon usage of a gene in a species with a small effective population size is merely a reflection of the mutational pressure that is experienced in the region of the genome in which it is found. In some vertebrates, there is an isochore organisation (Bernardi 1993) of the genome, so one species can have two different codon usages. One set of codon usage statistics is only relevant in the regions that are high in G+C and the other set is only relevant in regions with low G+C base composition.
In many microbial genomes there is evidence for different base compositions on different strands (strand-specific skewness). At its most extreme, each strand has its own codon usage pattern and often the difference in codon usage between strands is significant for many of the codons (McInerney, 1998; Lafay et al., 1999). Other organisms, such as Mycoplasma genitalium, have a wave of base composition change and consequently codon usage change along the chromosome (McInerney, 1997; Kerr et al., 1997).
Another relatively recently discovered phenomenon is the presence of strand-specific mutational biases. This can often result in significantly different codon usage patterns on different strands (McInerney, 1998).
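One common way to look for strand-specific compositional skew is to compute (G - C) / (G + C) in windows along a sequence. The Python sketch below illustrates that general measure only; it is not stated here that GCUA computes it, and the function names, window sizes and toy sequence are assumptions.

```python
def gc_skew(sequence):
    """(G - C) / (G + C) for one stretch of sequence; 0.0 if it has no G or C."""
    seq = sequence.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def windowed_gc_skew(sequence, window, step):
    """GC skew in successive windows along the sequence."""
    return [gc_skew(sequence[i:i + window])
            for i in range(0, len(sequence) - window + 1, step)]


# Toy sequence: G-rich first half, C-rich second half.
skews = windowed_gc_skew("GGGGCCCC", window=4, step=4)
print(skews)
```

A change of sign in such a profile along a real chromosome is the kind of signal that points to different mutational pressures on the two strands.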
It might be a profitable exercise to examine the similarities of a set of genes in terms of their similarity of codon or amino acid usage.
For the purposes of phylogeny reconstruction (constructing hypotheses of relationships between molecular sequences) it is often necessary to pose the question of whether the inferred relationships are an accurate reflection of the natural history of the sequences or whether, in fact, some kind of systematic bias or systematic error is contributing to tree topology.
If two sequences have a similar base or amino acid composition, then they might appear as sister taxa on a phylogenetic tree irrespective of whether or not they share a common ancestor that is exclusive of the other taxa. This situation is most acutely problematic when the two sequences in question have a biased composition and this composition is different to the composition in the other sequences.
In microbial ribosomal RNA sequences, there is a reasonable correlation between base composition and the optimal growth temperature of the organism. Organisms that have a higher growth temperature have more G and C residues in their sequences. This can result in clustering of thermophiles in phylogenetic trees derived from this gene: thermophilic convergence.
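A quick way to check whether sequences that cluster together also share an unusual base composition is to compare their G+C content. The Python below is a minimal sketch of that check; the function name and the two toy sequences are made up, and ambiguous characters are simply left out of the total.

```python
def gc_content(sequence):
    """Proportion of G and C among the unambiguous bases of a sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    return gc / (gc + at) if gc + at else 0.0


# Hypothetical fragments: one G+C rich, one A+T rich.
rich = gc_content("GGCCGGCA")
poor = gc_content("AATTAAGC")
print(rich, poor)
```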
It can be extraordinarily difficult to say with absolute certainty whether sequences are clustering on the basis of residue composition or recentness of common ancestry. However, for protein-coding DNA sequences, GCUA will at least allow you to construct trees on the basis of the residue composition of the sequences (either codon or amino acid).
Figure 1 shows the result of drawing a tree based on codon usage similarities. In this case the gene is the Elongation factor 1-alpha. There are three Archaeal sequences and the rest are Eukaryal sequences. On the basis of similarities in codon usage the three Archaeal sequences (in boldface) do not form a monophyletic group (and obviously the Eukaryal sequences also do not form a monophyletic group).
A phylogenetic tree inferred from the actual sequences will result in the Archaeal sequences forming a monophyletic group and we can be confident that this hypothesis of relationships is not being constructed because of problematic composition effects.
Figure 1. A dendrogram constructed using the neighbor-joining algorithm, with distances based on codon usage similarities being used as input. The tree was constructed using the PAUP*4.0 software program.
The second figure is an analysis of all the genes in the bacterium Mycoplasma genitalium. This is the smallest-known autonomously-replicating genome. A distance matrix was produced from all the sequences in the genome based on the similarity of amino acid usage in each protein.
The ribosomal proteins were identified and are highlighted in this diagram by the red branches. Internal branches that have exclusively ribosomal protein daughter nodes are also highlighted in red.
In this case we can see that the ribosomal proteins fall into a limited number of clusters of “composition clades”. I think this might be of some use.
The manuscript you should cite is:
McInerney, J. O. (1998). “GCUA (General Codon Usage Analysis).” Bioinformatics 14(4): 372-373.
Andersson, S. G. E. and Kurland, C. G. (1990). “Codon preferences in free-living microorganisms.” Microbiological Reviews 54(2): 198-210.
Andersson, S. G. E. and Sharp, P. M. (1996). “Codon usage and base composition in Rickettsia prowazekii.” J. Mol. Evol. 42: 525-536.
Bennetzen, J. L. and Hall, B. D. (1982). “Codon selection in yeast.” J. Biol. Chem. 257: 3026-3031.
Bernardi, G. (1993). “The isochore organisation of the human genome and its evolutionary history-a review.” Gene 135: 57-66.
Gouy, M. and Gautier, C. (1982). “Codon usage in bacteria: correlation with gene expressivity.” Nuc. Acids Res. 10(22): 7055-7075.
Greenacre, M. J. (1984). Theory and applications of correspondence analysis. London, Academic Press.
Ikemura, T. (1981). “Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein sequence: a proposal for a synonymous codon choice that is optimal for the E. coli translational system.” J. Mol. Biol. 151: 389-409.
Ikemura, T. (1982). “Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes.” J. Mol. Biol. 158: 573-597.
Ikemura, T. (1985). “Codon usage and tRNA content in unicellular and multicellular organisms.” Mol. Biol. Evol. 2: 13-34.
Ikemura, T. (1985). Codon usage, tRNA content and rate of synonymous substitution. Population genetics and molecular evolution. Eds. T. Ohta and K. Aoki. Berlin, Springer-Verlag. 385-406.
Ikemura, T. and Aota, S.-I. (1988). “Global variation in G+C content along vertebrate genomic DNA.” J. Mol. Biol. 203: 1-13.
Kerr, A. R., Peden, J. F. and Sharp, P. M. (1997). “Systematic base composition variation around the genome of Mycoplasma genitalium, but not Mycoplasma pneumoniae.” Molecular Microbiology 25(6): 1177-1181.
Lafay, B., Lloyd, A. T., McLean, M. J., Devine, K. M., Sharp, P. M. and Wolfe, K. H. (1999). “Proteome composition and codon usage in spirochaetes: species-specific and DNA strand-specific mutational biases.” Nucleic Acids Research 27(7): 1642-1649.
Lloyd, A. T. and Sharp, P. M. (1991). “Codon usage in Aspergillus nidulans.” Mol. Gen. Genet. 230: 288-294.
Lloyd, A. T. and Sharp, P. M. (1993). “Synonymous codon usage in Kluyveromyces lactis.” Yeast 9: 1219-1228.
Lobry, J. R. (1995). “Properties of a general model of DNA evolution under no-strand-bias conditions.” J. Mol. Evol. 40: 326-330.
Lobry, J. R. (1996a). “Asymmetric substitution patterns in the two DNA strands of bacteria.” Mol. Biol. Evol. 13(5): 660-665.
Lobry, J. R. (1996b). “Origin of replication of Mycoplasma genitalium.” Science 272: 745-746.
Lobry, J. R. (1996c). “A simple vectorial representation of DNA sequences for the detection of replication origins in bacteria.” Biochimie 78: 323-326.
Lobry, J. R. (1997). “Influence of genomic G+C content on average amino-acid composition of proteins from 59 bacterial species.” Gene 205: 309-316.
Lobry, J. R. and Gautier, C. (1994). “Hydrophobicity, expressivity and aromaticity are the major trends of amino-acid usage in 999 Escherichia coli chromosome-encoded genes.” Nuc. Acids Res. 22(15): 3174-3180.
McInerney, J. O. (1997). “Prokaryotic genome evolution as assessed by multivariate analysis of codon usage patterns.” Microb. Compar. Genomics 2(1): 1-10.
McInerney, J. O. (1998). “Replicational and transcriptional selection on codon usage in Borrelia burgdorferi.” Proc. Natl. Acad. Sci. USA 95: 10698-10703.
Sharp, P. M. (1985). “Does the noncoding strand code?” Nucleic Acids Res. 13: 1389-1397.
Sharp, P. M. (1989). Evolution at ‘silent’ sites in DNA. Evolution and animal breeding. Eds. W. G. Hill and T. F. C. Mackay. Wallingford, UK, CAB International. 23-32.
Sharp, P. M. (1990). “Processes of genome evolution reflected by base frequency differences among Serratia marcescens genes.” Molecular Microbiology 4(1): 119-122.
Sharp, P. M., Burgess, C. J., Lloyd, A. T. and Mitchell, K. J. (????). Selective use of termination codons and variations in codon choice. Transfer RNA in protein synthesis. Eds. D. L. Hatfield, B. J. Lee and R. M. Pirtle. CRC Press. 397-425.
Sharp, P. M. and Cowe, E. (1991). “Synonymous codon usage in Saccharomyces cerevisiae.” Yeast 7: 657-678.
Sharp, P. M., Cowe, E., Higgins, D. G., Shields, D. C., Wolfe, K. H. and Wright, F. (1988). “Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens; a review of the considerable within-species diversity.” Nuc. Acids Res. 16(17): 8207-8211.
Sharp, P. M. and Devine, K. M. (1989). “Codon usage and gene expression level in Dictyostelium discoideum: highly expressed genes do ‘prefer’ optimal codons.” Nuc. Acids Res. 17(13): 5029-5039.
Sharp, P. M. and Li, W.-H. (1987). “The codon adaptation index – a measure of directional synonymous codon usage bias, and its potential applications.” Nuc. Acids Res. 15(3): ????-????
Sharp, P. M. and Li, W.-H. (1987). “The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias.” Mol. Biol. Evol. 4: 222-230.
Sharp, P. M. and Li, W.-H. (1986). “An evolutionary perspective on synonymous codon usage in unicellular organisms.” J. Mol. Evol. 24: 28-38.
Sharp, P. M. and Lloyd, A. T. (1993). Codon usage. An atlas of Drosophila genes. Ed. G. Maroni. New York, Oxford, Oxford University Press. 378-397.
Sharp, P. M. and Lloyd, A. T. (1993). “Regional base composition variation along yeast chromosome III: evolution of chromosome primary structure.” Nucleic Acids Res. 21: 179-183.
Sharp, P. M., Shields, D. C., Wolfe, K. H. and Li, W.-H. (1989). “Chromosomal Location and Evolutionary Rate Variation in Enterobacterial Genes.” Science 246: 808-810.
Sharp, P. M., Stenico, M., Peden, J. F. and Lloyd, A. T. (1993). “Codon usage: mutational bias, translational selection or both?” Biochem. Soc. Trans. 21: 835-841.
Sharp, P. M., Tuohy, T. M. F. and Mosurski, K. R. (1986). “Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes.” Nuc. Acids Res. 14(13): 5125-5143.
Shields, D. C. (1990). “Switches in species-specific codon preferences: The influence of mutation biases.” J. Mol. Evol. 31: 71-80.
Shields, D. C. and Sharp, P. M. (1987). “Synonymous codon usage in Bacillus subtilis reflects both translational selection and mutational biases.” Nuc. Acids Res. 15(19): 8023-8040.
Wright, F. (1990). “The ‘effective number of codons’ used in a gene.” Gene 87: 23-29.
Some publications that have recently cited GCUA:
The complete genome sequence of a novel T4-like bacteriophage, IME08 Author(s): Jiang Huanhuan; Jiang Xiaofang; Wang Sheng; et al. Source: ARCHIVES OF VIROLOGY Volume: 156 Issue: 8 Pages: 1489-1492 DOI: 10.1007/s00705-011-1033-9 Published: AUG 2011
Comparative analysis and supragenome modeling of twelve Moraxella catarrhalis clinical isolates Author(s): Davie Jeremiah J.; Earl Josh; de Vries Stefan P. W.; et al. Source: BMC GENOMICS Volume: 12 Article Number: 70 DOI: 10.1186/1471-2164-12-70 Published: JAN 26 2011
Codon Usage Biases in Alzheimer’s Disease and Other Neurodegenerative Diseases Author(s): Yang Jie; Zhu Tong-Yang; Jiang Zheng-Xin; et al. Source: PROTEIN AND PEPTIDE LETTERS Volume: 17 Issue: 5 Pages: 630-645 Published: MAY 2010
Survey of transcripts expressed by the invasive juvenile stage of the liver fluke Fasciola hepatica Author(s): Cancela Martin; Ruetalo Natalia; Dell’Oca Nicolas; et al. Source: BMC GENOMICS Volume: 11 Article Number: 227 DOI: 10.1186/1471-2164-11-227 Published: APR 7 2010
Modal Codon Usage: Assessing the Typical Codon Usage of a Genome Author(s): Davis James J.; Olsen Gary J. Source: MOLECULAR BIOLOGY AND EVOLUTION Volume: 27 Issue: 4 Pages: 800-810 DOI: 10.1093/molbev/msp281 Published: APR 2010
Extensive evolutionary rate variation in floral color determining genes in the genus Ipomoea Author(s): Toleno Donna M.; Durbin Mary L.; Lundy Karen E.; et al. Source: PLANT SPECIES BIOLOGY Volume: 25 Issue: 1 Pages: 30-42 DOI: 10.1111/j.1442-1984.2009.00256.x Published: APR 2010
Codon usage bias in herpesvirus Author(s): Fu Minghui Source: ARCHIVES OF VIROLOGY Volume: 155 Issue: 3 Pages: 391-396 DOI: 10.1007/s00705-010-0597-0 Published: MAR 2010
A detailed comparative analysis on the overall codon usage pattern in herpesviruses Author(s): RoyChoudhury Sourav; Mukherjee Debaprasad Source: VIRUS RESEARCH Volume: 148 Issue: 1-2 Pages: 31-43 DOI: 10.1016/j.virusres.2009.11.018 Published: MAR 2010
A Second Actin-Like MamK Protein in Magnetospirillum magneticum AMB-1 Encoded Outside the Genomic Magnetosome Island Author(s): Rioux Jean-Baptiste; Philippe Nadege; Pereira Sandrine; et al. Source: PLOS ONE Volume: 5 Issue: 2 Article Number: e9151 DOI: 10.1371/journal.pone.0009151 Published: FEB 10 2010
AntigenDB: an immunoinformatics database of pathogen antigens Author(s): Ansari Hifzur Rahman; Flower Darren R.; Raghava G. P. S. Source: NUCLEIC ACIDS RESEARCH Volume: 38 Supplement: 1 Pages: D847-D853 DOI: 10.1093/nar/gkp830 Published: JAN 2010