Why monotremes are unique mammals

Genome analysis of the platypus reveals unique signatures of evolution

  • A corrigendum to this article was published on September 11, 2008

Abstract

We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.

Main

The platypus (Ornithorhynchus anatinus) has always generated excitement and controversy in the zoological world 1. Some initially considered it a true mammal despite its duck bill and webbed feet. The platypus was grouped with the echidnas in a new taxon, Monotremata (meaning "one hole", after the common external opening of the urogenital and digestive systems). Traditionally, the Monotremata are regarded as belonging to the mammalian subclass Prototheria, which diverged from the therapsid line leading to the Theria; the Theria subsequently split into marsupials (Marsupialia) and eutherians (Placentalia). The divergence of monotremes and therians falls within the large gap in amniote phylogeny between the eutherian radiation about 90 million years (Myr) ago and the divergence of mammals from the sauropsid lineage about 315 Myr ago (Fig. 1). Estimates of the monotreme–therian divergence time range from 160 to 210 Myr ago; here we use 166 Myr ago, a recent estimate based on fossil and molecular data 2.

Amniotes split into sauropsids (leading to birds and reptiles) and synapsids (leading to mammal-like reptiles). These small early mammals developed hair, homeothermy and lactation (red lines). Monotremes diverged from the therian mammal lineage ∼166 Myr ago 2 and developed a unique suite of characteristics (dark red text). Therian mammals, with their shared characters, split into marsupials and eutherians about 148 Myr ago 2 (dark red text). Geological eras and periods with relative times (Myr ago) are indicated on the left. Mammalian lineages are shown in red; diapsid reptiles, represented by archosaurs (birds, crocodiles and dinosaurs), in blue; and lepidosaurs (snakes, lizards and relatives) in green.


The most extraordinary and controversial aspect of platypus biology was initially whether or not platypuses lay eggs like birds and reptiles. In 1884, William Caldwell's terse telegram to the British Association announced "Monotremes oviparous, ovum meroblastic", that is, not holoblastic as in the other two groups of mammals 3, 4. The egg is laid in a burrow after about 21 days and hatches 11 days later 5, 6. Because most organ systems are still differentiating at hatching, the young depend for about 4 months on milk, which they suck directly from the abdominal skin, as females have no nipples. Platypus milk changes in protein composition during lactation (as in marsupials but not most eutherians 5). The anatomy of the monotreme reproductive system reflects its reptilian origins but shows features typical of mammals 7 as well as unique specializations. Spermatozoa are filiform, like those of birds and reptiles, but, uniquely among amniotes, form bundles of 100 during passage through the epididymis. In sperm, chromosomes are arranged in a defined order 8, as in therians but not in birds 9. The testes synthesize testosterone and dihydrotestosterone, as in therians, but there is no scrotum and the testes are abdominal 10.

Other peculiarities of the platypus include its gastrointestinal system, its neuroanatomy (electroreception) and a venom delivery system that is unique among mammals 11. The platypus is an obligate aquatic feeder that relies on its thick pelage to maintain its low body temperature (31–32 °C) while foraging in often icy waters. Because its eyes, ears and nostrils are closed while it forages underwater, it uses an electrosensory system to locate aquatic invertebrates and other prey 12, 13. Interestingly, adult monotremes lack teeth.

The platypus genome, like the animal itself, is an amalgam of ancestral reptilian and derived mammalian features. The platypus karyotype comprises 52 chromosomes in both sexes 14, 15, with a few large and many small chromosomes reminiscent of reptilian macro- and microchromosomes. Platypuses have multiple sex chromosomes with some homology to the avian Z chromosome 16. Males have five X and five Y chromosomes, which form a chain at meiosis and segregate into 5X and 5Y sperm 17, 18. Sex determination and sex chromosome dosage compensation remain unclear.

Platypuses live in the waterways of eastern and southern Australia, including Tasmania. Their secretive lifestyle hampers understanding of their population dynamics and of their social and family structure. Platypuses are still relatively common in the wild, but have recently been classified as "vulnerable" because of their reliance on an aquatic environment that is adversely affected by climate change and by damage caused by human activities. Water quality, erosion, destruction of habitat and food resources, and disease all threaten the population. Because the platypus has rarely been bred in captivity and is the last of a long line of ornithorhynchid monotremes, its continued survival is of great importance. Here we describe the platypus genome sequence and compare it with the genomes of other mammals and of the chicken.

Sequencing and assembly

All sequencing libraries were prepared from the DNA of a single female platypus (Glennie; Glenrock Station, New South Wales, Australia) and sequenced using established whole-genome shotgun (WGS) methods 19. A draft assembly was generated from 6-fold coverage of whole-genome plasmid, fosmid and bacterial artificial chromosome (BAC) reads (Supplementary Table 1) using the assembly program PCAP 20 (Supplementary Note 1). In parallel with the sequence assembly, a BAC-based physical map was developed and then integrated into the WGS assembly, primarily to organize the assembly into ordered and oriented groupings (ultracontigs; Supplementary Notes 2 and 3 and Supplementary Table 2). Because no platypus linkage maps were available, we used fluorescence in situ hybridization (FISH) to localize a subset of the sequence scaffolds on chromosomes, following the agreed nomenclature 21. Of the 1.84 gigabases (Gb) of assembled sequence, 437 megabases (Mb) were ordered and oriented along 20 of the platypus chromosomes. We assessed numerous metrics of assembly quality (Supplementary Notes 4–11) and concluded that, despite its limited contiguity, the current platypus assembly is a suitable substrate for the analyses presented here because of its structural and nucleotide accuracy.
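As a back-of-the-envelope check on these assembly figures, the Python sketch below (not part of the PCAP workflow) computes the share of the assembled sequence that is anchored on chromosomes. It also adds something the text does not claim: the idealized Lander–Waterman expectation, 1 − e^(−c), for the fraction of bases touched by at least one read at c-fold random shotgun coverage. The constant and function names are illustrative.

```python
import math

# Figures quoted in the text.
ASSEMBLED_GB = 1.84   # total assembled sequence, gigabases
ANCHORED_MB = 437.0   # sequence ordered and oriented on 20 chromosomes, megabases
COVERAGE = 6.0        # approximate fold coverage of the shotgun reads


def anchored_fraction(assembled_gb: float, anchored_mb: float) -> float:
    """Fraction of the assembled sequence that is placed on chromosomes."""
    return anchored_mb / (assembled_gb * 1000.0)


def lander_waterman_covered(coverage: float) -> float:
    """Idealized (Poisson) expectation of the fraction of bases covered by
    at least one read at the given fold coverage: 1 - exp(-c)."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    frac = anchored_fraction(ASSEMBLED_GB, ANCHORED_MB)
    print(f"chromosome-anchored: {frac:.1%}")
    print(f"idealized coverage at {COVERAGE:g}x: {lander_waterman_covered(COVERAGE):.2%}")
```

With the quoted numbers, roughly 24% of the assembly is chromosome-anchored, while the idealized model predicts that 6-fold coverage touches about 99.75% of bases. Real assemblies fall short of this ideal because of repeats and sequencing biases, so the model is best read as an upper bound rather than a description of this assembly.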

Non-protein-coding genes

Overall, the platypus genome contains fewer computationally predicted non-protein-coding (nc) RNAs (1,220 cases, excluding the highly repetitive small nucleolar RNA (snoRNA) copies; see below) than other mammalian species (for example, 4,421 Rfam hits in human), and is similar in this respect to chicken 19 (655 Rfam-based ncRNAs). This is probably due to the extensive retrotransposition of ncRNAs in therian mammals and the apparent lack of L1-mediated retrotransposition in chicken and platypus. The exception is the platypus snoRNA family, which is significantly expanded (∼2,000 hits to the Rfam covariance models, compared with ∼200 in other mammals). snoRNAs are involved in RNA modification, particularly of ribosomal RNA, and are often located in introns of protein-coding genes 22. Our analyses revealed a novel snoRNA-derived retrotransposon (short interspersed element, SINE), which we have named snoRTE, that has been duplicated in platypus to 40,000 complete or truncated copies. It is retrotransposed by non-LTR (long terminal repeat) transposable elements of the RTE type, in contrast to the L1-mediated transposition mechanism of therians 23. We constructed a complementary DNA library of small ncRNAs and identified 371 consensus small RNA sequences, including 166 snoRNAs 23 (Supplementary Table 3). Ninety-nine of these cloned snoRNAs fall into paralogous families, and 21 of them belong to the snoRTE class. Both the presence of the structural features known to be essential for snoRNA function 24 and the evidence of their expression indicate that these snoRTE elements are functional in platypus. As with other unrelated ncRNAs that have proliferated in therian mammals (for example, 7SL-RNA-derived primate Alu elements and tRNA-derived rodent identifier (ID) elements), this recent SINE-like expansion is probably the result of chance events. However, given the RNA-modifying activity of snoRNAs and our growing appreciation of the cellular importance of RNA molecules, some of these retrotranspositionally duplicated RNAs may have been recruited to new functions in this species.

Other small RNAs

Overall, we found similarities with the small RNA (sRNA) pathways of other mammals, but also features unique to monotremes. Components of the RNA interference machinery are conserved in platypus, including elements of the biogenesis pathways (Dicer and Drosha) and the RNA interference effector complexes (Argonaute proteins; Supplementary Table 4). Of the 20,924,799 platypus and echidna sRNA reads from liver, kidney, brain, lung, heart and testis, 67% could be assigned to known microRNA (miRNA) families. Established patterns of miRNA expression were generally recapitulated in monotremes.

To determine the patterns of miRNA conservation in platypus, we identified platypus miRNAs that share at least 16 nucleotides with miRNAs in eutherians (mouse/human) and chicken. Although most conserved miRNAs were found across all of these vertebrate lineages (137 miRNAs), 10 miRNAs were shared only with eutherians (mouse/human) and 4 only with chicken (Fig. 2a). miRNAs can be grouped into families on the basis of the identity of the functional "seed" region at positions 2–8 of the mature miRNA strand. We identified miRNA families shared between platypus and eutherians but not chicken (40 families), and between platypus and chicken but not eutherians (8 families), suggesting that for some miRNAs only the seed region may have been selectively conserved (Fig. 2a). Conserved miRNAs tended to be expressed at higher levels than lineage-restricted miRNAs in the platypus tissues examined (Fig. 2b).
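The seed-based family assignment described above can be sketched in Python. This is an illustration under stated assumptions, not the pipeline used in the study: it extracts nucleotides 2–8 of each mature miRNA, groups miRNAs with identical seeds, and partitions seed families by the lineages that share them. The miRNA names and lineage seed sets are hypothetical, the sequences are illustrative rather than platypus data, and the separate 16-nucleotide matching criterion is not implemented.

```python
from collections import defaultdict


def seed(mature: str) -> str:
    """Seed region: nucleotides 2-8 (1-based, inclusive) of a mature miRNA."""
    rna = mature.upper().replace("T", "U")
    if len(rna) < 8:
        raise ValueError("mature miRNA must be at least 8 nt long")
    return rna[1:8]


def group_by_seed(mirnas: dict[str, str]) -> dict[str, list[str]]:
    """Group miRNA names into families whose seeds are identical."""
    families = defaultdict(list)
    for name, mature in mirnas.items():
        families[seed(mature)].append(name)
    return dict(families)


def conservation_classes(platypus: set[str], eutherian: set[str],
                         chicken: set[str]) -> dict[str, set[str]]:
    """Partition platypus seed families by the other lineages that share them."""
    return {
        "all_three": platypus & eutherian & chicken,
        "eutherian_only": (platypus & eutherian) - chicken,
        "chicken_only": (platypus & chicken) - eutherian,
        "platypus_specific": platypus - eutherian - chicken,
    }


# Hypothetical names and illustrative sequences (not platypus data).
platypus_mirnas = {
    "mirA": "UGAGGUAGUAGGUUGUAUAGUU",
    "mirB": "UGAGGUAGUAAGUUGUAUUGUU",  # same seed as mirA, different 3' region
    "mirC": "UAGCAGCACGUAAAUAUUGGCG",
}
families = group_by_seed(platypus_mirnas)
classes = conservation_classes(
    set(families),
    eutherian={"GAGGUAG", "AGCAGCA"},  # hypothetical seed sets
    chicken={"GAGGUAG"},
)
print(families)
print(classes)
```

Grouping on the seed alone is what lets two miRNAs with different 3′ ends (mirA and mirB here) fall into one family, which is the basis for the observation that only the seed may be conserved in some cases.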

a, Platypus has miRNAs that are shared with eutherians and chicken, and a set that is platypus-specific. miRNAs cloned from six platypus tissues were assigned to families on the basis of seed conservation. Platypus miRNAs and families were divided into classes (indicated) according to their conservation patterns in eutherian mammals (mouse/human) and chicken. b, Expression of platypus miRNAs. The cloning frequency of each mature platypus miRNA sequenced more than once is represented by a vertical bar, grouped by conservation pattern. miRNAs from a number of monotreme-specific miRNA clusters expressed in testis are shaded in red.


To identify monotreme-specific miRNAs, we used a heuristic search for miRNA candidates in deep-sequencing data sets 25. This method predicted 183 new miRNAs in platypus and echidna (Fig. 2a). Remarkably, 92 of these lie in 9 large clusters on platypus chromosome X1 and on contigs 1754, 7160, 7359, 8388, 11344, 22847, 198872 and 191065. Physical mapping confirmed that at least five of these contigs are linked to the long arm of chromosome X1 (ref. 25). These abundantly expressed clusters were sequenced almost exclusively from platypus and echidna testis (Fig. 2b). The expansion of this unique miRNA class and its restricted expression domain suggest possible roles in monotreme reproductive biology 25.
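Genomic miRNA clusters are commonly defined by proximity. The sketch below groups (contig, position) loci so that neighboring miRNAs on the same contig fall into one cluster when each lies within a maximum gap of the previous one. The 10-kb default, the minimum cluster size and the example coordinates are assumptions for illustration, not the criteria used in ref. 25.

```python
Locus = tuple[str, int]  # (contig name, position in bp)


def cluster_loci(loci: list[Locus], max_gap: int = 10_000) -> list[list[Locus]]:
    """Single-linkage clustering along each contig: a locus joins the current
    cluster if it is on the same contig and within max_gap bp of the
    previous locus; otherwise it starts a new cluster."""
    clusters: list[list[Locus]] = []
    for contig, pos in sorted(loci):
        if clusters:
            last_contig, last_pos = clusters[-1][-1]
            if contig == last_contig and pos - last_pos <= max_gap:
                clusters[-1].append((contig, pos))
                continue
        clusters.append([(contig, pos)])
    return clusters


def large_clusters(clusters: list[list[Locus]], min_size: int = 2) -> list[list[Locus]]:
    """Keep only clusters with at least min_size members."""
    return [c for c in clusters if len(c) >= min_size]


# Hypothetical coordinates for illustration only.
loci = [("chrX1", 1_000), ("chrX1", 3_000), ("chrX1", 9_000),
        ("chrX1", 50_000), ("contigA", 200), ("contigA", 800)]
clusters = cluster_loci(loci)
print([len(c) for c in clusters])     # [3, 1, 2]
print(len(large_clusters(clusters)))  # 2
```

Because the results depend strongly on the gap threshold, any real analysis would report it alongside the cluster counts.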

Piwi-interacting RNAs (piRNAs) associate with a germline-expressed group of Argonaute proteins known as Piwis 26 and have roles in transposon silencing and genome methylation 26. Monotreme piRNAs are structurally very similar to those of eutherians. They are about 29 nucleotides long and derive from large, testis-specific genomic clusters with pronounced genomic strand asymmetry, often with a characteristic "bidirectional" organization. We identified 50 large platypus piRNA clusters as well as numerous smaller ones 25. In contrast to mouse piRNAs, platypus piRNAs are repeat-derived and carry strong signatures of active transposon defense.
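Strand asymmetry and bidirectional organization can be quantified from mapped reads. The heuristic below is a hypothetical sketch, not the method of ref. 25: it reports the fraction of plus-strand reads across a cluster and in each half, and calls the cluster bidirectional when one half is dominated by minus-strand reads and the other by plus-strand reads. The 0.8 threshold and the example reads are arbitrary assumptions.

```python
import math

Read = tuple[int, str]  # (mapped position, strand "+" or "-")


def plus_fraction(reads: list[Read]) -> float:
    """Fraction of reads on the plus strand (NaN if there are no reads)."""
    if not reads:
        return math.nan
    return sum(1 for _, strand in reads if strand == "+") / len(reads)


def strand_profile(reads: list[Read], start: int, end: int) -> dict[str, float]:
    """Plus-strand fraction over the whole cluster [start, end) and in each half."""
    mid = (start + end) // 2
    inside = [r for r in reads if start <= r[0] < end]
    return {
        "overall": plus_fraction(inside),
        "left": plus_fraction([r for r in inside if r[0] < mid]),
        "right": plus_fraction([r for r in inside if r[0] >= mid]),
    }


def is_bidirectional(profile: dict[str, float], threshold: float = 0.8) -> bool:
    """Heuristic: one half mostly minus-strand, the other mostly plus-strand.
    Comparisons with NaN are False, so empty halves never qualify."""
    left, right = profile["left"], profile["right"]
    return ((left <= 1 - threshold and right >= threshold)
            or (left >= threshold and right <= 1 - threshold))


# Hypothetical reads: minus strand on the left half, plus strand on the right.
example = [(100, "-"), (200, "-"), (300, "-"), (400, "-"),
           (600, "+"), (700, "+"), (800, "+"), (900, "+")]
profile = strand_profile(example, 0, 1000)
print(profile, is_bidirectional(profile))
```

In the example the overall plus-strand fraction is 0.5, which on its own would look unbiased; splitting the cluster into halves exposes the bidirectional layout.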

Gene evolution

We set out to define the protein-coding gene content of platypus in order to shed light both on the specific biology of the monotreme clade and on comparisons with eutherians, marsupials and the representative sauropsid, chicken. Protein-coding genes were predicted using the Ensembl pipeline 27, suitably modified for platypus (Supplementary Note 14), with an emphasis on similarity matches to mammalian genes. This yielded a total of 18,527 protein-coding genes predicted from the current platypus assembly. The number of platypus protein-coding genes is thus comparable to estimates (18,600–20,800) for human and opossum 28, 29.

We were first interested in identifying the platypus genes that contribute most to core biological functions conserved across mammals. These are typically "simple" 1:1 orthologues: genes that have remained single-copy, without duplication or deletion, in platypus, in eutherians (specifically dog, human and mouse) and in opossum, a representative marsupial. We then considered genes that were duplicated or deleted in the monotreme lineage, or that were lost in the eutherian and/or marsupial lineages. Such genes have been proposed to contribute most to the lineage-specific biology that distinguishes individual mammals 30. These analyses required an outgroup species, here chicken, as a representative of the sauropsids.

As expected, most platypus genes (82%; 15,312 of 18,596) have orthologues in the five other amniotes studied (Supplementary Table 5). The remaining "orphan" genes are expected to reflect mainly rapidly evolving genes with no discernible homologues, mispredictions, and true lineage-specific genes that have been lost in each of the other five species. Simple 1:1 orthologues conserved across the five mammalian species without duplication, deletion or nonfunctionalization were highly enriched in housekeeping functions such as metabolism, DNA replication and mRNA splicing (Supplementary Table 6).

We then identified the evolutionary lineages that experienced the strongest purifying selection. The mouse terminal lineage showed significantly stronger purifying selection (as measured by the ratio of the non-synonymous to the synonymous substitution rate, dN/dS = 0.105, P < 0.001) than the terminal branches of dog, opossum and chicken (values of 0.123–0.128); the human and platypus terminal lineages showed significantly weaker purifying selection (both 0.132, P < 0.03). These values probably reflect the greater efficiency of purifying selection in populations with a larger effective size, such as that of mouse 31. We note that, on average, at least one nucleotide substitution per site has occurred at synonymous sites of platypus and human orthologues since their last common ancestor (Supplementary Note 17 and Supplementary Fig. 1). This means that most neutral sequence cannot be accurately aligned between the monotreme and eutherian genomes.
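The dN/dS ratio (often written ω) and the saturation argument can be illustrated numerically. In this sketch the terminal-branch values are those quoted above; the dN and dS pair passed to `omega` is hypothetical, and the Jukes–Cantor model used to illustrate saturation is a textbook simplification, not the model used in the study.

```python
import math


def omega(dn: float, ds: float) -> float:
    """dN/dS: non-synonymous substitutions per non-synonymous site divided by
    synonymous substitutions per synonymous site."""
    if ds <= 0:
        raise ValueError("dS must be positive")
    return dn / ds


def jc_expected_p(d: float) -> float:
    """Jukes-Cantor: expected proportion of sites that look different after d
    substitutions per site. It approaches 0.75 as d grows."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# Terminal-branch dN/dS values quoted in the text (dog, opossum and chicken
# are given only as a 0.123-0.128 range, so they are omitted here).
terminal = {"mouse": 0.105, "human": 0.132, "platypus": 0.132}

# Lower omega means stronger purifying selection.
ranked = sorted(terminal, key=terminal.get)
print(ranked)                        # mouse first
print(omega(0.021, 0.2))             # hypothetical dN, dS pair
print(jc_expected_p(1.0))            # one substitution per site
```

A lower ω means stronger purifying selection, so mouse ranks first. Under the Jukes–Cantor assumption, one substitution per site leaves only about 55% of sites visibly different, and the observed difference can never exceed 75%, which illustrates why sequence this diverged is hard to align reliably.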

Next, we determined the genetic distance between the echidna (Tachyglossus aculeatus) and the platypus. The median dS value of 0.125 for echidna–platypus orthologues, compared with the value for the monotreme lineage, predicts that platypus and echidna last shared a common ancestor 21.2 Myr ago. Although similar to previous estimates 32, this value appears to contradict fossil evidence, possibly because of recent reductions in mutation rates in the monotreme lineage 33.
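Converting a pairwise dS into a divergence time under a strict molecular clock can be sketched as follows. This is an assumed simplification: the synonymous rate is calibrated from a monotreme branch length accumulated since the 166-Myr monotreme–therian split, and the pairwise distance is halved because it accumulates along both the platypus and the echidna lineages. The branch value is a placeholder, not the study's figure, so the output does not reproduce the 21.2-Myr estimate.

```python
MONOTREME_THERIAN_SPLIT_MYR = 166.0   # calibration used in the text
PAIRWISE_DS = 0.125                   # median echidna-platypus dS from the text
PLACEHOLDER_BRANCH_DS = 0.5           # hypothetical, NOT the study's value


def clock_rate(branch_ds: float, branch_age_myr: float) -> float:
    """Synonymous substitutions per site per Myr along a single lineage."""
    return branch_ds / branch_age_myr


def divergence_time(pairwise_ds: float, rate: float) -> float:
    """A pairwise distance accumulates along two lineages, hence the factor 2."""
    return pairwise_ds / (2.0 * rate)


rate = clock_rate(PLACEHOLDER_BRANCH_DS, MONOTREME_THERIAN_SPLIT_MYR)
print(f"{divergence_time(PAIRWISE_DS, rate):.2f} Myr")
```

With the placeholder branch dS of 0.5, the sketch gives just under 21 Myr. A strict clock is exactly the assumption that a lineage-specific slowdown in mutation rate would violate, which is one way the discrepancy with the fossil record could arise.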

Monotreme biology

We next examined whether the reptilian characteristics of the monotreme ancestor are reflected in genes retained in platypus, sauropsids and vertebrates outside the amniotes (such as frogs and fish) but lost in the eutherian and marsupial lineages (Fig. 1). These ancient, sauropsid-like features of the platypus include oviparity (egg-laying) and the morphology of its sperm and retina. At the same time, we searched the platypus genome for genetic clues both to traits specific to monotremes, such as venom production and electroreception, and to traits unique to mammals, particularly lactation. By examining platypus homologues of genes already known to be involved in particular physiological processes (see Methods), we highlight platypus genes whose evolution illustrates ancestral or derived physiological features of monotremes.

Chemoreception

The semi-aquatic platypus would be expected to sense its terrestrial, but not its aquatic, environment by detecting airborne odorants with olfactory receptors and vomeronasal receptors (types 1 and 2: V1Rs and V2Rs). Consistent with this, a large number of olfactory receptor, V1R and V2R homologues (approximately 700, 950 and 80, respectively) are apparent in the platypus genome assembly, although for each family only a minority have intact open reading frames (approximately 333, 270 and 15, respectively) 34. Many of these platypus genes and pseudogenes are monophyletic, having arisen by duplication during the 166 Myr since the last common ancestor of monotremes and therians. Although the mouse and rat genomes contain more olfactory receptors and V2Rs than the platypus genome 35, 36, the platypus repertoire of V1Rs with intact reading frames is the largest seen to date, 50% larger than that of mouse (Fig. 3b). This is especially notable given that the lizard Anolis carolinensis (sequence data courtesy of the Broad Institute) and the chicken 19 appear to have no such receptors. The large expansion of the platypus V1R gene family could reflect sensory adaptations for pheromonal communication or, more generally, for detecting water-soluble, non-volatile odorants during underwater foraging.
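Reading the second count in each pair as the number of genes with intact open reading frames (the interpretation that matches the V1R comparison with mouse), the proportions can be tabulated in a few lines of Python. The counts are the approximate values quoted in the text; the dictionary layout and function name are illustrative.

```python
# Approximate counts quoted in the text: (homologues, intact open reading frames).
RECEPTOR_COUNTS = {"OR": (700, 333), "V1R": (950, 270), "V2R": (80, 15)}


def summarize(counts: dict[str, tuple[int, int]]) -> dict[str, dict[str, float]]:
    """Per family: fraction with intact ORFs and number of disrupted copies."""
    summary = {}
    for family, (total, intact) in counts.items():
        summary[family] = {
            "intact_fraction": intact / total,
            "disrupted": total - intact,
        }
    return summary


for family, row in summarize(RECEPTOR_COUNTS).items():
    print(f"{family}: {row['intact_fraction']:.0%} intact, {row['disrupted']} disrupted")
```

Intact genes are a minority in every family: roughly 48% of olfactory receptors, 28% of V1Rs and 19% of V2Rs.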

a, b, The platypus genome contains few genes from the olfactory receptor families that are greatly expanded in other animals (three other mammals and one reptile are shown), but many genes in olfactory receptor family 14 (a) and a relatively large number of vomeronasal type 1 receptor (V1R) genes (b). These schematic phylogenetic trees show the relative family sizes and pseudogene content of the different gene families (counts next to the inner branches) and the V1R repertoire in platypus. Pie charts show the proportion of intact genes (dark shading) to disrupted pseudogenes (light shading).


The platypus olfactory receptor gene repertoire is roughly half the size of those of other mammals 37. Even so, platypus olfactory receptors fall into class, family and subfamily structures that are well represented in mammals, with a few notable exceptions such as family 14 (Fig. 3a). Together with the finding that the lizard has only 200 olfactory receptor genes and pseudogenes, this suggests that the platypus olfactory repertoire is more closely related to those of other mammals than to those of sauropsids.

Eggs

Fertilization in platypus shows both sauropsid and therian features. Platypus oocytes are small (4 mm in diameter) compared with those of reptiles and birds of comparable size, and the eggs hatch at an early stage of development, so that, as in marsupials, most of the growth of the embryo and young depends on lactation. As in all mammals and many other amniotes, the oocyte is surrounded by a zona pellucida at fertilization. The platypus genome encodes each of the four proteins of the human zona pellucida 38, as well as two ZPAX genes (Table 1), which have so far been observed only in birds, amphibians and fish. The aspartyl protease nothepsin is present in platypus but has been lost from the marsupial and eutherian genomes (Table 1). In zebrafish, this gene is expressed specifically in the female liver under the control of estrogen, and its product accumulates in the ovary 39. These properties are shared with the vitellogenins, suggesting that nothepsin may be involved in processing vitellogenin or other yolk proteins. We find that platypus has retained a single vitellogenin gene and a single vitellogenin pseudogene, whereas sauropsids such as chicken have three genes and marsupials and eutherians have none.


Spermatozoa

Orthologues of many of the fertilization-related eutherian sperm membrane proteins are found in the platypus (and marsupial) genomes. These include genes for a number of putative zona pellucida receptors and for proteins involved in sperm–oolemma fusion. The testis-specific proteases that are involved in breaching the zona pellucida during fertilization in eutherians are all absent from the platypus genome assembly.

Monotreme spermatozoa undergo some post-testicular maturation changes, including acquisition of progressive motility, loss of the cytoplasmic droplet and aggregation of individual spermatozoa into bundles during passage through the epididymis 11. However, the maturation changes of the sperm surface that are both unique to, and essential for, fertilization in other mammals have yet to be identified. In addition, the monotreme epididymis is not as well adapted for sperm storage as that of most marsupial and eutherian mammals. Consistent with these findings, platypus lacks the genes for the epididymis-specific proteins involved in sperm maturation and storage in other mammals. The most abundant secreted protein in the platypus epididymis is a lipocalin, whose homologues are the most abundant secreted proteins in the reptilian epididymis 41. Notably, ADAM7, a protease secreted by the eutherian epididymis, has an orthologue in platypus. It is a true protease, with the characteristic Zn2+-coordinating sequence HExxH, in platypus, opossum and tree shrew (Tupaia belangeri). In other eutherians, however, ADAM7 is predicted to have lost its proteolytic activity 42 because of a single point mutation (E to Q) within its active site.
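The HExxH zinc-binding motif and the effect of an E-to-Q substitution can be checked with a simple pattern search. The sketch uses a made-up toy sequence rather than real ADAM7, and scans only for H, E, any two residues, H; the active sites of real metalloproteases are often described by longer motifs, so this is an illustration of the principle only.

```python
import re

# Zero-width lookahead so that overlapping motifs are all reported.
HEXXH = re.compile(r"(?=HE..H)")


def hexxh_sites(protein: str) -> list[int]:
    """0-based start positions of H-E-x-x-H motifs in a protein sequence."""
    return [m.start() for m in HEXXH.finditer(protein.upper())]


def point_mutate(protein: str, pos: int, residue: str) -> str:
    """Return a copy of protein with the residue at 0-based pos replaced."""
    if not 0 <= pos < len(protein):
        raise IndexError("position outside sequence")
    return protein[:pos] + residue + protein[pos + 1:]


toy = "MKTAHEIGHNLGM"                        # hypothetical toy sequence
site = hexxh_sites(toy)[0]                   # start of the motif (the first H)
mutant = point_mutate(toy, site + 1, "Q")    # the E of HExxH becomes Q
print(hexxh_sites(toy), hexxh_sites(mutant))
```

The glutamate in HExxH acts as the catalytic base in zinc metalloproteases, which is why replacing it with the uncharged glutamine is predicted to abolish activity while leaving the zinc-binding histidines in place.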

Lactation and dentition

Lactation is an ancient reproductive trait whose origin predates that of mammals. It has been suggested that early lactation evolved to supply water that protected porous, parchment-shelled eggs from drying out during incubation 43, or as protection against microbial infection. Monotremes, which lay parchment-shelled eggs, also have a mammary patch, or areola, without a nipple, which may still play a role in egg protection. As in all mammals, monotreme milk has evolved beyond this primitive egg-protecting role into true milk, a rich secretion containing sugars, lipids and milk proteins with nutritional, antimicrobial and bioactive functions. Consistent with this similarity, the platypus casein genes are tightly linked in the genome, as in other mammals, although platypus carries a recently duplicated β-casein gene (Supplementary Fig. 2).

Mammalian casein genes are thought to have arisen by duplication of enamelin or ameloblastin 44. Both are enamel matrix protein genes that lie next to the casein gene cluster in eutherians, and we find them in platypus as well. Although adult platypuses and echidnas lack teeth, the retention of these enamel protein genes is consistent with the presence of teeth and enamel in both juvenile and fossil platypuses 45.

Venom

Only a handful of mammals are venomous, but the male platypus is unique in delivering its venom through spurs on its hind legs rather than by a bite. Despite the obvious difficulty of obtaining samples, platypus venom is now known to be a cocktail of at least 19 different substances 46, including defensin-like peptides (vDLPs), C-type natriuretic peptides (vCNPs) and nerve growth factor (vNGF). Phylogenetic analysis and mapping to the platypus genome assembly revealed that these sequences arose from local duplications of genes with very different functions (Fig. 4). Notably, duplications in each of the β-defensin, C-type natriuretic peptide and nerve growth factor gene families also occurred independently in reptiles during the evolution of their venom 47. Convergent evolution has therefore clearly occurred during the independent evolution of reptile and monotreme venom 48.

The diagram shows separate gene duplications in different parts of the phylogeny for platypus venom defensin-like peptides (vDLPs), lizard venom crotamine-like peptides (vCLPs) and snake venom crotamines. These venom proteins were therefore co-opted independently from pre-existing non-toxin homologues in platypus and in lizards and snakes 48.


Immunity

Although the main organs of the monotreme immune system are similar to those of other mammals 49, the repertoire of immune molecules differs in some important respects. In particular, the platypus genome contains at least 214 natural killer receptor genes (Supplementary Note 18) within the natural killer complex, far more than human (15 genes 50), rat (45 genes 50) or opossum (9 genes 51).

Both the platypus and the opossum genomes contain expansions of the cathelicidin family of antimicrobial peptides (Supplementary Fig. 3). Among eutherians, primates and rodents have a single cathelicidin gene 52, 53, whereas sheep and cattle have numerous genes that duplicated only recently 54. The expanded cathelicidin repertoire of both marsupials and monotremes could equip their immunologically naive young with a diverse arsenal of innate immune defenses. In eutherians, antimicrobial peptide gene diversity may be less critical because of their longer gestation, during which the young develop their immune system in the uterus. The platypus genome also contains an expansion of the gene family encoding the macrophage differentiation antigen CD163 (Supplementary Note 18).

Genome landscape

First, we analyze the phylogenetic position of platypus and confirm that marsupials and eutherians are more closely related to each other than either is to monotremes (Supplementary Note 19). We then describe the platypus chromosomes and note some features of platypus interspersed and tandem repeats. We also discuss a possible relationship between interspersed repeats and genomic imprinting, and examine how the unusually high G+C content of platypus affects the strong association between CpG islands and gene promoters seen in eutherians.

Platypus chromosomes

Platypus chromosomes provide clues to the relationship between mammalian and reptilian chromosomes, and to the origins of mammalian sex chromosomes and dosage compensation. Our analysis adds detail to the following findings: across the 52 platypus chromosomes, there is no correlation between the positions of orthologous genes on the small platypus chromosomes and on chicken microchromosomes; the five platypus X chromosomes, which are unique to this species, show considerable sequence alignment similarity to chicken Z but no orthologous gene alignments to human X, implying that the platypus X chromosomes evolved directly from a bird-like stem-reptilian system 55; and genes on the five platypus X chromosomes appear to be partially dosage compensated (Supplementary Fig. 5), possibly paralleling the incomplete dosage compensation recently described in birds 56.

Repeat elements

About half of the platypus genome consists of interspersed repeats derived from transposable elements. The most abundant repeats, which are still active, are (greatly truncated) copies of the 5-kb long interspersed element 2 (LINE2) and of its non-autonomous SINE partner, the mammalian-wide interspersed repeat (MIR; Mon-1 in monotremes), which became extinct in marsupials and eutherians 60–100 Myr ago. We estimate that the 2.3-Gb platypus genome contains 1.9 million copies of LINE2 and 2.75 million copies of MIR/Mon-1. DNA transposons and LTR retroelements are rather rare in platypus, but there are thousands of copies of an ancient Gypsy-class LTR element (all LTR elements previously identified in mammals, birds or reptiles belong to the retrovirus group). Overall, the density of interspersed repeats (more than 2 repeats per kb) is higher than in any previously characterized metazoan genome. Population analysis using LINE2/Mon-1 elements distinguished the Tasmanian population from three mainland clusters (Supplementary Fig. 4a, b), in good agreement with tree-based analysis, geographic proximity and previous knowledge of the relationships between platypus populations 57.
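The repeat-density figure can be sanity-checked from the copy numbers above. This arithmetic sketch uses only the quoted LINE2 and MIR/Mon-1 estimates and the 2.3-Gb genome size; all other repeat families are ignored, so it gives a lower bound on total interspersed-repeat density. The names are illustrative.

```python
GENOME_BP = 2.3e9                                  # genome size quoted in the text
COPIES = {"LINE2": 1.9e6, "MIR/Mon-1": 2.75e6}     # estimated copy numbers


def copies_per_kb(copies: float, genome_bp: float) -> float:
    """Average number of copies per kilobase of genome."""
    return copies / (genome_bp / 1_000)


for family, n in COPIES.items():
    print(f"{family}: {copies_per_kb(n, GENOME_BP):.2f} per kb")
total = copies_per_kb(sum(COPIES.values()), GENOME_BP)
print(f"combined: {total:.2f} per kb")
```

These two families alone contribute about 2.02 copies per kb, consistent with the statement that the overall density exceeds 2 repeats per kb.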

Cluster analysis of all LINE2 copies revealed a phylogeny without side branches, as if a single, rapidly evolving source locus had steadily spread an extraordinary number of pseudogenes over time (Supplementary Fig. 6). This "master gene" pattern is also observed, to a lesser extent, for LINE1 in eutherians 58, but not to the same extent for MIR/Mon-1 or other mammalian retrotransposons. The LINE2 and Mon-1 phylogenies were also supported by a genome-wide transposition-in-transposition (TinT) analysis 59 (Supplementary Tables 7 and 8). LINE2 density is similar on all chromosomes (Supplementary Fig. 7). Unlike CR1 LINE density in the chicken genome 19, it does not correlate with chromosome length (and hence recombination rate), and unlike LINE1 density in eutherians, which has prompted proposals of a role in dosage compensation 60, it is no higher on sex chromosomes than on autosomes.

We compared microsatellites in the platypus genome with those of representative vertebrates (Supplementary Note 22). The mean microsatellite coverage of the platypus genome sequence assembled into chromosomes is 2.67 ± 0.34%, significantly lower than in any other mammalian genome sequenced to date and most similar to that observed in chicken (Supplementary Fig. 8). Microsatellites are on average shorter in platypus than in other genomes (Supplementary Table 9), but platypus microsatellite coverage exceeds that of chicken owing to very long tri- and tetranucleotide repeats (Supplementary Fig. 9). Compared with the other vertebrates examined, the platypus has a higher proportion of microsatellites with high A+T content, a frequency distribution that has more in common with reptiles than with mammals (Supplementary Fig. 10).
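Microsatellite coverage, the percentage of bases lying in short tandem repeats, can be estimated with a simple perfect-repeat scan. The parameters here (repeat units of 1–6 bp, a minimum tract length of 12 bp, exact repeats only) are assumptions for illustration; dedicated repeat finders and the thresholds in Supplementary Note 22 may differ, so values from this sketch are not comparable with the 2.67% figure.

```python
import re


def microsatellite_coverage(seq: str, max_unit: int = 6, min_len: int = 12) -> float:
    """Fraction of bases inside perfect tandem repeats of 1..max_unit bp
    whose total tract length is at least min_len."""
    seq = seq.upper()
    if not seq:
        return 0.0
    covered = [False] * len(seq)
    for unit in range(1, max_unit + 1):
        # Lookahead: at each position, a unit of `unit` bases followed by one
        # or more exact copies of itself (greedy, so the full tract is taken).
        pattern = re.compile(r"(?=(([ACGT]{%d})\2+))" % unit)
        for m in pattern.finditer(seq):
            start, end = m.span(1)
            if end - start >= min_len:
                for i in range(start, end):
                    covered[i] = True
    return sum(covered) / len(seq)


# Toy sequence: a (CA)10 tract between two short non-repetitive flanks.
example = "ACGTTAGC" + "CA" * 10 + "GATTACAG"
print(f"{microsatellite_coverage(example):.1%}")
```

Marking covered bases in a boolean array, rather than summing tract lengths, avoids double-counting a tract that is detected under several unit sizes (a (CA)n tract also matches as (CACA)n).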

Genomic imprinting

Genomic imprinting is an epigenetic phenomenon that results in monoallelic gene expression. In vertebrates, imprinting appears to have evolved recently and has been confirmed only in marsupial and eutherian mammals 61, 62. The autosomal location of some imprinted orthologues in platypus is already known 63. Here we examined the conservation of synteny and the distribution of retrotransposed elements for the platypus orthologues of all eutherian imprinted genes, both clustered and non-clustered. A representative cluster is shown in Fig. 5 (see also Supplementary Fig. 12).

a, The gene arrangement is conserved among mammals, but non-coding regions are expanded in therians. Arrows indicate genes and the direction of transcription; the scale is in base pairs. b, Summary of the repeat distribution for the PEG1/MEST cluster. Histograms show the percentage of sequence masked by each repeat class within the MEST cluster; black bars show the repeat distribution across the entire genome. With the exception of SINEs, platypus has fewer LINE, LTR, DNA and simple repeats than eutherian mammals. Low Comp., low complexity; sRNAs, small RNAs.


Therian imprinted clusters (with the exception of the Prader-Willi-Angelman locus 64 ) were not assembled recently; they lie in ancient syntenic mammalian groups, although some regions have expanded through mechanisms such as gene duplication or transposition. All orthologous platypus regions contained significantly fewer LTR and DNA elements than the eutherian imprinted regions (P < 0.04 in both cases), whereas sequence masked by SINEs was significantly increased (P < 0.03). The chicken regions had fewer repeats and no SINEs or sRNAs. Comparing all regions in platypus with the orthologous regions in opossum, mouse, dog and human shows that the accumulation of LTRs, DNA elements, and simple and low-complexity repeats coincides with the acquisition of imprinting in therian mammals and may be a driving force in these regions.
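The histograms in Fig. 5b report, for each repeat class, the percentage of a region's sequence masked by that class. A minimal sketch of that calculation follows, assuming repeat annotations have already been parsed into (start, end, class) tuples from a RepeatMasker-style table; the parsing step, the coordinates and the class labels are all hypothetical. Overlapping hits of one class are merged so no base is counted twice.

```python
from collections import defaultdict


def masked_percent_by_class(annotations, region_length):
    """Percent of a region masked by each repeat class.

    `annotations` holds half-open (start, end, repeat_class) tuples, assumed
    to be already parsed from a RepeatMasker-style table (parsing not shown).
    Overlapping hits of the same class are merged before counting.
    """
    by_class = defaultdict(list)
    for start, end, repeat_class in annotations:
        by_class[repeat_class].append((start, end))

    percents = {}
    for repeat_class, intervals in by_class.items():
        intervals.sort()
        covered = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start > cur_end:  # gap: close the current block
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
            else:  # overlap or adjacency: extend the current block
                cur_end = max(cur_end, end)
        covered += cur_end - cur_start
        percents[repeat_class] = 100.0 * covered / region_length
    return percents
```

Calling the same function on genome-wide annotations, with the genome length as `region_length`, would give the kind of background distribution shown by the black bars.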

G + C content and CpG fraction

Eutherian and chicken genomes have an average G + C content of about 41%, although many intervals, particularly in human, differ significantly from this average (Supplementary Note 23). In contrast, the platypus genome averages 45.5% G + C and rarely deviates far from the average. The opossum genome has an average G + C content of only 38% and a narrow distribution (Supplementary Fig. 13). The source of the elevated G + C fraction in platypus remains unclear. It is only partially explained by monotreme interspersed repeat elements, since platypus DNA outside known interspersed repeats is 44.7% G + C. In addition, tandem repeats of short DNA motifs (microsatellites) in platypus, as in other mammals, show an A + T bias. Recombination-driven biased gene conversion may be a factor, as has been shown for eutherians 65 and marsupials 66 . This is suggested by the observation that the six platypus chromosomes whose currently mapped DNA sequence has an average G + C content above 45% (that is, 17, 20, 15, 14, 10 and 11, in order of decreasing G + C fraction) are among the 10 shortest (Supplementary Fig. 14), because short chromosomes have higher recombination rates 67 . However, a direct test is currently lacking because platypus recombination rates have not been measured. Further examination of the CpG fraction associated with promoter elements can be found in Supplementary Note 24 and Supplementary Fig. 15.

Conclusions

The egg-laying platypus is a remarkable species with many biological characteristics unique among mammals. Sequencing the platypus genome now lets us compare its sequence characteristics and organization with those of birds and therian mammals, to address questions of platypus biology and to date the emergence of mammalian traits. We report here that the platypus genome shows sequence features of reptiles as well as of mammals.

The platypus has a largely standard repertoire of non-protein-coding RNAs (ncRNAs), with the exception of the snoRNAs, which show a significant expansion associated with at least one retrotransposed subfamily. Some of these retrotransposed snoRNAs are expressed and may therefore have functional roles. The platypus has fully elaborated piRNA and miRNA pathways, the latter comprising many monotreme-specific miRNAs as well as miRNAs shared with either mammals or chicken. Functional assessment of many of these new miRNAs has yet to be performed and will certainly add to our knowledge of miRNA evolution in mammals.

The 18,527 protein-coding genes predicted from the platypus assembly fall within the range for therian genomes. Of particular interest are gene families involved in traits that link monotremes to reptiles, such as oviposition, vision and envenomation; mammal-specific traits such as lactation; traits shared with marsupials, such as antibacterial proteins; and platypus-specific traits such as venom delivery and underwater foraging. For example, anatomical adaptations for chemoreception during underwater foraging are reflected in an unusually large repertoire of vomeronasal type 1 receptor genes. By contrast, the repertoire of milk protein genes is typical of mammals, and the arrangement of milk protein genes appears to have persisted since the last common ancestor of monotremes and therian mammals.

Since its first description, the platypus has stood out as a species with a mixture of reptilian and mammalian traits, and this mixture is also evident at the level of genomic sequence. It is reflected, for example, in the density and distribution of repetitive sequences. The high repeat content of the platypus genome, which is typical of mammalian genomes, contrasts with its mean microsatellite coverage, which appears more reptilian. Furthermore, the correspondence between regions of parent-of-origin-specific expression in therians and regions with reduced interspersed repeat content in platypus suggests that the evolution of imprinting in these regions is linked to the accumulation of repeat elements.

We find that the mix of reptilian, mammalian and unique characteristics of the platypus genome provides many clues about the function and evolution of all mammalian genomes. The wealth of new findings, and of confirmations of existing knowledge, that will emerge immediately from the release of these data promises that the platypus genome sequence will provide a much-needed backdrop for rapid advances in other studies of mammalian biology and evolution.

Method overview

Tissue resources

Tissue was obtained from animals caught during the breeding season on the Upper Barnard River, New South Wales, Australia (AEEC permit number R.CG.07.03 to F. Grützner; Environment ACT permit number LI 2002 270 to J. A. M. Graves; NPWS approval number A193 to R. C. Jones; AEC approval no. S-49-2006 to F. Grützner).

Sequence assembly

A total of 26.9 million reads were assembled with the PCAP software 20 . We attempted to assign the largest contiguous blocks of sequence to chromosomes using standard FISH techniques.

Non-coding RNAs

We used the established Rfam pipeline 68 and de novo sequencing to detect non-protein-coding RNAs (ncRNAs). Cloning, sequencing and annotation of small RNAs (sRNAs) from platypus, echidna and chicken, as well as the miRNA sequences, are described in ref. 25.

Genes

Protein-coding and non-protein-coding genes were predicted using a modified version of the Ensembl pipeline (Supplementary Note 14). Gene orthology was assigned according to a previously implemented procedure 69 . Evolutionary rates of orthologues were estimated using PAML 70 under the model implemented in ref. 71. In all cases, codon frequencies were estimated from the nucleotide composition at each codon position (F3X4 model).
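Under the F3X4 model, the equilibrium frequency of a codon xyz is proportional to the product of the frequencies of x at position 1, y at position 2 and z at position 3, renormalized over the sense codons. PAML computes this internally (in codeml, `CodonFreq = 2` selects F3X4); the sketch below only illustrates the formula and assumes in-frame coding sequences and the standard genetic code.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def f3x4_codon_frequencies(coding_seqs):
    """F3X4 equilibrium codon frequencies from in-frame coding sequences."""
    # Nucleotide counts at codon positions 1, 2 and 3, kept separately.
    counts = [{base: 0 for base in "ACGT"} for _ in range(3)]
    for seq in coding_seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if all(base in "ACGT" for base in codon):
                for pos, base in enumerate(codon):
                    counts[pos][base] += 1

    pos_freqs = []
    for pos_counts in counts:
        total = sum(pos_counts.values())
        pos_freqs.append({b: n / total for b, n in pos_counts.items()})

    # pi(xyz) is proportional to f1(x) * f2(y) * f3(z), restricted to the
    # 61 sense codons and renormalized to sum to 1.
    raw = {}
    for b1, b2, b3 in product("ACGT", repeat=3):
        codon = b1 + b2 + b3
        if codon not in STOP_CODONS:
            raw[codon] = pos_freqs[0][b1] * pos_freqs[1][b2] * pos_freqs[2][b3]
    norm = sum(raw.values())
    return {codon: f / norm for codon, f in raw.items()}
```

The model needs only 9 free parameters (three free nucleotide frequencies per position) instead of 60 for a full codon table, which keeps estimates stable for short alignments.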

Genome landscape

Pairwise alignments of human with dog, mouse, opossum, platypus and chicken were projected from whole-genome alignments of 28 species (//genome.cse.ucsc.edu/). These alignments were the basis for the phylogeny, chromosome synteny, interspersed repeat, imprinting and CpG fraction analyses.
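Projecting a pairwise alignment out of a multiple alignment amounts to keeping two rows of each alignment block and dropping columns in which both of those rows are gaps. The sketch below shows this for a single in-memory block; the species rows are hypothetical, and real UCSC alignments are distributed as MAF blocks with coordinates that this toy function does not track.

```python
def project_pairwise(msa, ref, other, gap="-"):
    """Project a pairwise alignment of `ref` vs `other` out of one alignment block.

    `msa` maps species name -> aligned row; all rows have equal length.
    Columns in which both chosen rows are gaps carry no information for
    the pair and are dropped.
    """
    a, b = msa[ref], msa[other]
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    kept = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    return "".join(x for x, _ in kept), "".join(y for _, y in kept)
```

For example, with rows `"AC-GT-A"` (human) and `"A--GTCA"` (platypus), the projection is `("ACGT-A", "A-GTCA")`: the third column, gapped in both species, disappears, while the column in which only human has a gap is retained.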

Online methods

Sequence assembly

A total of 26.9 million reads were assembled with the PCAP software 20 . Assessment of assembly quality took into account read depth, chimeric reads, repeat content, cloning bias, G + C content and heterozygosity (Supplementary Notes 4–11). Using two independent analyses, SSAHA2 (SSAHA: a fast search method for large DNA databases 72 ) and PCAP output 20 (Supplementary Note 11), we identified a total of 1.2 million single nucleotide polymorphisms (SNPs) within the 1.84 Gb of sequenced female platypus genome.
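Because the sequenced animal is a diploid female, a heterozygous position shows up as two different bases among the reads overlapping it. The sketch below illustrates that idea at a single position from per-base read counts. It is a hypothetical simplification, not the SSAHA2 or PCAP procedure, and every threshold in it is an assumed illustration value.

```python
def call_heterozygous_site(base_counts, min_depth=6, min_minor_count=2,
                           min_minor_fraction=0.2):
    """Return (major, minor) alleles if a site looks heterozygous, else None.

    `base_counts` maps base -> number of reads showing it at one position.
    All thresholds are hypothetical illustration values.
    """
    depth = sum(base_counts.values())
    if depth < min_depth or len(base_counts) < 2:
        return None
    ranked = sorted(base_counts.items(), key=lambda kv: kv[1], reverse=True)
    (major, _), (minor, n_minor) = ranked[0], ranked[1]
    # Require the second allele to be seen repeatedly and at a reasonable
    # share of the reads, to reduce calls caused by sequencing errors.
    if n_minor >= min_minor_count and n_minor / depth >= min_minor_fraction:
        return major, minor
    return None
```

With counts `{"A": 10, "G": 6}` the function reports `("A", "G")`, whereas a single discordant read at depth 16 is treated as error. Requiring agreement between two independent pipelines, as described above, serves a similar error-filtering purpose.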

Non-coding RNAs

snoRNAs were annotated as in ref. 23. miRNAs sharing the same heptamer at nucleotide positions 2–8 were defined as a family. Homology with mouse and human miRNAs was based on annotated miRNAs in Rfam (//microrna.sanger.ac.uk/sequences/index.shtml). piRNA sequences have been submitted to GEO (http://www.ncbi.nlm.nih.gov/geo/). Total miRNA cloning frequencies were normalized across tissue libraries by scaling the cloning frequency in each library by a factor reflecting the total number of miRNA reads in that library.
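The two rules just described, grouping miRNAs by the heptamer at positions 2–8 and rescaling counts by each library's total miRNA reads, can be sketched as follows. Positions 2–8 in 1-based numbering correspond to the Python slice `[1:8]`. Scaling to reads per million is an assumption; the exact factor used in the study is not stated here.

```python
from collections import defaultdict


def seed(mirna):
    """Heptamer at nucleotides 2-8 (1-based, inclusive) of a mature miRNA."""
    return mirna.upper().replace("T", "U")[1:8]


def seed_families(mirnas):
    """Group miRNA names (dict name -> mature sequence) by shared seed."""
    families = defaultdict(list)
    for name, sequence in mirnas.items():
        families[seed(sequence)].append(name)
    return dict(families)


def normalize_counts(library_counts, scale=1_000_000):
    """Rescale each library's miRNA read counts to a common total.

    Assumption: counts are expressed per `scale` miRNA reads (per million
    by default); the exact scaling factor used in the study may differ.
    """
    normalized = {}
    for library, counts in library_counts.items():
        total = sum(counts.values())
        normalized[library] = {m: scale * n / total for m, n in counts.items()}
    return normalized
```

Normalizing in this way lets cloning frequencies from libraries of very different sequencing depth be compared directly.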

Genes

Orthologous groups were selected if they contained genes predicted from platypus but not from the chicken, opossum, dog, mouse or human genome assemblies (Supplementary Notes 15–17). We also selected groups in which the number of inparalogous platypus genes exceeded the number in the other lineages (chicken, opossum, dog, mouse and human). Some of these groups represent faulty gene predictions: for example, protein-coding sequence predictions that instead represent transposed elements or highly repetitive sequence, or that overlap other well-established coding sequences on the reverse strand. Such cases were discarded. Lineage-specific gene loss was determined by inspecting BLASTZ alignment chains and nets in the UCSC genome browser (//genome.cse.ucsc.edu/); by querying all known cDNA, EST and protein sequences held in GenBank using BLAST; and by attempting to predict orthologous genes within genomic intervals flanked by syntenic anchors.
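The two selection criteria above can be written as simple predicates over per-species gene counts in each orthologous group. The sketch below assumes each group is summarized as a dictionary of counts, and it reads "exceeded the number in the other lineages" as exceeding every other lineage individually; both the data layout and that reading are assumptions. The subsequent manual removal of faulty predictions is not modelled.

```python
OTHER_LINEAGES = ("chicken", "opossum", "dog", "mouse", "human")


def platypus_only(group):
    """True if the group has platypus genes and none from the other lineages.

    `group` maps species name -> number of its genes in the orthologous group.
    """
    return group.get("platypus", 0) > 0 and all(
        group.get(s, 0) == 0 for s in OTHER_LINEAGES)


def platypus_expanded(group):
    """True if platypus inparalogues outnumber those of every other lineage."""
    n_platypus = group.get("platypus", 0)
    return n_platypus > max(group.get(s, 0) for s in OTHER_LINEAGES)


def select_candidate_groups(groups):
    """Return IDs of groups passing either criterion (before manual curation)."""
    return [gid for gid, g in groups.items()
            if platypus_only(g) or platypus_expanded(g)]
```

Every platypus-only group also passes the expansion test, so the first predicate mainly serves to label such groups separately.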

Genome landscape

To establish the phylogeny, we extended a previously described data collection approach 73 for protein-coding genes and used established techniques to analyze protein-coding indels 74 and retrotransposon insertions 75 (Supplementary Note 19).
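Retrotransposon insertions are informative because an insertion present at the orthologous site in several species, and absent in others, is usually taken as evidence that those species form a group, on the assumption that independent insertions at the same site and precise excisions are rare. The sketch below tallies such presence/absence patterns. The taxa and calls are hypothetical, and this is only the counting step, not the full method of ref. 75.

```python
from collections import Counter


def tally_supported_groups(loci):
    """Count how many insertion loci support each group of taxa.

    `loci` is a list of dicts mapping taxon -> True (insertion present),
    False (orthologous empty site) or None (missing data). A locus supports
    the set of taxa that share the insertion, provided it is shared by at
    least two taxa and absent from at least one.
    """
    support = Counter()
    for calls in loci:
        if any(state is None for state in calls.values()):
            continue  # skip loci with missing data
        present = frozenset(taxon for taxon, state in calls.items() if state)
        if 2 <= len(present) < len(calls):
            support[present] += 1
    return support
```

An insertion found in only one taxon is ignored because it says nothing about how taxa group together.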

The population structure of 90 platypuses from different regions of Australia was determined with the Structure software v2.1 (ref. 76) using genotypes at 57 polymorphic Mon-1 and LINE2 loci. Five thousand replicates were examined (Supplementary Note 21).

Microsatellites were identified across the platypus genome assembly (ornAna1) by combining two programs: Tandem Repeat Finder (TRF) 77 and Sputnik 78 (Supplementary Note 22).

For the PEG1/MEST imprinted cluster, comparative maps were created from Vega annotations for mouse and human and from Ensembl gene builds for the other species. Multiple alignments of each region were constructed with MLAGAN 79 using translated anchoring, for the repeat distribution analyses.

We downloaded the genome assemblies for human (hg18), mouse (musMus8), dog (canFam2), opossum (monDom4), platypus (ornAna1) and chicken (galGal3) from the UCSC genome browser (//genome.ucsc.edu) and calculated the fraction of G + C nucleotides in each non-overlapping 10,000-bp window free of ambiguous bases. Repeat bases were not distinguished and were counted together with non-repeat bases. For platypus the entire assembled sequence was analyzed; for the other species only bases assigned to chromosomes were used.
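The windowing rule just described can be sketched as below. Upper-casing the sequence means soft-masked (lower-case) repeat bases are counted like any other base, matching the statement that repeat bases were not distinguished. Skipping a trailing partial window is an assumption, as is the function name.

```python
def gc_fractions(seq, window=10_000):
    """G + C fraction of each complete, non-overlapping window without ambiguous bases."""
    seq = seq.upper()  # soft-masked repeats count the same as other bases
    fractions = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        if any(base not in "ACGT" for base in chunk):
            continue  # skip windows containing ambiguous bases such as N
        gc = chunk.count("G") + chunk.count("C")
        fractions.append(gc / window)
    return fractions
```

The list returned for each assembly is what a histogram of windowed G + C content, such as the per-species distributions compared in Supplementary Fig. 13, would be built from.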

Accessions

Primary accessions

GenBank / EMBL / DDBJ

  • AAPN00000000
  • AAPN01000000

Data deposition

The Ornithorhynchus anatinus whole-genome shotgun project has been deposited in DDBJ/EMBL/GenBank under project accession AAPN00000000. The version described in this paper is the first version, AAPN01000000. SNPs have been deposited in dbSNP (//www.ncbi.nlm.nih.gov/projects/SNP/) under submitter method IDs PLATYPUS-ASSEMBLY_SNPS_200801 and PLATYPUS-READS_SNPS_200801.

Supplementary information

PDF files

  1.

    Supplementary Information

    This file contains Supplementary Notes S1–S24, Supplementary Tables 1–11, Supplementary Figures 1–14 with legends, and additional references.

  2.

    Supplementary Tables

    This file contains additional tables that are referred to not in the main manuscript but in the Supplementary Notes (nature06936-s1): Supplementary Tables A1–A17.

  3.

    Supplementary Figures

    This file contains additional figures that are referred to not in the main manuscript but in the Supplementary Notes (nature06936-s1): Supplementary Figures A1–A14.

  4.

    Supplementary Data

    This file contains supplementary data with sequences and accession codes for the platypus gene predictions discussed in the text.

Excel files

  1.

    Supplementary Data

    This file contains supplementary data listing the sequences of all PCR primers used to determine the platypus population structure.
