Conservation and gene birth and death across vertebrate species
To comprehensively understand gene birth and death events in the ABC transporter superfamily in vertebrates, we interrogated 62 vertebrate ABC genes across 64 vertebrate species (12 primates, five rodents, 21 other mammals, three marsupials, five birds, 13 fish, two reptiles, and one amphibian). Each gene was examined in the gene tree of the human or a representative species in the ENSEMBL database. We recorded whether a full-length or partial gene was present and noted potentially missing or duplicated genes. We compared these species with those for which formal analyses of the ABC superfamily have been published (human, mouse, zebrafish, and lamprey) (Dean, Rzhetsky, et al., 2001). There are high-coverage genomes for 13 species that are likely to provide an accurate gene count (human, chimp, macaque, mouse, rat, dog, opossum, chicken, Xenopus, zebrafish, and fugu). These genomes provide at least one index species for most of the major groups: primates, rodents, carnivores, marsupials, birds, amphibians, and fish. However, because many of the remaining species have low-coverage draft genome assemblies, many apparently missing genes are likely not true gene loss events (Milinkovitch, Helaers, Depiereux, Tzika, & Gabaldon, 2010).
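The bookkeeping described above (recording full-length, partial, missing, and duplicated genes, and discounting absences in low-coverage drafts) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' procedure: `Call`, `classify`, and `summarize` are hypothetical names, the input is assumed to be copy counts already read off the ENSEMBL gene trees, and every value except the chicken ABCD1 absence is a placeholder.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical record of one gene in one species, read off a gene tree."""
    full_length: int  # number of full-length gene models
    partial: int      # number of partial (fragmentary) gene models


def classify(call: Call, high_coverage: bool) -> str:
    """Label one gene-by-species call, treating absences in draft genomes cautiously."""
    models = call.full_length + call.partial
    if models == 0:
        # In a low-coverage draft, an absent gene may simply fall in an assembly gap.
        return "candidate loss" if high_coverage else "uncertain (possible assembly gap)"
    if models > 1:
        # Extra models still need sequence comparison (e.g. divergence between copies)
        # to rule out one gene split across contigs before calling a duplication.
        return "candidate duplication"
    return "full-length" if call.full_length == 1 else "partial"


def summarize(calls: dict[str, Call], high_coverage: bool) -> Counter:
    """Count how many genes in one species fall into each category."""
    return Counter(classify(call, high_coverage) for call in calls.values())


# Illustrative input for a high-coverage index species: ABCD1 is absent from the
# chicken genome (see text); GENE_X and GENE_Y are placeholders, not real calls.
chicken = {"ABCD1": Call(0, 0), "GENE_X": Call(1, 0), "GENE_Y": Call(0, 1)}
print(summarize(chicken, high_coverage=True))
```

The same absence is labeled "candidate loss" in an index genome but only "uncertain" in a draft assembly, which mirrors the caution about low-coverage genomes expressed above.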
The number of ABC genes in primates is very stable. The ABCA10 gene is missing from the orangutan, gibbon, and marmoset genomes; ABCA10 is part of a cluster of five ABCA5-related genes that are duplicated head-to-tail on human chromosome 17. The gene loss event that converted ABCC13 into a pseudogene (Annilo & Dean, 2004) appears to be confined to the great apes, as ABCC13 is intact in all other primates. The bushbaby (Otolemur garnettii) genome appears to have an additional TAP2/ABCB3 gene. The two bushbaby TAP2 gene predictions lie in the same sequence contig, and their predicted amino acid sequences have diverged, consistent with a genuine gene duplication. TAP1 and TAP2 play essential roles in antigen presentation, and duplication of TAP2 also occurs in many fish genomes, so this finding is of potential interest for studies of the evolution of primate immunogenetics. In total, primates contain between 48 and 50 ABC genes.
Rodents have many gene gain and loss events affecting the A, B, and G subfamilies. The ABCA5-like cluster contains three to five genes, and a cluster of Abca14, Abca15, Abca16, and Abca17 genes (Ban, Sasaki, Sakai, Ueda, & Inagaki, 2005; Z. Q. Chen, Annilo, Shulenin, & Dean, 2004) is present only in the mouse, rat, and squirrel genomes, not in the guinea pig or kangaroo rat genomes. The well-described duplication of the Abcb1 gene in the mouse and rat genomes is also found in the guinea pig but not in other rodents. The loss of the ABCC11 gene from the mouse genome extends to all rodents, but ABCC11 is present in the lagomorphs (rabbit, pika), indicating that this gene loss is specific to rodents. Abcg3, first discovered in the mouse genome, is closely related to ABCG2, a well-described efflux transporter (Mickley et al., 2001). Abcg3 is found only in rodents; the rat genome is predicted to have two Abcg3 genes, and the hamster genome four to six copies.
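The reasoning applied to ABCC11 (absent from every rodent but present in the sister group, the lagomorphs, and therefore lost once on the rodent lineage) can be made explicit with a small parsimony-style sketch. It assumes, for illustration only, that the gene was present in the common ancestor and was never regained after loss (a Dollo-parsimony assumption not stated in the text); under that assumption each maximal clade in which every species lacks the gene marks one inferred loss. `Node`, `leaves`, and `inferred_losses` are hypothetical names, and the tree is a simplified Glires topology using three of the rodents named above, not a full species phylogeny.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A clade in a species tree; leaves are species with no children."""
    name: str
    children: list[Node] = field(default_factory=list)


def leaves(node: Node) -> list[str]:
    """Names of all species descending from this node."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


def inferred_losses(node: Node, present: set[str]) -> list[str]:
    """Return the maximal clades in which no species carries the gene.

    Assuming the gene was present at the root and never regained,
    each returned clade corresponds to one loss on the branch leading to it.
    """
    if not any(species in present for species in leaves(node)):
        return [node.name]
    return [clade for child in node.children for clade in inferred_losses(child, present)]


# Simplified Glires tree; ABCC11 is present only in the lagomorphs (see text).
glires = Node("Glires", [
    Node("Rodentia", [Node("mouse"), Node("rat"), Node("guinea pig")]),
    Node("Lagomorpha", [Node("rabbit"), Node("pika")]),
])
print(inferred_losses(glires, present={"rabbit", "pika"}))  # ['Rodentia']
```

Scoring a pseudogene as absent, the same logic would place the ABCC13 loss on the branch leading to the great apes.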
Further examination of additional rodent genomes shows an Abcg3 gene in the prairie vole and up to four copies in the deer mouse genome. The function of Abcg3 is unknown, but it has been proposed to be an efflux pump because of its close sequence similarity to ABCG2. However, in the mouse it is expressed exclusively in the spleen and thymus, suggesting a role in the immune response (Mickley et al., 2001). In addition, the multiple Abcg3 gene birth events in the rodent lineage suggest that it serves an important, as yet unknown, function.
There are no other apparent ABC gene death or birth events within other mammalian genomes, and mammals with complete genome assemblies have 44-54 annotated ABC genes. However, it is difficult to determine gene counts accurately in the ABCA5 and ABCA14 gene clusters. In most mammals, these clusters contain three to five genes as well as pseudogene fragments (Annilo, Chen, Shulenin, & Dean, 2003). Examination of the assemblies of these regions in species with apparently missing genes shows gaps in the assembly. More complete genomes, produced with long-range sequencing or assembly methods, are needed to resolve these regions. In addition, we did not search these species for new ABC genes, so there may be as yet undiscovered gene birth events.
There has been no previous formal analysis of the ABC gene family in birds, amphibians, or marsupials. The opossum, the index marsupial species, has 7.3x genome coverage and contains 37 predicted full-length and ten partial ABC genes, for a total of 47. The opossum appears to be missing ABCA15, ABCA16, ABCA17, ABCB5, and ABCB13. The same genes were absent from the genomes of the other marsupials examined, the Tasmanian devil and the wallaby. The frog Xenopus tropicalis is the amphibian index species, with 37 full-length and four partial predicted genes. There are two predicted Xenopus ABCB5 genes on separate contigs; an alignment of these sequences shows considerable divergence in well-aligned regions, suggesting that this is a genuine duplication. The anole lizard is the one reptile species with a high-coverage genome assembly (Alfoldi et al., 2011), with 38 complete and four partial gene annotations, for a total of 42 ABC transporters. The anole and other reptile genomes (snake, turtles, tortoises, tegu lizard, and tuatara) have a duplicated ABCG2 gene.
The chicken is the index bird species and shows multiple apparent ABC gene loss events: its genome lacks ABCB12, ABCB13, ABCD1, and ABCF1. As ABCD1 and ABCF1 are highly conserved genes, this is unexpected. ABCD1 and ABCD2 are closely related, and invertebrates have a single ABCD1/2 gene. However, fish have both ABCD1 and ABCD2 orthologs, indicating that the duplication predates the divergence of fish and tetrapods and suggesting that birds subsequently lost the ABCD1 gene. In the human genome, ABCD1 is on the X chromosome, and mutations in ABCD1 cause the severe, often lethal, X-linked recessive disease adrenoleukodystrophy. ABCD1 localizes to the peroxisome, and adrenoleukodystrophy is a demyelinating disorder, but how ABCD1 defects produce the disease is not clear.
Detailed analyses of the ABC gene superfamily have been published for zebrafish, carp, catfish, and lamprey (S. Liu, Li, & Liu, 2013; X. Liu et al., 2016; Ren et al., 2015). These studies all document multiple gene birth events in fish, such as duplications of ABCA1, ABCA4, ABCB3, ABCB6, ABCB11, ABCC5, ABCC6, ABCG2, and ABCG4. Only the ABCC6 genes have been studied in detail: the Abcc6a gene has been shown to be essential, and Abcc6b is expressed in the developing kidney (Li et al., 2010). Because fish underwent a whole-genome duplication, the pattern of which duplicated genes have been retained and have acquired new functions is complex. Some duplications are confined to specific species, such as a duplicated ABCF2 in zebrafish, catfish, and a few other species (S. Liu et al., 2013). Many other lineage-specific duplications and losses have been described in fish, and highly accurate genome assemblies will be required to resolve this complexity (X. Liu et al., 2016). For example, there are four ABCG2-related genes in the zebrafish, and other fish species have complex combinations of these genes, including additional duplications. As a representative of the jawless fishes, an early-diverging lineage, the lamprey has few of the gene duplications seen in jawed fish and has only 34 predicted ABC genes.
In conclusion, the availability of many vertebrate genome assemblies allows a more detailed analysis of ABC transporter evolution. The gene number has changed dynamically in each of the seven common subfamilies, with the most dramatic changes in the A, B, and G subfamilies. Because ABC proteins can carry out a wide variety of transport functions, individual lineages, and even individual species, are likely to have evolved transporters for highly specialized functions, probably in response to environmental pressures. The phylogenetic trees of individual genes also show that considerable diversification has taken place. As even a single amino acid change can alter the substrate specificity of an ABC transporter, the true diversity of substrates is likely enormous. Among the most diversified genes are the multispecific transporters ABCB1/PGP and ABCG2. This finding is consistent with an essential role for these pumps in xenobiotic elimination and in maintaining tissue barriers in the brain, intestine, and placenta. ABCB1 has independently duplicated in several species, such as certain rodents, the cow, and fish. Even more dramatic are the duplications of ABCG2 in fish species. Because fish live in highly diverse aquatic environments and their tissues are continuously exposed, internally and externally, to the surrounding water, it is not surprising that they need to excrete many environmental toxins and protect internal organs from xenobiotics. Some gene clusters, particularly the ABCA5 and ABCA14 clusters, are challenging to assemble because the genes are large and closely related; their complete annotation will therefore require more complete genome assemblies.
One of the most intriguing ABC gene subfamilies is the ABCH family. Initially identified in Drosophila and Dictyostelium, ABCH genes are half transporters with an N-terminal NBD, the same structure as the ABCG genes. In vertebrates, ABCH genes are found only in fish. There is a single ABCH1 gene in most fish species and in the coelacanth, but the gene is missing from the lamprey and some other fish species (Jeong et al., 2015). A role in lipid transport has been described for an ABCH gene (LmABCH-9C) in the locust Locusta migratoria (Yu et al., 2017). To date, however, there is no functional information on this gene group in vertebrates.