The paper describes version 4 of the DnaSP (DNA Sequence Polymorphism) software, a bioinformatics tool used to analyze and interpret DNA sequence polymorphism data (Rozas et al. 2003). The software implements a number of analytical methods, some of them developed by our own research group. Formally, the project began in 1994, but its history goes back a few years earlier, to my PhD studies (University of Barcelona, under the supervision of Professor M. Aguade) and, later, my postdoc (Harvard University, in the laboratory of Professor R. C. Lewontin). At that time, I was carrying out DNA variation analyses and, importantly, there was no convenient software for analyzing the data from such early studies. I therefore started developing DnaSP mainly to fill that gap.
The analysis of DNA sequence polymorphisms and SNPs (single nucleotide polymorphisms) provides valuable insights into the evolutionary meaning of DNA variation. For instance, both demographic events (such as population expansions) and selective events leave distinctive molecular hallmarks on the levels and patterns of within-species DNA sequence variation. Therefore, the analysis and interpretation of DNA polymorphism data can inform us about, for example, past population growth or decline, gene flow among populations, or the impact of natural selection (Begun et al. 2007, Nielsen 2005, Rosenberg and Nordborg 2002).
Determining the impact of natural selection on molecular evolution and adaptation is a fundamental question in evolutionary biology and evolutionary genomics. In this context, the detection of both positive (adaptive or Darwinian) and negative (purifying) selection is of great interest. Beyond its deep conceptual implications, such detection can provide insights into the biological function of genes and genomic regions. Indeed, positive selection promotes advantageous DNA sequence changes, which are ultimately responsible for evolutionary adaptations and novelties. The characterization of genomic regions shaped by this kind of selection therefore has profound implications not only for evolutionary biology but also for our understanding of gene function. Negative selection, on the other hand, purges deleterious mutations and thereby favors the conservation of DNA sequences. Hence, evolutionarily conserved regions are likely to be functionally important and might reveal previously unknown biological functions. Moreover, DNA polymorphism data are valuable tools (as molecular markers) in a wide range of disciplines such as biomedicine, animal and plant breeding, conservation genetics, genetic epidemiology, and forensics.
DnaSP is a widely used software package for population genetic analyses, including coalescent-based methods, the state-of-the-art framework for analyzing DNA sequence polymorphism data (Rozas et al. 2003, Rosenberg and Nordborg 2002, Librado and Rozas 2009, Wakeley 2009). DnaSP is a multipurpose package that allows exhaustive analyses through a very user-friendly graphical user interface (GUI). Its main features are as follows: (i) it accommodates large data sets; (ii) it computes many population genetic statistics describing the levels and patterns of DNA polymorphism within and between populations; (iii) it conducts coalescent-based tests by computer simulation; and (iv) it generates graphical outputs that make sequence polymorphism and SNP data readily understandable.
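To give a concrete sense of the kind of summary statistics mentioned in (ii), the following is a minimal Python sketch, not DnaSP's own implementation, that computes the number of segregating sites (S), the average number of pairwise nucleotide differences (k), nucleotide diversity per site (π), Watterson's estimator (θ_W) and Tajima's D from a set of aligned sequences. It assumes equal-length sequences with no gaps or missing data, and the function name `polymorphism_summary` is hypothetical.

```python
from itertools import combinations
import math


def polymorphism_summary(seqs):
    """Summary statistics of DNA polymorphism for aligned sequences.

    Simplifying assumption (not DnaSP behaviour): all sequences have the
    same length and contain only A, C, G, T (no gaps, no missing data).
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D requires at least 4 sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned (equal length)")

    # S: number of segregating (polymorphic) alignment columns
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # k: average number of nucleotide differences over all pairs of sequences
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    pi = k / length  # nucleotide diversity per site

    # Watterson's estimator of theta (per sequence) from S
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    theta_w = S / a1

    # Tajima's D: standardized difference between k and theta_w
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    if S == 0:
        tajima_d = None  # undefined for monomorphic data
    else:
        tajima_d = (k - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))

    return {"S": S, "k": k, "pi": pi, "theta_w": theta_w, "tajima_D": tajima_d}


if __name__ == "__main__":
    print(polymorphism_summary(["AAAA", "AAAT", "AATT", "ATTT"]))
```

Under the standard neutral model, Tajima's D is expected to be close to zero, and departures in either direction can reflect the demographic or selective events discussed above. This sketch only computes the statistic; assessing its significance, for example through coalescent simulations as in (iii), is not shown.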
Although DnaSP version 4 was published in 2003, the first article (describing the first version of the software) was submitted to the journal CABIOS (Computer Applications in the Biosciences; now Bioinformatics) in 1995, and it was quickly accepted (Rozas and Rozas 1995). Indeed, the subsequent publications were also quickly accepted and published, including the most recent one (Librado and Rozas 2009). We think this was due to a combination of factors. First, the article described software for DNA polymorphism data analysis, and when the first version was released, there was no software of its kind. Second, the software allows the computation of comprehensive, state-of-the-art statistics and methods. Third, it provides an easy-to-use GUI with an extremely short learning curve. Fourth, the software can read a number of data file formats, so it was very easy to input data and proceed with the analyses. Finally, the software provides exhaustive help documentation explaining both how to perform the analyses and their scientific basis.
Begun, D.J., A.K. Holloway, K. Stevens, L.W. Hillier, Y. Poh, M.W. Hahn, P.M. Nista, C.D. Jones, A.D. Kern, C.N. Dewey, L. Pachter, E. Myers, and C.H. Langley. 2007. "Population Genomics: Whole-Genome Analysis of Polymorphism and Divergence in Drosophila simulans." PLoS Biol no. 5 (11):e310. doi: 10.1371/journal.pbio.0050310.
Rozas, J., and R. Rozas. 1995. "DnaSP, DNA sequence polymorphism: an interactive program for estimating population genetics parameters from DNA sequence data." Computer Applications in the Biosciences: CABIOS no. 11 (6):621-625. doi: 10.1093/bioinformatics/11.6.621.
Rozas, J., J.C. Sanchez-DelBarrio, X. Messeguer, and R. Rozas. 2003. "DnaSP, DNA polymorphism analyses by the coalescent and other methods." Bioinformatics no. 19 (18):2496-2497. doi: 10.1093/bioinformatics/btg359.
This article and its reviews are distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and redistribution in any medium, provided that the original author and source are credited.