Science AMA Series: We are Yaniv Erlich and Joe Pickrell, researchers building tools like DNA.Land to help you unlock your genome. Ask us anything!


Hello Reddit! We are:

Yaniv Erlich: Professor of computer science at Columbia University and the New York Genome Center

Joe Pickrell: Professor of biology at Columbia University and the New York Genome Center

We develop new, fun ways to analyze genetic data, which we use to understand aspects of genetic privacy, infer ancient history, improve medicine, and much more.

More importantly, we want these tools to be available to everyone, including you. If you have your personal genome data, you can sign up at DNA.Land to contribute to research and learn more about your DNA. If you don't have your data, you can sign up to get sequenced by our project Seeq.

We’re here from at least 1pm-3pm EDT (10 am PST, 6 pm UTC) to answer questions about our studies and about genomics in general! Ask us anything!

EDIT: Thanks so much for all the questions, this was a lot of fun! If you're in NY, come chat more at our free DNA.Land user group meeting next month!

What is the most exciting thing you discovered in your own genetics? How about disturbing...?


Yaniv here: I discovered two cool things. First, by oral tradition, I know that my paternal line is of Jewish priests (Cohen). By analyzing my Y chromosome, I found that it belongs to the Cohen Modal Haplotype that is shared by most Jewish Cohens. It was pretty neat to see that.

The second thing is that I discovered that my maternal line is East African! My mother's side is from Uzbekistan and they are all light skinned. However, my mitochondrial DNA type is very common in the Horn of Africa. Doing some research, I found that my maternal line was subject to a slave trade from East Africa that ended up in Pakistan... Pretty cool!

What is the most exciting thing you discovered in your own genetics? How about disturbing...?


Joe here.

The main two interesting things I saw in my genome were:

  1. It turns out my great-grandfather was Jewish, and he had hidden this fact to avoid discrimination when he moved to the US and married a Catholic woman. This aspect of my ancestry was unknown to me, and I only realized it after doing a genetic test.

  2. I'm a carrier of the APOE4 allele, which gives me double the average risk of Alzheimer's disease.

Good day professors and thank you for the AMA. My questions are about your thoughts on CRISPR.

With the introduction of CRISPR/Cas9 it seems that the ability to do just about anything with our DNA is over the horizon. When you think of the benefits to humanity that something like CRISPR could bring what are you most hopeful for? When you think of the possible negative applications of CRISPR what is your greatest fear? Considering both the positives and negatives, how do you think the introduction of CRISPR will continue to play out?

Thank you for your time.

I would like to add that my oldest daughter is severely autistic and unable to care for herself. The thought that one day before I die my daughter might receive a treatment to correct her severe autism to the point where she's at least able to care for herself and communicate is the only hope that exists for my wife and me.


Yaniv here. CRISPR is definitely a game changer in genetics and in many fields of biomedical research. I think that we have a long way to go before being able to edit or correct genetic information in living human beings. For example, we still do not know most of the genetic changes that cause most of the common conditions, such as autism. My view is that the first applications of CRISPR outside the lab will be in the area of controlling populations, such as:

The cost for a user to map his genome via Seeq appears to be $50, which seems to be a bit less than the competition. How is Seeq able to offer such a low price relative to the competition, and what does the competition offer that justifies the higher price point for a similar service?


Joe here.

We have some fun new technology that lets us get the price this low, but it definitely limits us a bit in terms of what we can do (mostly to the features listed on our site, though other things are in the works).

23andMe, for example, can tell you if you're a carrier for specific genetic diseases, which we can't do (not just for legal reasons, but for technical reasons as well). If you're interested in those clinical aspects of genomics, that could definitely justify a higher price point.

Hey Yaniv and Joe! Thanks so much for being here! There has been a lot of hype around direct to consumer genetics. Some would argue we haven't really left the 'wild west of genomics' since 23andMe got its hand slapped by the FDA for giving out inappropriate clinically-relevant information and SportsXFactor claimed to tell you about your slow/fast twitch fibers. Today, companies like Veritas advertise, "whole genome sequencing to improve your health and longevity."

Others argue that by keeping the information and analytical tools that scientists use out of the hands of the general population, we have overly neutered the power of individuals to leverage their own genetic data. Sites like Promethease put medically-relevant genetic data interpretation in the hands of the consumer, only for "research and educational purposes." However, because the genome (and its interaction with the environment) is extraordinarily complex, population level risks (i.e., "2x risk of Alzheimer's disease" from Promethease) do not translate to individual risk.

My question is, what do you think consumers should be doing with their genomes? Where would you draw the line between genomic recklessness (leading to misinterpretation and possible risk to consumer privacy/health) and genomic usefulness (getting the most out of potentially important ancestral or health related risk information), given what we know today?


Yaniv+Joe: This is an excellent question. We believe that people own their genomic information. Therefore, they should be allowed to have access to their genomic information without any barrier.

It is important to note that nearly every interpretation of genomic information is likely to be noisy and uncertain. Therefore, most results should be taken with a grain of salt (maybe even a bag of salt). However, people should be able to get this information and decide for themselves. In short, we believe in the autonomy of people regarding their genome.

Hi Yaniv and Joe,

I am very thankful to you for having started DNA.Land. The secondary growth in the DTC genetics ecosystem (DNA.Land, OpenSNP, etc. after 23andMe, etc.) has helped me think through the response of civil society to surveillance capitalism (a logic of accumulation leading to a global redistribution of privacy rights and their diffusion into property rights, very interesting paper!). In a way, DNA.Land and OpenSNP repurpose the data held by 23andMe (and other similar DTC providers), at much lower cost. They implicitly also reduce our collective reliance on these commercial solutions.

With that perspective, I have started PersonalData.IO, which aims to leverage European personal data protection laws to help put personal data back into the hands of citizens (across many verticals, not just health/genetics). For instance, this would allow individuals to get a copy of all their PatientsLikeMe data, for whatever purpose they might like.

Through PersonalData.IO I am now helping one of the OpenSNP founders to get access to his phenotypic data from 23andMe, including the self-reported information. I think legally 23andMe has to give him a copy of all this data, and that it could be beneficial if that data was also easily transferable. It could also help you design DNA.Land if you know the structure that 23andMe is using.

I was wondering if:

  • you would be interested in similarly asking for your own data from 23andMe;
  • which services would you want to combine data with, that might not necessarily have an API?
  • you had any of your own thoughts on how PersonalData.IO could help most effectively.

Yaniv here. Thanks for sharing your website with us. Sounds very interesting.

I personally would love to know what phenotypes I contributed to 23andMe. More broadly, I wish there was a strategy to rapidly get all of my medical information that is scattered throughout multiple organizations. In addition, I wish that I could get all my search data via an API call. Currently, Google allows me to download the data but only using a manual process, so I cannot donate this data to other websites.

How do PersonalData's users get their data - in an electronic format or printed?

I'm building a data science computing cluster at our university. Just curious, what does your technology setup for all of this look like? What programs are you using?


Yaniv here: We use Amazon Cloud for DNA.Land. It allows us to scale rapidly when we have a large queue and to pay less when the queue is shorter. Most of the DNA.Land website relies on custom scripts that run scientific programs such as GERMLINE, PLINK, and more.

Sorry if this has been asked already, but here goes anyway:

What do you think accounts for the differences in ancestry composition between DNA.Land and 23andMe?


Joe here.

This is almost certainly due to the differences in the reference panels used by us and by 23andMe.

The way all of the ancestry inference methods work is that they take your genome and compare it to a set of people with "known" ancestry. Then (to a first approximation) if you seem more closely related to someone from France than to someone from Spain, the algorithm might call you "French".

The key is who is in that set of people with "known" ancestry. 23andMe uses one set of people, AncestryDNA uses another, and we use a third. So while I suspect that all of us would give you similar results at a high level (i.e. it's unlikely that one service would say you're Oceanian and the other say you're European), at a finer scale there might be considerable differences.
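To make the reference-panel point concrete, here is a minimal toy sketch in Python. It is not the method used by DNA.Land, 23andMe, or AncestryDNA; it only assumes a simple "closest reference individual" rule, and every genotype and label below is made up for illustration. The same genome gets a different label depending on who is in the panel.

```python
# Toy illustration only: real ancestry inference uses hundreds of thousands of
# SNPs and statistical models, not a simple nearest-match rule.

def mean_genotype_distance(a, b):
    """Average absolute difference between two genotype vectors (0/1/2 allele counts)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def closest_label(genome, panel):
    """Return the label of the reference individual most similar to `genome`."""
    return min(panel, key=lambda label: mean_genotype_distance(genome, panel[label]))

# Hypothetical genotypes at six SNPs; the labels and numbers are invented.
my_genome = [0, 1, 2, 1, 0, 2]
panel_a = {"French": [0, 1, 2, 1, 0, 1], "Spanish": [1, 1, 1, 0, 0, 2]}
panel_b = {"Spanish": [1, 1, 1, 0, 0, 2], "Italian": [0, 2, 2, 1, 1, 2]}

result_a = closest_label(my_genome, panel_a)  # label chosen using panel A
result_b = closest_label(my_genome, panel_b)  # label chosen using panel B
```

With panel A the toy genome is closest to the "French" reference, while with panel B (which has no French individual) it is closest to "Italian", even though the genome itself has not changed.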

I have a technical and statistically oriented question. Most of the DTC offerings use some form of the Illumina chips. This yields about 700k SNPs. While I know that these sites have been strategically chosen for their higher variance and relevance, my question has to do with the other 9M+ SNPs that we aren't sequencing.

I imagine a very large number of those are the same in the vast majority of humans and a very large number can also probably be inferred from the SNPs that are tested (your site has this feature even).

While over large scales these kinds of things tend to wash out, over smaller scales they can be much more problematic. This is especially the case for someone like me who is quite admixed. A high degree of variation is inherent in my genome compared to your average European.

How does the SNP limit impact the relevance of personal genomics for someone like me?

And what does this inform about some of our assumptions in studies about risk? Especially for admixed people.

Is there an ethical problem in conducting population genetic research (for which the chips are well suited) under the guise of personal genomics?

As a case example, my wife was diagnosed with Ulcerative Colitis. At the time, 23andMe had a risk profile associated with it indicating a risk that was about 25% below the national incidence rate. Digging deeper I saw they only listed about 20 or so SNPs in that calculation. I did my own research in the literature and found a total of 41 sites that had been coded and recalculated her risk using published papers. It showed a risk profile about 10% higher than expected.

While I don't consider either method to be truly informative of her actual risk it highlighted for me just how much variation there can be in SNPs that aren't being "counted." This certainly seemed to be along the lines of the FDA's concerns as well.


Yaniv here. Great question and kudos for your manual analysis of the Ulcerative Colitis risk! In DNA.Land, we impute every genome, meaning that we take the ~700,000 input SNPs and use a large collection of fully sequenced genomes to complete the missing information on 39 million SNPs. You can learn more here about this process. The imputed genome is not as accurate as a fully sequenced genome, but for most common variants, we have >95% accuracy in inferring the variant.
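The idea of imputation can be sketched with a toy Python example. This is a deliberately simplified illustration, not DNA.Land's pipeline: real imputation tools use statistical models over large reference collections, whereas this sketch just copies the untyped sites from the single best-matching reference haplotype. The haplotypes and the helper name `impute` are hypothetical.

```python
def impute(typed, reference_haplotypes):
    """Fill untyped sites (None) by copying from the best-matching reference haplotype."""
    def mismatches(ref):
        # Count disagreements only at sites that were actually measured.
        return sum(1 for t, r in zip(typed, ref) if t is not None and t != r)

    best = min(reference_haplotypes, key=mismatches)
    # Keep measured alleles as they are; borrow the rest from the best match.
    return [r if t is None else t for t, r in zip(typed, best)]

# Invented "fully sequenced" reference haplotypes at eight sites (0/1 alleles).
reference_haplotypes = [
    [0, 0, 1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 1, 1],
]

# A chip measures only some sites; None marks the sites it does not cover.
typed = [1, None, 0, None, 1, None, None, 0]
imputed = impute(typed, reference_haplotypes)
```

Here the second reference haplotype agrees with every measured site, so its alleles are used to fill in the four unmeasured positions.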

About the ethical problem, it is well known that variants from African populations are underrepresented in these chips. So there is an ethical issue here. In addition, African populations were not subject to the strong bottleneck of the out-of-Africa migration events. This creates other technical challenges (known as short linkage disequilibrium blocks) that require much denser arrays to map disease genes.

There's a known drop off effect that occurs once people see the interpretation of their genome, as the pace of discovery isn't fast enough to embroil us for long. How else could we make sites like yours more entertaining as such?


Yaniv here. First, we have a relative matching report that updates every day in DNA.Land. So by coming back to the website, you can discover new relatives. Second, we are always working on new features that allow you to extract more information from your genome.

Good morning, What is the most common misconception about your line of work?


Yaniv here. Excellent question! The biggest misconception is that DNA determines traits. People even say "It is in our DNA" to refer to something solid and fixed. Ironically, this is exactly the opposite in genetics.

First, genetics rarely accounts for the entire variability of a trait in the population. For example, having obese parents does not mean that you are going to be obese. In fact, for most traits, such as heart disease, personality traits, and breast cancer predisposition, genetics only explains about half of the risk, if not less. For other traits, such as life span, genetics accounts for only 1/8th of the total number of years! So genes do not tell the entire story.

Second, genes contribute only to predispositions, not to traits. The manifestation of a trait also depends on other things such as the environment. For example, consider mutations in the gene that causes PKU (phenylketonuria). Fifty years ago, such a mutation meant mental disability. However, nowadays, we can offer a special diet that controls the phenylalanine levels so the kid will be healthy. This shows how the action of the gene is changed by the environment.

What's the state of the ethics discussion in sequencing the public's genome outside of a medical or research context?

For example in computer science, there is a recent issue with the misapplication of machine learning in the justice system, where racial biases have been introduced into the automated prediction of recidivism, leading to an unfair distribution of early release grants.

Similarly the FBI has been growing a database of genetic information, collected from the criminal justice system. Currently that information is being used as a means of identification, however in the future it's not unlikely that government organizations may want to consider other ways to utilize existing data.

In the computer science field, there's not as much discussion about the ethics of how technology is used. Most discussions focus on the individual, and how the individual should safeguard themselves, but how it applies to institutions is something that's mostly only discussed by journalists after something has happened.

I think it could help if programmers were taught to start thinking of these ethical problems before they start encountering them in the real world. Similarly, I wonder if other fields are considering this as well? Are there discussions in the scientific research community about the limitations of use outside of a research setting? Is this something being discussed with students?


I totally share with you the view about training computer science and engineering students to think about the ethical implications of technology.

In our class at Columbia University, we regularly cover ethical aspects of genetic research. For example, when we taught the concept of "heritability", we discussed the highly controversial paper by Arthur Jensen on IQ boosting. We also covered forensic DNA and familial searches and had a long discussion about the ethical implications.

The problem is that some CS professors feel that this is not their area of expertise and therefore decide not to cover these aspects.

Thank you for doing this AMA. I'm really interested in the new information we could gather if we can get a large portion of the population's DNA mapped and compared.

My girlfriend never knew her biological father. The possibility of having children together is making me worry a bit about not having a big part of our future child's medical history. What could DNA.Land tell us about that? And what could you tell us now that you couldn't tell us 10 years ago?


I totally agree with the answer by PeruvianHeadshrinker about the medical side.

DNA.Land (and other websites) can help your girlfriend find biological relatives that will eventually help her identify her father. While this process may take time and patience, there are many success stories of individuals that found their biological families this way. The Facebook group DNA Detective is an excellent source on this topic.

Does unlocking my genome void the warranty?



Next month I am going to order a DNA kit for myself and just yesterday was searching for a website to upload the results to. Is DNA.Land free to use? What type of results can I expect if I submit my DNA results?


Yaniv here. DNA.Land is free and is a not for profit. Our website will provide you a list of relatives, impute the missing parts of your genome, allow you to navigate your data, and reveal your ancestry using a really cool report.

And while you get all of that, you can also contribute your data for important scientific studies :-)

What do you think will be the biggest effect of direct to consumer genetics? What change(s) do you think this will create in people/societies?


Yaniv here.

Consumer genetics' biggest effect is education of the general public about genetics. It is one thing to hear stories in high school about Mendel's peas and another thing to actually see your genes and how they segregated in your family.

This type of education can go a long way. For example, it can help fight racism and racial theories. Think about a person that discovers that he has ancestors from multiple locations and relatives all over the world. It is easier to understand the fallacy of racial theory when you clearly see these concepts in your own family.

Of course, there is also the medical side of mining consumer genetics for scientific discoveries.

Hi Dr. Pickrell and Dr. Erlich, thanks very much for doing this.

Many large GWAS consortia, with large sample sizes (~100k), have made summary association statistics freely publicly available, including for non-significant variants. In that context, what additional value is provided by the data that will be collected by


Yaniv here.

First, we do not plan to use the data only for GWAS analysis. There are other types of studies where a large collection of genomes might be highly beneficial. For example, see our recent manuscript on DNA fingerprinting where we used the DNA.Land participants' data as a control to benchmark our method:

Second, for many traits, GWAS has yet to reach 100,000 people.

Third, even for published GWAS, many studies do not publicly share their summary statistics data. For example, it is nearly impossible to get raw GWAS data for large scale cancer patients.

Good morning, Professors,

My partner works in the fields of Histology and Embryology, so I have heard quite a lot about the so-called Genome project, where we are supposed to get our genome data for a small amount of money. What you don't read often, if at all, is the fact that insurance companies are buying and plan to buy more and more of that information from companies like yours, in order to be able to analyse each of their (future) clients' genome. Why? Because if they discover you are going to suffer from X condition, they will be able to cash more money from you from the very beginning, or not accept you at all.

My question is, are you, as a company who is clearly making money by analyzing our genome, aware of the dangerous situation you could provoke if you keep on selling our genome info to insurance companies? Do you feel responsible for any future deaths related to this?


PS. I am not saying you personally sell the data. I am just trying to make people aware of the other side of the coin. All this stuff is great, I know, but it is only fair to know what we are getting into when we pay to get our genome "translated".


Yaniv here. Thanks for this question and sorry for the misunderstanding. DNA.Land is not a company but a not for profit research project. We are not selling data to anyone and according to the consent we will never release your data unless you explicitly authorize us in writing.

More generally, I am not aware of insurance companies that purchase genetic information regarding their customers. At least in the US, the Genetic Information Non-discrimination Act (GINA) strictly prohibits the usage of genetic data for health insurance decisions.

When do you expect a big % of population to have their full genome analyzed?


Yaniv here. Currently about 3 million people have access to their genome in the format of a genome-wide genotyping array, and based on the trajectory, we expect exponential growth.

I am not sure when whole genome sequencing will replace genome-wide arrays for genetic genealogy or wellness applications. The price is still too high ($1000 vs. ~$100) and the benefit for most individuals is not huge.

Regarding the microbiome analysis offered by Seeq, do you expect that multiple tests for a specific user over a period of time would return relatively stable results? Or would you expect that the test would return different results reflecting the user's recent diet and lifestyle in the time prior to taking the test?


Joe here.

I suspect your microbiome will change quite a bit over time, within parameters set by your genetics. I.e. it might be like your weight, which (probably!) fluctuates quite a bit according to your diet, age, and other factors, but which has strong genetic influences as well. Or maybe more like your mood, which varies on a much faster time scale but is also subject to both your surroundings and your genetics.

Hopefully with this project we'll be able to answer that question better!

Good morning! Was wondering if you could tell if you've personally made any findings in your research pertaining to MTHFR methylation defects? What do you believe can be done to further identify and aid these individuals in terms of education and preventative medicine? Do you believe that genetic testing and subsequent related action can lower the incidence of chronic disease? Thank you!


Yaniv here.

Thanks for your question, but my lab does not work on this gene. For certain diseases, such as CF and Tay-Sachs, genetic tests of carriers can provide actionable information to lower the incidence rates in the population.

When DNA is replicated to make new cells, a piece of the very end of it gets clipped off... which from what I understand is why as we age, why hair loses its color, why skin loses its elasticity, and why our systems become less efficient.... do you think there is a way to reduce the rate our DNA is shortened (and thus extend our lives)?


Yaniv here.

Data show that we don't need to reduce the rate of DNA shortening to extend life expectancy. In the last 150 years, life expectancy grew by 35 years without any direct intervention in the DNA shortening process. I am very optimistic that there are many strategies to live longer without messing with the basic replication machinery :-)

Hi Joe, I read your blog post "bitcoin powered genomics" which was a quick way to show how to incorporate micropayment transactions in order to obtain calls from public databases at low cost which in turn help to maintain the upkeep of the database. Likewise, blockchain technology has been talked about by Eric Topol and others as a way to ensure health data privacy and return ownership to the individual. I'm wondering your thoughts on how you view the future of health privacy and what role do you envision the blockchain playing?


Joe here.

Glad you read my post! As you know, right now health data is stored by hospitals and health care providers, and they aggregate and sell this data amongst themselves via third parties. It's extremely difficult to 1) get access to your own data and 2) know who else has access to it.

I think a good way forward is to empower individuals to make some of their own decisions about their data. The tools to do this just don't exist today, and if I try to predict what they will eventually look like I'll sound silly in 10 years, so I'll hold off :)

One type of thing I tinkered with was a way for you to sell and record access to your data (genome or health data) via an API using bitcoin. In this model, if someone wants to get your health data, they have to pay you and tell you at least some information about themselves that you can track (in the simplest case just some pseudonymous information, but you could imagine making it identifiable, though bitcoin enthusiasts might bite my head off for even suggesting that).
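As a rough illustration of the "pay, identify yourself, and get logged" model described above, here is a minimal Python sketch. It is not the code from the blog post and it does not talk to bitcoin or any real payment system: the class `PaidDataGate`, the payment units, and the record names are all hypothetical, and payment is reduced to a plain number for the sake of the example.

```python
from dataclasses import dataclass, field

@dataclass
class PaidDataGate:
    """Hypothetical gate: releases a record only after payment, and logs who asked."""
    price: int                      # asking price per request, in made-up payment units
    records: dict                   # the owner's data, keyed by record name
    access_log: list = field(default_factory=list)

    def request(self, requester_id, record_name, payment):
        # Refuse requests that do not meet the owner's asking price.
        if payment < self.price:
            raise PermissionError("payment below asking price")
        if record_name not in self.records:
            raise KeyError(record_name)
        # Record who accessed what, so the owner can track it later.
        self.access_log.append((requester_id, record_name, payment))
        return self.records[record_name]

# Invented example data: one genotype stored under a placeholder name.
gate = PaidDataGate(price=5, records={"snp_1": "AG"})
value = gate.request("pseudonym-123", "snp_1", payment=5)
```

The requester here is identified only by a pseudonym, matching the simplest case in the answer; the owner can inspect `gate.access_log` afterwards to see every access that was paid for.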

This AMA is being permanently archived by The Winnower, a publishing platform that offers traditional scholarly publishing tools to traditional and non-traditional scholarly outputs—because scholarly communication doesn’t just happen in journals.


Cool! Thanks a lot!

Hi! I am a non-formal natural resource educator who is teaching an inquiry-based science course for homeschooled middle and high school students this year. We will be covering basic genetics and I would LOVE to incorporate Seeq into the course. However, before I commit to telling my students about the project, I would like to know a bit more about it. These are my most pressing questions:

  1. How do you protect people's genetic information and ensure privacy?
  2. Will submitted DNA be used for research or other purposes?
  3. Can students under the age of 18 submit samples and, if so, do you have some sort of release form for parents to give their approval for the submission of samples by minors?
  4. How long will people be on the wait list?
  5. Once samples are submitted, what is the turnaround time?
  6. Do you plan on adding a FAQ to your website to address these questions?

Thank you for your time! I am extremely excited to hear your responses!


Joe here.

Thanks for these questions! Unfortunately we are limited to individuals ages 18 and up.

Do you think DTC genomic sequencing will have a backlash where your health insurer/employer requires to see it?


Yaniv here. At least in the US, the Genetic Information Non-discrimination Act (GINA) prohibits your health insurer or your employer from basing their decisions on genetic data or family history. A company in Atlanta had to pay $2.25 MILLION in damages to two employees after collecting their genetic data. So the court takes GINA quite seriously. Therefore, I don't think that the scenario in your question is very likely.

Just curious (sorry if this question doesn't really apply)- what is the closest thing to a superpower that I could get by editing my genome?


Yaniv here. I have to say that by editing your genome you are more likely to get an "underpower". Your genome went through millions of years of evolution with genes tested and optimized under various conditions, so you should already be in pretty good shape :-)

Moreover, in genetics, we know a lot about disease-associated mutations but very little about well-being mutations. For example, we know hundreds of mutations that can cause intellectual disability but no mutation that can boost IQ to 180.

Finally, even if it were possible to use CRISPR to flip all of your disease-associated variants to their "normal" state, it might not help. Joe Pickrell, my NY Genome Center and AMA colleague, recently discovered that some variants act like a double-edged sword: each of their states is associated with a different disease. So you could flip them just to go from risk for, say, schizophrenia to risk for Parkinson's.

I'm a fiction writer and I'm doing research for a story. It has to do with modifying the DNA of trees to make them grow faster, something like ten times or so. The process escapes the lab environment and into a normal forest. I don't know if current technologies are close to this or not.

My questions to you would be is something like this possible? What type of timeframe would it take after the DNA modification for results to show? How could this change agent be passed? Air, pollen, water?

Thank you.


Yaniv here. Cool idea. In fact, making plants grow much faster is something that we as humanity did multiple times through the domestication of common plants. For example, maize used to be a small and unimpressive crop before domestication. But rounds of selective breeding created the nice big corn ears that we buy today. I am not aware of any technique to accelerate this process. But at least the basic idea of DNA changes that relate to different growth patterns does not sound implausible (cancer is another example of a fast growing tissue due to genetic changes).

Joe & Yaniv, why don't we see double helixes in nature outside of DNA?


Yaniv here. Good question. For something to be a double helix, you need self-complementarity. I am not a chemist but I presume that this type of property is relatively rare. We do see helixes (but not double helixes) as the building blocks of many complex proteins.

What're the specs on the Android App - I have a Moto X 2014 and it says it's not compatible - I find that a bit weird, or is it because it is limited to the USA?


Joe here.

Sorry, yes, that's probably because it's US-only. We hope to be able to ship internationally soon!


This article and its reviews are distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and redistribution in any medium, provided that the original author and source are credited.