Saturday, November 3, 2012

Autosomal DNA Testing: Phasing

     Good Day Everyone. Hope everyone is doing well! In this document, the process of Phasing will be discussed and explained. Phasing is the newest craze in Genetic Genealogy. Right now there aren't many tools out there that can perform phasing. What is phasing? What is it all about? Let's take a look at the new kid on the block.

Introduction To Phasing    
     What if you wanted to DNA test one of your parents but you can't? Let's assume one of your parents is unavailable and you would like to gather DNA from that parent. Can you do this? The answer is yes. This is where phasing comes in. Phasing is the attempt to reconstruct a parent's DNA data from a single child and the other contributing parent. The idea behind phasing is: Child DNA - Parent 1 DNA -> Parent 2 DNA. The result of phasing is a pseudo DNA data file that contains SNPs of the untested parent. In order to phase a single parent's DNA, you need the DNA data files of BOTH a tested child and the tested parent. Let's take a look at how phasing works.

    If you remember, an individual has two of every known SNP like this -> AG. The letters "A" and "G" are the DNA bases found at a SNP (snip). SNP stands for single nucleotide polymorphism. Strictly speaking, the SNP is the location on the chromosome and the letters found there are alleles, but the two terms are often used interchangeably, and this article does the same. The reason you have two of every known SNP is that you receive one from each parent. Let's say a child has the following SNPs or alleles -> AG. Now let's assume that a tested parent (mother) has the following SNPs -> AT. Can we figure out which SNP the child received from which parent? The answer is yes. Since both child and mother share the common SNP -> "A" (Mom -> AT, Child -> AG), the child must have inherited the "A" from the mother and the "G" from the father (Mom -> AT, Child -> AG, Dad -> ?G). The result will then be a phased DNA data file that contains the single paternal SNP -> "G". Normally your DNA data file from either Family or Relative Finder has two of every SNP or allele. However, a pseudo phased DNA data file will contain only one (half) of every known SNP or allele.
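To make the rule concrete, here is a minimal Python sketch of the deduction for a single SNP. The function name deduce_paternal_allele and the two-letter string format are assumptions made for illustration only; this is not the code that any testing company or phasing tool actually uses. It only covers the case described above, where the child has two different letters and shares exactly one of them with the tested mother.

```python
from typing import Optional


def deduce_paternal_allele(child: str, mother: str) -> Optional[str]:
    """Return the letter the child must have received from the untested father.

    `child` and `mother` are two-letter genotypes such as "AG" and "AT"
    (a hypothetical format chosen for this sketch). Returns None for any
    case the paragraph above does not cover.
    """
    first, second = child
    if first == second:
        return None  # child has two identical letters: not handled here

    first_in_mother = first in mother
    second_in_mother = second in mother

    if first_in_mother and not second_in_mother:
        return second  # the first letter is maternal, so the second is paternal
    if second_in_mother and not first_in_mother:
        return first   # the second letter is maternal, so the first is paternal
    return None        # both or neither letter matches the mother


# The example from the text: Mom -> AT, Child -> AG, so Dad -> ?G
print(deduce_paternal_allele("AG", "AT"))
```

The order of the child's two letters does not matter here, since a raw data file does not say which letter came from which parent.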

   Remember that an autosomal DNA test produces matches. When a person is a match to you, that person matches half of the SNPs that are in your normal DNA data file. In other words, a match is related to one side of your family. Because a phased pseudo DNA data file only contains SNPs from a single parent, only matches from one side of your family are revealed. In fact, this is the reason behind phasing.

Phasing: The Reason Behind It
   Phasing is good in cases where you don't have a DNA data file from a parent. This works well where, say, a parent is deceased. For example, recently I phased SOME of my deceased paternal grandfather's DNA data. However, phasing really shines in "lining up" your matches. Remember that an autosomal DNA test produces matches on both sides of your family. More importantly, an autosomal DNA test cannot tell you which side of your family a match is on. There are two main ways to determine which side of your family a match is on:
  1. Simply test both of your parents and see where the matches line up. If a match appears in the DNA match list of a particular parent, then you know which side the match is on.
  2. Simply test a single parent and observe whether the match appears in the DNA match list of the tested parent. If the match doesn't appear there, then the match can be assumed to be on the other parent's side. This type of exclusion can only be done when considering close relatives (parent through and including 2nd cousins). Starting at the 3rd cousin level, exclusion is only a matter of probability: a non-match to a parent doesn't necessarily mean no relation. In other words, a person can still be related to the tested parent even though that parent didn't show a match. This is because the "masking" effects of recombination begin to appear at the 3rd cousin level.
The third way is phasing. Phasing will automatically reveal which side of your family your matches will fall on. Phasing is considered to have much promise. However there are limitations and catches to phasing. As in most cases, it's never that simple. Let's read on to find out. 
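The match list comparison from the numbered list above can be sketched with plain Python sets. The function name and the kit numbers are made up for illustration, and the match lists are assumed to be simple sets of kit IDs that you have copied out of your results by hand.

```python
def split_matches_by_tested_parent(child_matches, parent_matches):
    """Split the child's matches using the match list of ONE tested parent.

    Returns (shared, not_shared):
      shared      - matches that also appear in the tested parent's list,
                    so they fall on that parent's side.
      not_shared  - matches missing from the tested parent's list. These are
                    only PROBABLY on the other parent's side. As the text
                    explains, the exclusion is reliable for close relatives
                    (through 2nd cousins) but not for 3rd cousins and beyond.
    """
    shared = child_matches & parent_matches
    not_shared = child_matches - parent_matches
    return shared, not_shared


# Hypothetical kit IDs, for illustration only
my_matches = {"K1", "K2", "K3"}
mother_matches = {"K2", "K9"}

on_mothers_side, maybe_fathers_side = split_matches_by_tested_parent(
    my_matches, mother_matches
)
print(sorted(on_mothers_side))
print(sorted(maybe_fathers_side))
```

If both parents are tested, the same function can simply be run once per parent.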

Phasing: Limitations and Catches
    The biggest catch to phasing is that your pseudo DNA data file will only contain, at maximum, half the SNPs (alleles) that a tested person's DNA data file will contain. Remember that with phasing, you are creating a virtual DNA data file without actually testing an individual. From a practical perspective, a phased DNA data file will actually contain far fewer than half the SNPs a normal DNA data file will contain. There are two reasons for this: Random No Calls and Random On Homozygous. Let's take a look.

Phasing: Random No Calls 
     Let's say the child is -> AG and the mother is also -> AG at the same location on their chromosomes. In this case, the child's two SNPs are different, but both are identical to the mother's. Can we determine which of the child's SNPs, "A" or "G", came from which parent? The answer is NO. As you can see, either the "A" or the "G" could have come from the mother. Therefore, there is no way to deduce which SNP came from the mother and which from the father. This is what is referred to as a random no call. Random-no-call SNPs are the reason why linked DNA segments are used in an autosomal DNA test to identify common ancestry. A DNA data file that's generated from a tested individual, by default, contains random-no-call SNPs. A well designed matching algorithm would simply ignore all random-no-call SNPs as it detects them.

   In a phased DNA data file, random-no-call SNPs are not inserted into the file. This is one of the reasons why a phased DNA data file is much smaller than a normal DNA data file. To give you an idea, here is a picture of the phased output of my paternal grandfather - William E. Handy Sr.
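Here is a small Python sketch of how a phasing pass might skip random-no-call SNPs, and why the output ends up smaller than the input. Everything in it is an assumption for illustration: the function name, the rs-style SNP names, and the idea of holding a DNA data file as a dictionary of two-letter genotypes. It is not how the Gedmatch utility is actually written.

```python
def phase_against_mother(child_calls, mother_calls):
    """Build a pseudo paternal file from a child's file and the mother's file.

    Both inputs map a SNP name to a two-letter genotype, e.g. {"rs0001": "AG"}.
    SNPs that cannot be resolved are left out of the result, which is why the
    phased file is smaller than a normal file.
    """
    paternal = {}
    for snp, child in child_calls.items():
        mother = mother_calls.get(snp)
        if mother is None:
            continue  # mother has no result at this SNP

        first, second = child
        if first == second:
            # Child has two identical letters: the father gave that letter,
            # provided the mother could have supplied the other copy.
            if first in mother:
                paternal[snp] = first
            continue

        first_in_mother = first in mother
        second_in_mother = second in mother
        if first_in_mother and second_in_mother:
            continue  # e.g. child AG, mother AG: a random no call, skip it
        if first_in_mother:
            paternal[snp] = second
        elif second_in_mother:
            paternal[snp] = first
        # if neither letter matches the mother, the data is inconsistent: skip

    return paternal


# Made-up SNP names and genotypes, for illustration only
child = {"rs0001": "AG", "rs0002": "AG", "rs0003": "CC", "rs0004": "CT"}
mother = {"rs0001": "AT", "rs0002": "AG", "rs0003": "CT", "rs0004": "CT"}

phased = phase_against_mother(child, mother)
print(phased)
print(len(phased), "of", len(child), "SNPs could be phased")
```

In this toy input, two of the four SNPs are random no calls, so only two make it into the pseudo file.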

I recently used the new Gedmatch Phasing Utility to create a pseudo phased DNA data file of my deceased paternal grandfather - William E. Handy Sr. The kit number is PF208196P1.

     If you click the picture shown above, you will see a phased listing of my paternal grandfather's DNA matches. The top match (F208196) is his son, who is my father, Steve Handy Sr. A parent and child normally share between 3300cMs - 3400cMs of DNA. As one can see, my dad and his father only share 410cMs of DNA in this phased reading. The low cM amount is due to the size of the phased DNA data file. Moving on to myself (F200507), my grandfather and I only share 217.3cMs of DNA. We are supposed to share between 1700cMs - 1900cMs of DNA since William Handy Sr. is my grandfather.

     One way around this is to simply phase all of your full siblings against the same parent. That way, you can build a bigger list of matches which fall on the side of the phased parent. Each sibling will likely generate a different phased virtual DNA data file against the same parent. However each sibling has the potential to filter and reveal more matches that fall on the side of the phased parent.
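As a rough sketch of the sibling idea, the match lists from each sibling's phased kit can be pooled and counted in Python. The function name, sibling labels and kit IDs below are all made up for illustration, and the match lists are assumed to be sets of kit IDs copied out by hand.

```python
from collections import Counter


def pool_phased_matches(matches_by_sibling):
    """Pool the match lists produced by each sibling's phased kit.

    `matches_by_sibling` maps a sibling label to the set of kit IDs that
    matched that sibling's phased file for the SAME parent. The result maps
    each kit ID to the number of siblings whose phased file revealed it.
    """
    counts = Counter()
    for matches in matches_by_sibling.values():
        counts.update(set(matches))  # count each kit once per sibling
    return dict(counts)


# Hypothetical kit IDs, for illustration only
pooled = pool_phased_matches({
    "sibling_1": {"M1", "M2"},
    "sibling_2": {"M2", "M3"},
})
print(pooled)
```

Every kit in the pooled result falls on the side of the phased parent. The count just shows how many siblings picked it up.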

     The important concept though is that all of the matches shown at the above URL link are ALL on my paternal grandfather's side. In other words, the matches shown at the above URL, all "line up" on my paternal grandfather's side. There is no need to worry about matches shown on my father's maternal side because none are shown in this phased output. The phased DNA data file has completely filtered out all of my father's maternal matches.

Phasing: Random On Homozygous   
     Let's say the child is -> AA and the mother is also -> AA at the same location on their chromosomes. Both of the child's SNPs have the same value (homozygous) and are identical to the mother's. Can we determine which SNP came from which parent? The answer is yes. The father and mother both contributed a SNP with the value of "A". However, this will not help in an autosomal DNA test. Because both parents are identical to the child at that location, there is no way to determine which parent a match is related to.

   For example, suppose mom is -> AAAAA and dad is -> AAAAA. If a match is -> AAAAA, then how can you know which parent the match is related to? This is called Random On Homozygous. In normal tested cases, this presents the same problem as in a phased scenario. People who are descended from or are a part of an endogamous population suffer from random on homozygous, or ROH. A good example would be people of Ashkenazi Jewish ancestry. Many first cousins married each other and produced offspring. As a result of the inbreeding, the SNP or allele pool can become highly homogeneous over time. Phasing would not help much in this case.
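The AAAAA example can be put into a few lines of Python. This is a deliberately simplified sketch: each person is treated as a single string of letters over the same five locations, and a "match" just means the strings are identical. Real matching works on long segments measured in cMs, and the function name is made up for illustration.

```python
def sides_consistent_with_match(maternal, paternal, match):
    """Return the set of sides ("maternal", "paternal") a match could be on.

    Each argument is a string of letters over the same locations, e.g.
    "AAAAA". A side is possible when the match is identical to that
    parent's string (a simplification made for this sketch).
    """
    sides = set()
    if match == maternal:
        sides.add("maternal")
    if match == paternal:
        sides.add("paternal")
    return sides


# The case from the text: both parents look the same, so the match
# could be on either side.
print(sorted(sides_consistent_with_match("AAAAA", "AAAAA", "AAAAA")))

# When the parents differ at even one location, the side can be told apart.
print(sorted(sides_consistent_with_match("AAGAA", "AATAA", "AAGAA")))
```

The first call returns both sides, which is exactly the problem described above. The second returns only the maternal side.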

On a final note, half siblings already have phased data.

     Well, that's it for phasing. Hopefully you now have a basic and clear understanding of phasing. Currently there are two tools out there that do phasing - GedMatch and David Pike's Tool. To see the actual comparison in my case - use kit number PF208196P1 in the second link - Compare Kits. To generate a phased file, use the first or third link shown below.



  1. DNA the basic structure that proves your individuality can also be useful to take possible precautions in case of hereditary diseases. In this regard DNA testing is no doubt very important. Thank you.

  2. Thanks for the clear explanation -- I have autosomal data for my mother, my two brothers and myself. I have phased my data and my brothers', producing three paternal and three maternal phased files on How can I combine my and my brothers' paternal phased files to get a more accurate representation of my father's DNA?


  3. Hi, how are you, rewarren? You cannot combine the separate phased files. Each phased file produces matches that line up on the respective side.

    Hope that helps

    1. The answer makes no sense. If he creates a phased kit for his dad, using the technique described... the same technique would tell you which segments each brother inherited, offering a more complete kit data, excluding no-calls. Of course it can be done.

  4. Several full siblings all phased to their mother yield phased pseudo DNA files for the father. Rewarren asks if they can be combined to more fully represent the DNA of the father. Steve replies that they cannot be combined. But if we lined up the SNP values from the 3 pseudo DNA files according to SNP position, each would fill gaps in the others, and in some cases you would get both of the father's SNP values from the three, at least for a portion of the SNPs. It seems to me like these could be combined, although whether or not any analysis system (GEDmatch or whatever) could understand and use the combined results is a separate question.

    1. Phasing basically aligns your matches to each parent. It does this by deducing which allele or DNA marker came from which parent. To the best of my knowledge - there is no "combining". A parent supplies an allele to a child, and the purpose of phasing is to determine that.

