Thursday, September 12, 2013

Procedure For Finding Shared Matches Using Excel

Finding Shared Matches In Excel.

Good Day Everyone,

   I wanted to present a simple way to use Microsoft Excel to compare and find shared matches between two or more people. The initial steps first require you to have Excel 2007 or 2010 on your Windows machine. Then you need to go on 23andMe to the Countries of Ancestry Page and grab any two people's Ancestry Finder csv files.

Here is how you get to the Ancestry Finder csv file for a single person.
a) Log in to your 23andMe account
b) Then at top -  My Results -> Ancestry Tools -> Countries Of Ancestry
c) On the Countries Of Ancestry Page - click the drop down window for each person. Scroll down the web page and at the bottom - double click the blue button that says - "Download.........Ancestry Finder File"
d) Save the csv file to your computer

Here is how you create the spreadsheet
a) Double click the csv file for each person. Excel opens up.
b) Then copy the column that says matches for a particular person into another spreadsheet. Do the same with another person. The result should be a single spreadsheet with a minimum of two columns that you are comparing.

Here is how to run VBA code to compare columns
1) Open up the new Excel spreadsheet with the names of matches of two or more people.

2) In the new Excel spreadsheet - hit the ALT and F11 keys. This opens the Visual Basic editor to run code.

3) In the Visual Basic editor - on the toolbar - look for the small green arrow pointing to the right. It looks like a small green triangle. Click this green triangle.

4) This opens a small window. Give your script a name and click the Create button.

5) Erase the code in the window and replace it with this code:

Private Sub CommandButton1_Click()
    ' Copies every value of one column that also appears in a second column into a result column.
    Dim CompareRange As Range, To_Be_Compared As Range
    Dim x As Range, y As Range
    Dim str1 As String, str2 As String, str3 As String
    Dim i As Long

    ' Ask for the column letters, for example A, B and C.
    str1 = InputBox("Enter Column Name to be Compared")
    str2 = InputBox("Enter Column Name to Compare")
    str3 = InputBox("Enter Column Name to put the Result")

    ' Each range runs from row 1 down to the last filled cell before the first blank cell.
    Set To_Be_Compared = Range(Range(str1 & "1"), Range(str1 & "1").End(xlDown))
    Set CompareRange = Range(Range(str2 & "1"), Range(str2 & "1").End(xlDown))

    ' Write each shared value to the next free row of the result column.
    i = 1
    For Each x In To_Be_Compared
        For Each y In CompareRange
            If x.Value = y.Value Then
                Range(str3 & i).Value = x.Value
                i = i + 1
            End If
        Next y
    Next x
End Sub

6) Press the small green triangle button again. This runs the code and you will be prompted to enter the column letters that you want to compare and the column letter to place the results in.

7) The result is that your spreadsheet will have a new column with shared matches.
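
For readers who would rather work outside Excel, the same comparison can be sketched in Python. This is only a minimal sketch under assumptions: the function name shared_matches and the sample names are hypothetical, and it assumes you have already copied each person's match names into a plain list.

```python
def shared_matches(matches_a, matches_b):
    """Return the names in matches_a that also appear in matches_b.

    The order follows matches_a and each shared name is reported once.
    """
    lookup = set(matches_b)  # set membership checks are fast
    seen = set()
    shared = []
    for name in matches_a:
        if name in lookup and name not in seen:
            shared.append(name)
            seen.add(name)
    return shared


# Hypothetical match lists for two people (not real data).
person_1 = ["Alice Smith", "Bob Jones", "Carol White"]
person_2 = ["Carol White", "Dan Brown", "Alice Smith"]
print(shared_matches(person_1, person_2))
```

Unlike the VBA loop, this sketch lists a shared name only once even if it appears several times in a column.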

Here is the URL with the instructions starting at line that says: "Find duplicate values in two columns with VBA code"


Thanks
Steve

Wednesday, May 29, 2013

Understanding Correlations and Debunking Misconceptions In DNA Genealogy

     Good Day Everyone - How is everyone doing? Fine I hope. In this tutorial, we are going to get a firm grasp on the basics of genetics as they relate to DNA genealogy. The reason for a return to the basics is to debunk certain misconceptions that currently seem to be floating around in the general public concerning DNA Genealogy.  Debunking a misconception is important. It's important because when a consumer purchases a product, he or she has expectations that the product will fulfill. When those expectations are fueled by misconceptions, the consumer may develop unrealistic expectations that naturally don't get fulfilled. It's sad and unfortunate, but it's a common practice to use a misconception to sell a product. In science, this is a common occurrence and is easily rectifiable with a return to the basics.

With that in mind - let's begin our discussion.

DNA - The Basics
DNA Genealogy is very popular today. In an age where technology allows a person to send in a small DNA sample and get results quickly, many expectations are formed. This is of course logical. If you pay for a product you expect results. However no matter how popular DNA Genealogy is, it is still based on a science. That science is Genetics. In genetics - the pot of gold is DNA.

DNA stands for deoxyribonucleic acid. DNA sits inside nearly all the cells of a single living thing. DNA carries and transmits biological information from parents to offspring. This is the fundamental principle behind the science of genetics. In fact - in genetics there is a principle known as the central dogma of molecular biology (called here the Central Dogma Of Life), which can be written as an equation:

                             DNA -> RNA -> Protein
                             (Central Dogma Of Life)

The above equation is how all life proceeds. Since life has been here on the planet, this is how life, biologically speaking, works. When dealing with genetics, a feature, concept, or any defined entity must somehow fit into the above equation. Technically in genetics, something must be an attribute of a genetic mutation for it to have a basis in genetics.

For example, sickle-cell anemia is an inheritable disorder that can be passed along from a parent to its offspring. The reason is that a single mutation in a gene causes this disorder. In other words - sickle-cell anemia is an attribute of a genetic mutation. Sickle-cell anemia is an inheritable trait and it fits into the fundamental principle of genetics.

It's at this point where misconceptions can arise. What are those misconceptions? Let's take a look.

Misconceptions In DNA Genealogy
Misconceptions in DNA Genealogy generally stem from a misunderstanding of the basics in genetics. Many of the present misconceptions take the form of some defined construct that's NOT inheritable and NOT an attribute of a genetic mutation. The most popular misconceptions in the general public treat concepts such as race, religion, nationality, and ethnicity as having some genetic basis. For this tutorial - let's focus on ethnicity.

Ethnicity is a socially defined category based on common culture or nationality. For example - the term "African-American" - typically refers to a group of people whose ancestors were a part of the West African slave trade during the 1700s. You can expand the definition of ethnicity as it's deemed fit, but no matter how you look at it, ethnicity is socially defined and self determined. Ethnicity is a social construct devised by humans. If a person wants to place themselves in a different ethnic designation overnight, then he or she can do that.

However ethnicity is NOT an inheritable trait. In other words - ethnicity is not an attribute of a genetic mutation. Your ethnicity is NOT reflected in your DNA. There is simply no way that it can be reflected in your DNA. Long before humans came along and devised social constructs, life evolved the ability to transmit features from parent to offspring within the DNA molecule, not transmit social designations within the DNA molecule. You can change your ethnic or racial designation, name, location, and religious affiliation. However you can't change your DNA.

Another popular misconception deals with the word "Ancestry". Like so many words, it can have different meanings in different contexts. In genetics, ancestry means common descent: two or more individuals share a unique feature that's derived from a common ancestor. The problem occurs when the term ancestry is given a social tone and then used in a scientific and objective arena.

These misconceptions mentioned above are very popular within the general public. Why the popularity? Well that's where the term correlation comes in. Let's take a look.

Understanding Correlations: Beauty And The Beast
If you have taken statistics before, then you probably have come across the term "correlation". Correlations are like a double-edged sword. On one hand, a correlation can be useful. On the other hand, a correlation can be dangerous and misleading.

Simply put, a correlation is a casual relationship or association between two or more variables, meaning the variables tend to change together without one necessarily causing the other. Correlations are a major reason for the popularity of many of the misconceptions that exist in genetics and thus DNA Genealogy. Many of the so-called BGA (biogeographical ancestry), Admixture, or Ethnic Population tests on the market are based on correlations. That's why those tests are very convincing.

A simple example of a correlation is between time and highway traffic. In many major metropolitan cities across the US, highway traffic tends to occur at specific times of the day. For example in Chicago, Illinois, Interstate 94 is a major highway system that leads in and out of the downtown area of Chicago. Interstate 94 experiences a consistent and heavy amount of traffic between the times of 6am-9am in the morning and 4pm-7pm in the evening. This happens so regularly at the above times, that the term "rush hour traffic" is used to label the phenomenon.

Rush hour traffic is a simple example of a correlation. Here we have a casual relationship or strong association between time (one variable) and traffic (a second variable). Correlations can be useful in certain situations. Let's read and find out why.

Understanding Correlations: The Beauty
A correlation can be useful because it can have strong predictive power in certain circumstances. For example - in our simple example above, if highway traffic consistently occurs at 6am-9am every morning, then one can logically predict that the highway will experience major traffic at those times and plan to avoid it. Many of us subconsciously use correlations on a daily basis to make predictions in order to adjust our behavior accordingly.

However correlations can have a dangerous side as well. Let's see why!!!! 


Understanding Correlations: The Beast
Correlations can be very dangerous and misleading. This is especially true in science. If you have ever heard the term "Correlation Does Not Imply Causation", then you know why correlations can be dangerous. The danger from a correlation is when the casual relationship between variables is perceived as a direct or cause-effect relationship.

Here is an example of some dangerous logic -> "The heavy rush hour traffic is caused by it being between 4pm-6pm."

Going back to our simple traffic correlation example, if heavy highway traffic consistently occurs at a specific time, then one may actually believe that time causes the traffic. This of course is not true. Time does NOT cause the traffic. The traffic is caused by the fact that most people have work hours that end at a time between 4pm - 6pm. The result is that many people between those hours simply head to the highway, which is what actually causes the congestion and traffic.

This is why correlations can be quite dangerous. A consistently confirmed prediction from a correlation can lead to a false belief that one variable is the result of another variable. When dealing with correlations - what you want is to identify the cause of the casual relationship between the variables. That's the key. It's important to understand that there is a big difference between a casual or associative relationship versus a direct relationship. For example, there is a direct relationship between high blood pressure and salt. The link between salt and high blood pressure is not merely a correlation. High salt intake can actually raise blood pressure.
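
The traffic example can be put into numbers. The sketch below is only an illustration under assumptions: the hourly figures are made up, the names pearson, hour_in_rush_window, workers_leaving, and vehicles_per_hour are hypothetical, and the Pearson coefficient is just one common way to measure how strongly two variables move together.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up hourly figures from 2pm to 7pm (not real traffic data).
hour_in_rush_window = [0, 0, 1, 1, 0, 0]   # 1 if the hour falls in 4pm-6pm
workers_leaving = [300, 500, 4000, 4500, 800, 400]
vehicles_per_hour = [2000, 2200, 5200, 5600, 2500, 2100]

print(pearson(hour_in_rush_window, vehicles_per_hour))
print(pearson(workers_leaving, vehicles_per_hour))
```

Both coefficients come out strongly positive for these figures, yet the number alone cannot tell you that the clock is not the cause and the departing workers are. That judgment has to come from understanding the situation.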

Now that we have a solid understanding of correlations, let's turn our attention back to DNA Genealogy.

Correlations In DNA Genealogy
If you are wondering if correlations exist in DNA Genealogy - then you are correct. In fact, correlations in genetics are a major reason for the spread and popularity of many of the misconceptions that were mentioned in this tutorial. In genetics, correlations take the form of known and studied DNA markers that are strongly associated with certain defined ethnic populations. This association is actually the basis for many of the so-called Admixture or Ethnic Population Tests such as Dodecad Admixture, and for companies such as DNATribes, African-Ancestry, etc. to market a product.

A good example of a correlation in DNA Genealogy is between a haplogroup and an ethnic population. A haplogroup is a population of people that share a unique set of DNA markers on either the mtDNA or Y-chromosome. For example - the Y-DNA haplogroup known as Q-M3 has a strong association with the ethnic group known as Native Americans. In fact - the association is so strong that it can be used as a strong predictor in certain cases.

Another example of a correlation is between an AIM and a geographic region. AIM stands for Ancestry Informative Marker, which is basically a DNA marker that's present at a high frequency in a population.  Certain AIMs are strongly associated with certain populations that have a known geographic origin. For instance - the Duffy Null allele is an AIM that has nearly a 100% frequency in Sub-Saharan Africans. AIMs are the basis for BGA tests such as Population Finder or Ancestry Composition.

With such powerful correlations in genetics, can someone's ethnicity, race, religion, or geographical point of origin be determined from their respective genetics?

Dangers Of Correlations in DNA Genealogy
The answer to the previous question is a simple no. A correlation may seem very persuasive but it's nevertheless still a correlation. No matter how strong a correlation is - a casual relationship is NOT a direct relationship. Simply put, your ethnicity, race, religion, and so on are NOT a product of your genetics. In genetics, the golden rule is that a defined entity must be an attribute of a genetic mutation. If a concept doesn't satisfy the golden rule, then it doesn't hold water in genetics.

It's understandable why it can be hard for someone to separate their ethnicity or any social construct from their respective genetics. A correlation can generate an illusion of a direct relationship when there actually is NOT such a direct relationship. The situation is made worse when you have various companies advertising such fallacies. For example, the term "genetic ethnicity" is used by certain organizations in order to sell a product. However, a solution in dealing with a correlation is to step back and understand the reason for the casual or associative relationship.

In this case, why is there a strong association between certain genetic markers and certain ethnic populations?

The reason has to do with an ethnic population's marital and reproductive patterns, not their genetics. Let's assume a unique genetic marker arises in a population via a mutation. If the members of that population reproduce with only members of the same population for an extended amount of time, then an associative relationship will form. This is how a genetic marker can become associated with a population. The result is that a correlation will form. This is especially true if the population retains a small size over time.  An example of this is with the ethnic group known as Native Americans. The Y-DNA haplogroup known as Q-M3 has a high frequency and strong association among Native American males. This is due to the strict marital practices displayed over a long period of time. Many Native American males mated with only Native American women over time and simply never deviated from that practice.

Another way to expose a correlation is at the prediction level. For example - the Y-DNA haplogroup known as E1b1a has a strong association and frequency among African-American males. Going on correlation logic - a male who identifies himself as African-American should possess the E1b1a haplogroup. My 2nd cousin is Lewis Lamar. His ethnic designation is African-American and yet his Y-DNA haplogroup is R1b1a.

With that we will end our discussion on correlations and misconceptions in DNA Genealogy. I hope this tutorial has shed some light on the dangers of correlations in certain circumstances as well as demystifying some prevalent misconceptions. So the next time you purchase a genetic ethnicity test, tell them you want your money back LOL!!!!!!!!!!

Take Care
Steve Handy

Sunday, January 13, 2013

Handy And Curd Family Connection

It's said that good things come to those who wait. This may be true in DNA Genealogy as I have stumbled upon another discovery that was made on the Handy side of my family. This discovery was, not surprisingly, confirmed via DNA Genealogy. The difference in this case is that the new DNA Genealogical services of Ancestry.com lent a helping hand. Recently, Ancestry.com has entered the DNA Genealogical arms race with its new product - AncestryDNA. Let's take a look!!!!!

It appears that the Handys have Scottish ancestry.
Recently I took an interest in uncovering some of my surname ancestry. My last name is Handy. It was known that the Handy lineage, from which I am descended, hails from Nashville, Tennessee. The earliest male Handy that was known was William Henry Handy (1881-1947). Shown in the picture toward your right are my uncles, father, and grandfather - William Ernest Handy Sr (1921-1994) shown far right. William Sr's brother, Clarence Handy (1922-1992), is shown in center with tie. William Henry Handy was the father of both Clarence and William Sr.


William Henry Handy (1881-1947)
Shown toward your left is William Henry Handy (1881-1947). Not too much was known about Henry Handy. Henry was born in Nashville, Tennessee. Henry eventually migrated to Chicago, Illinois and worked for Chicago Steel Mills. Early in life, Henry Handy met and married Alberta Woodard in 1918. Other than that, not too much was known about Henry Handy. I was determined to gather information about Henry Handy's past. Therefore I turned to his SSN application.



SSN Application of William Henry Handy
The Freedom Of Information Act (FOIA) is a wonderful law. It ensures public access to government records. When a person becomes deceased, their respective SSN is released into the public. You can then order the deceased person's SSN application. The reason for doing this is to get the names of the deceased person's parents. Toward the right is the SSN application of William Henry Handy. If you notice, Henry Handy gave the identities of his parents - Owen Handy (1862-1916) and Emma (1865 - ?).


Notice that Henry Handy didn't give the last name of his mother. This is likely due to the fact that Emma's last name wasn't known at the time. It is actually Emma, and her ancestry, that this blog article is about. Let's take a look!!!!

DC of Owen Handy (1862-1916)
The original goal was to uncover the strict paternal Handy ancestry. In other words, I was trying to discover the earliest known Handy male ancestor in my surname lineage. This has changed because currently, I don't have any information on Owen Handy's parents. That's okay because valuable information was learned in the process.  Shown toward your left is the DC (death certificate) of Owen Handy. The informant was his daughter - Hannah Handy-Hudgkins. Henry Handy apparently had siblings. There was Hannah (1892-1945), Ira (1896-1910), and Jim (1902-1944).




Marriage Cert of Owen Handy and Emma Lanius.
Determined to gather history on Owen Handy, I did a search on Owen Handy on Ancestry.com. What I found out was that there was only a single marriage certificate associated with Owen Handy. Owen Handy married a woman named Emma Lanius.

If you remember - on Henry Handy's SSN Application, Henry Handy apparently could not recall the last name of his mother Emma. I then came to the conclusion that the Emma mentioned in the SSN application and the Emma mentioned in the above marriage certificate were the same woman. (As a side note, on both Ira and Jim Handy's DCs - Emma's last name of Lanius is fully stated). As we are going to see, the DNA evidence is going to help confirm Emma Lanius as an ancestor. Now let's look at Emma Lanius and her ancestry.

Emma Lanius aged 16 and Family
Shown toward the left is Emma Lanius, her siblings, and her parents. This was shown in the 1880 Census. Emma was 16 years old. The snapshot photo was taken from Ancestry.com. The actual photo is information on Emma Lanius's mother - Jane Curd. More on that in a second. Not much is known on Emma Lanius outside of her marriage to Owen Handy in 1882 and the children she bore by Owen Handy. One interesting fact is that Emma Lanius's younger sister, Mary Lavinia Lanius, did meet and marry a man named William Bridge. They both migrated to Texas where their descendants reside today.  

Emma's parents were Matthew Lanius and Jane Curd. The maiden name of Curd is confirmed by the 1865 marriage certificate of Matthew Lanius and Jane Curd in the Tennessee Wilson County Area. (I will post it on the bottom of the blog)



Notice two things before we move on. First - Jane Curd-Lanius was designated as being mulatto. In the old days back in the south, the term "mulatto" loosely meant that your father was European and mother was Negro. Second - Jane Curd's father was born in Tennessee. We will see why that's important shortly. As a side note, Matthew Lanius's mother - Sallie Lanius (aged 50) is shown as well. 

Curd Relatives and Neighbors 

Before the DNA evidence came along, a valuable trick commonly used was to view and investigate the neighbors that lived near known ancestors and relatives. In the old days, many relatives lived next door to each other. In the 1880 census photo shown toward the right, there is a James A. Curd (1809-1876) and his family living one door from Jane Curd-Lanius. This same James A. Curd is present in the 1870 Census, living a few doors from Jane Curd.

It turns out that James A Curd was a known slave owner in the Wilson area at that time. In fact, his brother Price Curd (1808-1883), was an even bigger slave owner. I was coming to the conclusion that Price Curd was the father of Jane Curd. James A. Curd could be ruled out because he was born in Virginia, whereas Price Curd was born in Tennessee. In addition, James Curd only has a record of owning two male slaves in the 1840 Census. (At one point - Price Curd owned over 19 slaves in one year)

Remember from above that Jane Curd's father was noted as being born in Tennessee. Both James and Price Curd had siblings. Their sisters can easily be ruled out as a parent to Jane Curd. The younger brothers of James and Price Curd were either deceased before Jane Curd's birth in 1845 or much too young (1833) to be a parent. This leaves Price Curd as the likely parent of Jane Curd. In fact, let's take a look at the DNA evidence which confirms the connection between the Curd and Handy families.


DNA Evidence linking Curd and Handy Families
AncestryDNA is the newest autosomal DNA testing service that's currently on the market. It's owned by Ancestry.com. I submitted a sample of my DNA. AncestryDNA provides matches who are essentially cousins. One of my matches is a woman who goes by the username of MidgeEstes. Shown on the left is the DNA match. One of the nicest features with AncestryDNA is that you can link your DNA account to a pedigree tree.

As you can see, one of the shared common surnames is Curd.


 One of MidgeEstes's ancestors was Elizabeth "Betsy" Curd (1738-1821). Elizabeth Curd was the great-grand aunt of Price Curd. Price Curd's great-grandfather, John Curd, and Elizabeth Curd were siblings. This means that their father - Edward Curd (Bet 1650-1670) is the common ancestor to MidgeEstes and myself.


Pedigree Of Edward Curd 
It appears that Edward Curd was born around 1650 in Scotland. He died in Henrico, Virginia in 1742. The amazing thing about this is that the autosomal DNA that these DNA tests look at generally isn't expected to retain DNA across a 400 year period!!! Each generation you go back, the share of DNA inherited from any one ancestor shrinks, because each parent passes on only half of their DNA and it is reshuffled by a natural biological process called recombination.

For MidgeEstes and myself to possess these types of autosomal DNA segments from an ancestor across a span of nearly 400 years is amazing.
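
To put a rough number on this, a common rule of thumb is that the expected share of autosomal DNA inherited from a single ancestor halves with every generation. The sketch below only applies that rule of thumb. The function name is hypothetical, and real inheritance varies because DNA is passed down in chunks, so the actual share from a distant ancestor can be higher than expected or even zero.

```python
def expected_autosomal_share(generations_back):
    """Expected fraction of autosomal DNA from one ancestor n generations back."""
    return 0.5 ** generations_back


# 1 = parent, 2 = grandparent, and so on.
for n in (1, 2, 5, 10):
    print(n, f"{expected_autosomal_share(n):.4%}")
```

Ten generations back the expected share is already under a tenth of one percent, which is why a surviving detectable segment from that far back is notable.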



1860 Slave Census Record of Price Curd
Shown below and toward the right are slave census records of my presumed ancestor - Price Curd (1808-1883). It appears that Price Curd owned many slaves. In the Wilson District Area of Tennessee between the years of 1840-1880s, there were many recorded African-American Curds - of whom he and his brother James A Curd are likely the fathers.

In this photo shown toward the right, Price Curd owned 19 slaves alone.




1840 Slave Census Record of Price Curd
Shown toward the left is the 1840 Slave Census record of Price Curd. In this record is likely the mother of Jane Curd (1840 - ?). In this photo, there are two African-American females at or near the age of 23.

As a side note - Price Curd's In-Laws were the Eatherlys. Price Curd's daughter - Emily Curd, married a James J Eatherly. On the 1882 marriage certificate of Owen Handy and Emma Lanius, there is a John Eatherly who married the couple and signed the certificate. It's very likely both James and John Eatherly were related.

As always - it has been a pleasure. Please leave all comments below. 


Marriage Certificate Of Matthew Lanius and Jane Curd

Thanks - Steve Handy

Sunday, November 4, 2012

Understanding Mitochondrial DNA Testing


     Good Day Everyone. In this document, I am going to provide an introduction to the basics of a mitochondrial DNA test. This document should remove any confusion people may have concerning the test. As it stands right now, Family Tree DNA is the premier company that performs mtDNA testing. The company known as 23andMe currently does NOT perform a mtDNA test. 23andMe only provides a haplogroup assignment, which is an added and extra piece of information to the test. Let's begin with two important and basic principles that DNA tests are built on.


Basic Principles
     The first principle is that when two or more people share or match segments (regions) of DNA, they share a common ancestor in their past. It is from that ancestor that the shared DNA segments are inherited. In this case, the common ancestor was a woman.

   The second principle is that the more DNA you share with someone, the more closely related you are to that person. This means your shared common ancestor lived in a more recent time. As we are going to see, this principle is extremely important when considering mtDNA given its slow rate of change.

Now let's look at the mtDNA basics. 

Short Science Part - mtDNA Basics
     The mitochondrion is a structure that sits inside the human cell. Its job is to provide energy to the cell. There are multiple copies of it that lie outside the nucleus. Inside the mitochondrion is a round piece of DNA called mtDNA. mtDNA is circular and has 16,569 DNA base pairs. The mtDNA is composed of three DNA regions - HVR1, HVR2, and CR (Coding Region that has genes). FTDNA has three mtDNA tests based on these three regions.
  1. Low resolution (HVR1) test
  2. High resolution (HVR1 + HVR2) test
  3. Full Genome Sequence test (HVR1 + HVR2 + CR) which looks at the entire mtDNA.
Three important points
  1. Only women pass along their mtDNA to their sons and daughters. Men cannot pass along their mtDNA. This means that the inheritance of the mtDNA is child -> mother -> mother's mother -> mother's mother's mother -> etc. In other words, a mtDNA test looks at the strict maternal side.
  2. The word "match" in this context means having an identical mtDNA region (HVR1, HVR2, or CR) as someone else. NOT one base pair should be different. For example, the HVR1 region contains about 400 DNA base pairs.  An HVR1 low resolution match means you and someone both share the exact and entire run of base pairs. A single base mismatch can mean a difference of say 1000 years between you and someone else.
  3. The mtDNA changes very very slowly over time. Because of this, the mtDNA test is mainly used for deep distant ancestry. For example, if you have a HVR1 match, you are very distantly related to that person. In other words - your last common maternal ancestor could have lived thousands of years ago. The more mtDNA regions you match with someone (there are only 3 regions) - the closer you are related to that person. Ideally and from a practical perspective, you really want to match someone in all three mtDNA regions such as between a mother and daughter. This means your last common maternal ancestor lived recently - say within the last 5 generations or so - which is approximately within the last 125 years.
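
The idea of an exact match in point 2 can be sketched in a few lines of Python. This is a simplified illustration under assumptions: the function names are hypothetical, the short sequences are made up rather than real HVR1 data, and real test results are usually reported as differences from a reference sequence rather than as raw strings.

```python
def count_mismatches(region_a, region_b):
    """Count the positions at which two equal-length DNA regions differ."""
    if len(region_a) != len(region_b):
        raise ValueError("regions must be the same length to compare base by base")
    return sum(1 for a, b in zip(region_a, region_b) if a != b)


def is_exact_match(region_a, region_b):
    """A match in the sense used here means zero differing base pairs."""
    return count_mismatches(region_a, region_b) == 0


# Made-up short sequences standing in for a tested region.
person_1 = "TTCTTTCATGGGGAAGC"
person_2 = "TTCTTTCATGGGGAAGC"
person_3 = "TTCTTTCATGGAGAAGC"   # differs from person_1 at one base

print(is_exact_match(person_1, person_2))
print(count_mismatches(person_1, person_3))
```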
An mtDNA test also provides a separate piece of information known as a haplogroup. Let's take a look.

Short Science Part - Haplogroups
       A haplogroup is a population of people who are all descendants of a single man or woman who lived in the distant past. In this case - we are talking about mtDNA haplogroups. Each mtDNA haplogroup has a unique set of mtDNA markers that define that haplogroup. Every member of a single haplogroup bears a unique set of mtDNA markers that sets them apart from being a member in a different haplogroup. 

     There are currently 26 known mtDNA haplogroups. All 7 billion humans that currently live on the planet fall into a mtDNA haplogroup. Letters of the alphabet are assigned to a mtDNA haplogroup. An example of a mtDNA haplogroup is L3e. Essentially L3e represents a single woman that lived in the very very distant past. As science studies more populations, more mtDNA haplogroups will be added.

    IMPORTANT: Your haplogroup maternal common ancestor (L3e for example) and your last common maternal ancestor are two completely different women. Let's now look at how to get an estimate of when your last common maternal ancestor lived.

 Statistics
  Unfortunately DNA doesn't have a sign on it that tells you exactly in time when your last common maternal ancestor lived. Because of this, we have to use statistics to get a probability of when your last common maternal ancestor lived. Family Tree DNA currently uses the following accepted criteria to determine a time period.  
  1.   Matching on HVR1 (low resolution match) means that you have a 50% chance of sharing a common maternal ancestor within the last fifty-two generations. That is about 1,300 years.
  2.   Matching on HVR1 and HVR2 (high resolution match) means that you have a 50% chance of sharing a common maternal ancestor within the last twenty-eight generations. That is about 700 years.
  3.   Matching on the Mitochondrial DNA Full Genomic Sequence test (full resolution match) brings your matches into more recent times. It means that you have a 50% chance of sharing a common maternal ancestor within the last 5 generations. That is about 125 years.
      As you can see, these time ranges can be quite large. Remember these are probabilities: the ancestor could have lived either inside or outside the stated time range. For example, an HVR1 match means there is a 50 percent chance that your last common maternal ancestor lived within the last 1300 years. This also means that there is still a 50 percent chance that the maternal ancestor lived more than 1300 years ago!!!!
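
The year figures in the list above follow from multiplying the number of generations by an average generation length. The sketch below assumes 25 years per generation. That value is inferred from the figures quoted (for example 52 generations and about 1,300 years) and is not stated anywhere as an official constant, and the function name is hypothetical.

```python
YEARS_PER_GENERATION = 25  # assumption inferred from the figures in the text


def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert a number of generations into an approximate number of years."""
    return generations * years_per_generation


for label, gens in [("HVR1", 52), ("HVR1 + HVR2", 28), ("Full Genomic Sequence", 5)]:
    print(f"{label}: {gens} generations is about {generations_to_years(gens)} years")
```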

   As you can see, from a practical standpoint, you really want to match someone at the Full Genomic Matching level. In other words, if you take a mtDNA test, you should probably order the FGS test and hopefully match to someone at that level. At the FGS level, your last common maternal ancestor is likely to have lived within the last 5 generations which is a genealogical time frame of about 125 years.

     Well that's it!!!  In short, mtDNA testing involves finding matches that reveal a shared common maternal ancestor. As you can see, the mtDNA changes very slowly which means it's mainly used for distant ancestry, but it can be used for recent ancestry as well.

Hope that helps. Please let me know if you have questions. As always, it's a pleasure!!!!

Thanks
Steve Handy

Saturday, November 3, 2012

Understanding Y-DNA Genealogical Testing


Good Day Everyone,

   How is everyone doing? In this document, the Y-DNA genealogical test will be explained. Some people are confused as to exactly what a Y-DNA test is. This document will serve to remove the confusion that surrounds a Y-DNA test. Currently, Family Tree DNA is the premier DNA testing company that performs a Y-DNA genealogical test. This is mainly due to FTDNA's large STR marker system. (I will explain STRs shortly). The company known as 23andMe currently doesn't perform a Y-DNA genealogical test. 23andMe provides only a Y-DNA haplogroup assignment, which is an add-on. Let's begin.

Basic Principles
     The first principle is that when two or more people share or match regions of DNA, they share a common ancestor in their past. It is from that common ancestor that the shared DNA segments or regions are inherited. In this case, the common ancestor was a male.

     The second principle is that the more DNA you share with someone, the closer you are to that person. This means your shared common ancestor lived in a more recent time. For example, a brother and sister's last common ancestor is their mother. On the other hand, two first cousins' last common ancestor would be their grandmother. As we are going to see, this principle is extremely important when considering the Y-chromosome given its moderate rate of change.

Y-DNA Basics
     So first off - ladies, please don't be upset LOL. But a Y-DNA test is strictly for men. Here is why!! Humans have 46 chromosomes. In men, the last chromosome (the 46th chromosome) is known as the Y-chromosome. The Y-chromosome is sometimes called the Y-DNA. The Y-DNA has a gene on it called the SRY gene. This master switch gene (which switches on a bunch of genes) converts a human embryo into a male. Therefore, by definition, only a male has a Y-chromosome (Y-DNA). The Y-DNA has an area of DNA that a Y-DNA genealogical test looks at. This Y-DNA area contains a type of DNA called STRs. STR stands for short tandem repeat. An STR is a repeat of a DNA sequence. I will explain.

     DNA has four bases called A, T, C, and G. For example, a DNA sequence would be -> "GCATCATG". The DNA sequence "CAT" is an STR marker. As you can see, the STR is repeated 2 times (CATCAT). Researchers have an STR naming convention called the DYS system. DYS stands for DNA Y-chromosome Segment. For example, a commonly studied DYS marker is DYS393. DYS393 has the STR sequence known as "AGAT". If you see a statement that says "DYS393 = 3", then it means that the DNA sequence "AGAT" is repeated 3 times like this -> AGATAGATAGAT.
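
     For readers who like to see an idea spelled out in code, here is a minimal Python sketch of what "counting the repeats" means. The function name count_tandem_repeats is just something I made up for illustration; it is not how a testing lab actually reads your DNA. It simply finds the longest run of back-to-back copies of a short sequence.

```python
def count_tandem_repeats(sequence, motif):
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Keep stepping forward one motif length while the motif repeats.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


print(count_tandem_repeats("GCATCATG", "CAT"))        # 2
print(count_tandem_repeats("AGATAGATAGAT", "AGAT"))   # 3
```

     The two calls use the same example sequences given above, so "CAT" comes back as 2 repeats and "AGAT" comes back as 3 repeats.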

   A Y-DNA genealogical test looks at the DYS markers that currently all modern human men have along their Y-chromosome. Let's take a look. 

Y-DNA Genealogical Test
     A Y-DNA test looks at the DYS markers along the Y-chromosome between any two men. All modern human males have the same set of DYS markers, which are situated in the same order along their Y-chromosome. For example, along the Y-chromosome you will see DYS393-DYS390-DYS19 in this order from left to right. The reason for this is that all human men have a common distant paternal ancestor who is known as Y-Chromosome Adam.

    A Y-DNA test will look at a set of studied DYS markers and values between any two men. If there are enough matching DYS marker values, then a common paternal ancestor has been revealed between two or more men. This common male ancestor may have lived within a genealogical time frame (last 100 to 200 years). STR markers can change between generations. For example, Male A may have DYS393=10. Male B may have DYS393=12. The difference is 12-10, which is 2. This difference is known as genetic distance. Genetic distance is a property of a Y-DNA genealogical test. It's used to measure the degree of relatedness between two or more men.
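
     Here is a rough Python sketch of the genetic distance idea. It simply adds up the differences across the markers two men have in common. Please treat it as a simplification: the function name genetic_distance is my own, the DYS390 and DYS19 values are made up for the example, and a testing company may count certain markers differently than a plain subtraction.

```python
def genetic_distance(haplotype_a, haplotype_b):
    """Sum of absolute repeat-count differences over the markers both men tested."""
    shared_markers = haplotype_a.keys() & haplotype_b.keys()
    return sum(abs(haplotype_a[m] - haplotype_b[m]) for m in shared_markers)


# DYS393 values come from the example above; the other values are hypothetical.
male_a = {"DYS393": 10, "DYS390": 24, "DYS19": 14}
male_b = {"DYS393": 12, "DYS390": 24, "DYS19": 14}

print(genetic_distance(male_a, male_b))  # 2
```

     A genetic distance of 0 over the compared markers would be an exact match. The larger the number, the more the two haplotypes differ.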

     The Y-DNA has a strict inheritance pattern. The pattern is son -> father -> father's father -> father's father's father -> etc. Like the Y-DNA, which is passed from father to son, your surname (last name) is typically inherited in a similar fashion. Therefore, a Y-DNA test is typically used to see if a group of men who have the same last name are related. For example, the last name of Williams is fairly common. If you want to know if a group of, say, male Williams are related, then a Y-DNA genealogical test would be used. This is commonly used today for adoption, name change, etc. Currently, FTDNA has a panel of 111 DYS markers, which makes their Y-DNA test a very popular option.

     Companies such as Family Tree DNA (FTDNA) typically market, package, and sell their Y-DNA tests based on a set number of studied DYS markers. For example - a 12 marker Y-DNA test means a set consisting of 12 popular DYS markers will be analyzed by the DNA testing company. Your specific set of DYS markers and values is known as a Y-DNA haplotype. For example, shown below is a picture of my personal 12 marker Y-DNA haplotype:

If you click this picture shown toward the left, it will show my set of 12 DYS markers and their values. For example - my DYS393 marker has a value of 15. This means I have a DNA base sequence or STR of "AGAT" that is repeated 15 times along the length of my Y-DNA. 

If another male has the exact set of DYS marker values that I have, then we would be considered a match. This means that the other gentleman and I share a paternal common ancestor.  The more DYS markers you share with someone, the more likely you are closely related to that person. I stress the word "likely" because a Y-DNA test is not always a clear cut test in terms of measuring the relatedness of two or more men. Here is what is meant.

The current thinking is that a male should match another male on at least 37 DYS markers to be considered related within a genealogical time frame (last 100 to 200 years). This is logical and reasonable thinking. However, DYS markers can change between generations.

For example - ideally a father and son, who are closely related, should match on all known DYS markers. This is true since a son inherits a copy of his father's Y-DNA. However, it's possible that even a father and son may differ in DYS marker values.

To make things more interesting - sometimes the opposite is true. Two or more men can be an exact match on all of their shared DYS marker values and yet be distantly related. There are known cases where two or more men have been an exact match at 111 DYS marker values and yet turned out to be very distantly related (10th cousins). Cases such as this can happen in Y-DNA testing so one should be aware of this.   

While a Y-DNA test is typically used for recent ancestry, a Y-DNA test can be used to reveal deep distant ancestry. This is where haplogroups come into the picture.

Y-DNA Haplogroups
     A Haplogroup is a population of people who are all descendants of a single man or woman who lived in the distant past. In this case - we are talking about Y-DNA haplogroups. Each Y-DNA haplogroup has a unique set of markers that define that haplogroup. Every member of a single haplogroup bears the same unique set of Y-DNA markers, which sets them apart from members of a different haplogroup. These unique markers arose in a single individual, the haplogroup ancestor, a long time ago. Letters of the alphabet are given to the different Y-DNA haplogroups. A popular Y-DNA haplogroup is E1b1a. Every male has a Y-DNA haplogroup. Since women don't carry a Y-chromosome, a woman can learn her paternal line's haplogroup only by testing her father, brother, or another male relative on her father's line. In essence, a Y-DNA haplogroup, such as E1b1a, represents a single male that lived in the very very distant past!!!

     The DNA markers used for haplogroup assignment are known as SNPs (pronounced "snips"). A SNP is a DNA base that has changed. For example, suppose a DNA sequence changes from CATG -> CATA. In this case, "G" changed to "A". The base "A" would be considered a SNP. SNPs change very slowly, which is why they are used for haplogroup assignment.
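
     To make the SNP idea concrete, here is a small Python sketch that compares two DNA sequences of the same length and reports each position where a base changed. The function name find_snps is made up for this example, and real SNP calling is far more involved than this. Note that positions are counted starting from 0.

```python
def find_snps(reference, sample):
    """List (position, old_base, new_base) wherever two equal-length sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (position, old, new)
        for position, (old, new) in enumerate(zip(reference, sample))
        if old != new
    ]


print(find_snps("CATG", "CATA"))  # [(3, 'G', 'A')]
```

     The result says that at position 3 (the fourth base), "G" changed to "A", which is the same change described above.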

    There are approximately 29 known Y-DNA haplogroups. By definition, all modern human men fit into the African Y-DNA Haplogroup known as A. Haplogroup A is then split into the two major Haplogroups, B and CT respectively. From the Y-DNA Haplogroup known as CT, the remaining African and Non-African Y-DNA haplogroups (DE, F, etc) are descended.

    Because of people's different religious, marital, and social practices/histories, certain people tend to be strongly associated with certain haplogroups. For example, the Y-DNA haplogroup known as E1b1a is very strongly associated with African-American males. The Y-DNA haplogroup known as Q1a3a1 is strongly associated with Native American males.

    It's important to know that your last common paternal ancestor and a haplogroup paternal common ancestor are two different men. Your Y-DNA haplogroup ancestor lived thousands of years ago, whereas your last common paternal ancestor (father or grandfather, etc) lived recently within a genealogical time frame.  All men are related distantly but not all men are related recently.

  Well that's it!!!! As always, it has been a pleasure. If anyone has any questions, please feel free to ask.

Thanks
Steve

Understanding BGA Testing


     In this document, I am going to explain BGA Testing. These days there are a number of companies which claim that your ancestry can be determined from your DNA. DNA stores information such as the color of our eyes and hair. DNA keeps a record of our past ancestors and who we are related to. A person's ethnic composition, religion, language, and name are types of information that are NOT stored nor defined by DNA. From the results of a BGA DNA test, people tend to infer socially defined concepts, such as a person's religion, from DNA. Such inferences can be wrong because DNA doesn't store such information.

Humans tend to categorize things based on observed patterns. Those patterns cannot be defined by DNA. So please keep that in mind.

     Now let's please turn our attention to the basics of BGA or Admixture Testing. The current position of the scientific community is that the jury is still out on BGA Testing. As we are going to see, there is very good reason for this!!!

BGA Basics And Science
     BGA stands for biogeographical ancestry. BGA tests are sometimes called Admixture Tests. A BGA test basically tries to use your DNA to determine or pinpoint what part of the world your ancestor(s) originated from. Using your DNA to show if two people have a common ancestor is valid. DNA contains information such as whether or not two people are related.

     However using your DNA to pinpoint where an ancestor was born, lived, or came from, is entirely different.  Here is the idea behind a BGA test.

  Suppose we have a population called the Handy Clan. The Handy Clan has 1000 people and is located on a remote island. Now let's say everyone in the Handy Clan population has a rare DNA marker which we will call -> M. In other words, the frequency of this DNA marker is 100% because everyone (1000 people) has the DNA marker M. Also, let's assume that no one outside of the Handy Clan, which is on this remote island, has the DNA marker M.

Now Laurie lives in the US in Oakland, California which is located outside the remote island and outside of the Handy Clan population. Let's suppose we discover Laurie has this same rare DNA marker M. 

    Can we say Laurie is from or has ancestry from the Handy Clan population?

     Under simple circumstances, yes!!!  We can confidently say that. If no other population in the world has this rare genetic marker M, then we can say yes. Laurie is either from, or has had an ancestor that originated from, the Handy Clan population. That's what a BGA test does. It compares your DNA markers to a studied population. Since all one thousand people have the same DNA marker M, and no one else does, Laurie must either have been born in that Handy Clan population or had an ancestor from that population.

However reality is not as simple as that!!!!!  Let's see a more realistic scenario.

A More Realistic Scenario 
     Now suppose we have three separate populations, the Handy Clan, Williams Clan, and Henderson Clan. Each population is located in a different part of the world. Each population or clan has 1000 people in it. Every person in each of the populations has the genetic marker M.  In other words, the frequency of the DNA marker M is 100% in each population.

     Now we discover again that Laurie, who lives in Oakland, which is outside each population, has the genetic marker M. 

Question: Does Laurie have ancestry from the Handy Clan population?

Now things have changed. The question is now harder to answer. The fact that Laurie has a DNA marker M that is found in multiple populations doesn't necessarily mean Laurie has ancestry from the Handy Clan population.  Laurie could have had an ancestor that lived or was born in any of those populations.

That's the problem with a BGA DNA test. As we can see, the truth is not so clear cut in tests of this nature. The truth is based on a probability.  Any newly introduced population can change things dramatically. Therefore, when interpreting the results from a BGA or Admixture test, please keep in mind that your results may differ or change tomorrow. Laurie would need a paper trail or some definitive piece of evidence to confirm the inference drawn from the BGA results. The BGA data numbers alone don't necessarily prove anything.
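
For those who want to see the "probability" part worked out, here is a minimal Python sketch of the two scenarios. It makes one big assumption that I want to state loudly: before looking at Laurie's DNA, every listed population is treated as an equally likely source of her ancestor. The function name origin_probabilities and the "Rest of world" entry are my own inventions for illustration, and this is not the method any testing company uses.

```python
from fractions import Fraction


def origin_probabilities(marker_frequency):
    """Chance that marker M came from each population.

    marker_frequency maps a population name to the fraction of its members
    carrying M. ASSUMPTION: every population is equally likely beforehand,
    so each population's share is its frequency divided by the total.
    """
    total = sum(marker_frequency.values())
    if total == 0:
        raise ValueError("no population carries the marker")
    return {name: Fraction(freq) / total for name, freq in marker_frequency.items()}


# Simple scenario: only the Handy Clan carries marker M.
simple = {"Handy Clan": Fraction(1), "Rest of world": Fraction(0)}

# More realistic scenario: three clans all carry marker M at 100%.
realistic = {
    "Handy Clan": Fraction(1),
    "Williams Clan": Fraction(1),
    "Henderson Clan": Fraction(1),
}

for scenario in (simple, realistic):
    for name, chance in origin_probabilities(scenario).items():
        print(f"{name}: {chance}")
    print()
```

In the simple scenario the Handy Clan gets the whole probability. In the more realistic scenario each clan gets only 1/3, which is why marker M alone can't tell us which clan Laurie's ancestor came from. Add a fourth clan with marker M and each share drops again.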

The reason is that a BGA test is attempting to infer information from DNA that DNA doesn't define. An ancestor's original location can be anywhere. DNA simply doesn't reflect or store that type of information. From the frequency (or concentration) of those DNA markers in each population, we are making an inference which could be right or wrong. If a child is born in, say, Atlanta, Georgia, that geographical location and information will not be stored in the child's DNA.

  One of the biggest misconceptions out there is that a BGA or Admixture Test can pinpoint the exact tribe or small population someone is from. As one can clearly see, this is not necessarily true. DNA alone simply cannot do this as it's advertised. This is one of the reasons the scientific community as a whole has not embraced BGA tests.

Now let's look at the basic BGA concepts.

BGA Concepts
In BGA terms, the DNA marker M is called an ancestry informative marker or AIM. Each population is called a reference population. An example of a reference population is the Yoruba. The Yoruba are a West African ethnic group that is studied by population geneticists. Many African-Americans have DNA markers that match the Yoruba group.

Now that we have the BGA basics, let's look at the BGA process and engine which is known as PCA.

BGA Process and PCA 
     The engine or workhorse of most BGA Analysis is PCA. PCA stands for Principal Component Analysis. PCA is a complex mathematical process that separates a bunch of data into its components. As a loose analogy, let's say we have a bag of 100 jelly beans that are of different colors. After separating the jelly beans by color, we see this -> blue=25, red=25, purple=25, and yellow=25. This means that each of the four colors makes up 25% (25/100) of the jelly beans. PCA would essentially separate the data in a similar way.

     The BGA process starts off with about 300,000 AIMs or SNPs. These SNPs are found across the first 44 chromosomes in humans. The SNPs are matched to a number of reference populations. The results are percentages that represent the concentration of the SNPs in each reference population. The engine running the show is PCA, which runs in the background of an algorithm.

Now let's look at a few BGA tests.

BGA Tests: Population Finder, Ancestry Painting, McDonald
There are a number of BGA tests out there. Family Tree DNA's BGA test is Population Finder. 23andMe's is called Ancestry Painting. Population Finder is a BETA test, so it's a work in progress. Population Finder uses continental groups in addition to reference groups.

Here is an example of Population Finder (PF) results:

Continent (Subcontinent)     Population          Percentage    Margin of Error
Europe (Western European)    French, Orcadian    28.53%        ±0.48%
Africa (West African)        Yoruba, Mandenka    71.47%        ±0.48%

There are four reference populations -> French, Orcadian, Yoruba, Mandenka. This person basically has DNA markers that match those reference populations. It's likely this person has ancestry from some of those populations, but not necessarily all of them. A paper trail would be needed to confirm ancestry.

Because Population Finder is a beta test and has limited reference populations (same for 23andMe's Ancestry Painting), many people turn to an Extended BGA Analysis. This is where Dr Doug McDonald comes in.

McDonald's Extended BGA 
Dr Douglas McDonald is a chemist at the University of Illinois in Urbana, Illinois. In fact, he actually created the Population Finder for Family Tree DNA. McDonald has access to more studied reference populations than Family Tree DNA or 23andMe currently have. Because of this, you can get a more "fleshed out" or "extended" BGA Analysis.

McDonald gives his results in the form of an email with four graphs. Here are McDonald's results of my cousin Lonette Lanier's extended BGA test as shown in quotes below:

"LonetteFayLanier216745-autosomal-o-results.csv
Most likely fit is 27.9% (+-  0.1%) Europe (various subcontinents) and 72.1% (+-  0.1%) Africa (all West African).

The following are possible population sets and their fractions, most likely at the top

French= 0.279 Mandenka= 0.721
Hungary= 0.280 Mandenka= 0.720
English= 0.277 Mandenka= 0.723

There is also about 0.4% Native American that is strong and likely real, as well as other little bits on the chromosomes but they are weak and probably unimportant."

Each line, "French= 0.279 Mandenka= 0.721", is a population set. There are three population sets. Each population set gives a likely or probable ancestry for my cousin Lonette. Each population set is a combination that gives the best fit for Lonette's data. It doesn't mean Lonette necessarily has ancestry from say, the French. But she does have DNA markers that match the French reference population. The multiple population sets are the result of Lonette's DNA markers that are spread across multiple populations. This is why it's difficult to pinpoint a person's ancestral origin to a specific tribe or single population via your DNA alone.
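
To show why more than one population set can fit the same person, here is a rough Python sketch of a "best fit" between two reference populations. I want to be very clear that this is NOT Dr McDonald's method, which I don't have the details of. The marker frequencies and the person's values below are made up purely for illustration, and the function name fit_two_way_mix is hypothetical. The sketch uses plain least squares to find the mixing fraction that best explains the person's markers.

```python
def fit_two_way_mix(person, pop_a, pop_b):
    """Least-squares fraction of pop_a in a pop_a/pop_b mix, plus the leftover error."""
    numerator = sum((x - b) * (a - b) for x, a, b in zip(person, pop_a, pop_b))
    denominator = sum((a - b) ** 2 for a, b in zip(pop_a, pop_b))
    if denominator == 0:
        raise ValueError("the two populations have identical frequencies")
    # A mixing fraction has to stay between 0 and 1.
    fraction_a = min(1.0, max(0.0, numerator / denominator))
    error = sum(
        (x - (fraction_a * a + (1 - fraction_a) * b)) ** 2
        for x, a, b in zip(person, pop_a, pop_b)
    )
    return fraction_a, error


# Made-up marker frequencies for illustration only (not real reference data).
reference = {
    "French":   [0.90, 0.20, 0.70, 0.10],
    "Hungary":  [0.88, 0.22, 0.69, 0.12],
    "Mandenka": [0.10, 0.80, 0.30, 0.90],
}
# Made-up marker values for one tested person.
person = [0.324, 0.632, 0.412, 0.676]

for european in ("French", "Hungary"):
    fraction, error = fit_two_way_mix(person, reference[european], reference["Mandenka"])
    print(f"{european}= {fraction:.3f} Mandenka= {1 - fraction:.3f} (error {error:.6f})")
```

Because the made-up French and Hungary frequencies are so similar, both population sets fit this person almost equally well. The numbers alone can't say which one reflects the person's real ancestry.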

It's important to always back up DNA evidence with documents or other pieces of evidence to validate a claim. The numbers alone don't always or necessarily identify the truth.

Now let's look at the issues the scientific community has with BGA Testing.

Issues With BGA Or Admixture Testing
The scientific community as a whole hasn't really embraced BGA or Admixture Testing. Using your DNA to establish whether two or more people are related via a common ancestor is valid. However, using your DNA to locate where your ancestor(s) originated is quite a different task. An ancestor could have been born or lived in any part of the world. More important - DNA simply doesn't define or contain information such as an ancestor's geographical location or point of origin. That type of information is NOT an attribute of a genetic mutation. Therefore BGA or Admixture tests don't have a basis in genetics. That's the scientific community's main objection to BGA or Admixture tests. The results from a BGA or Admixture test are used to make inferences from observed correlations. A correlation can be dangerous in science because it can lead to an incorrect inference from an observed set of data.

There is a very big difference between a correlation and a direct (causal) relationship between two variables.

This doesn't mean BGA tests aren't valuable. A BGA test can give one insight into their past. However, you must understand that the results from a BGA test aren't final. The results from a BGA test are tentative and can easily change tomorrow.

There are at least three main current hurdles with a BGA Analysis:

1) Populations can change location and identity. They are not static. What we know about a population's history is limited. Modern humans have been here for approximately 200,000 years. No one can know the entire history of any population. We can have approximate knowledge, but NOT complete knowledge.

2) We simply don't yet have a complete set of reference populations to make any final judgment calls. (I will explain this shortly.)

3) Different algorithms can produce different results.

For example suppose Dr McDonald gives me the following simple BGA results:

Finnish=.100 and Yoruba=.900.

This is based on the fact that the scientific community has studied the Yoruba, the Finnish, etc. This would lead me to believe that I have a large amount of Yoruba ancestry. A paper trail could confirm that Yoruba ancestry.

Now suppose the scientific community has studied and approved a new reference population, C, in say a few years. Now a rerun of Dr McDonald's results yields the following:

Population C=.450, Finnish=.100, and Yoruba=.450

Now as you can see, things have changed. My ancestor now could have lived among the Yoruba, or could have lived in the new reference population C. This scenario could happen. As you can see, none of these results are absolute or final. They can always change.

     In addition, different algorithms can produce different results. An algorithm is simply a method or set of steps to solve a problem. The algorithm is very important. It's what produces your DNA results. Right now there are a number of tools out there that claim the ability to produce valid BGA results. Each of these tools may run under different algorithms.

For example - I have taken three BGA tests: Ancestry Painting, Population Finder, and McDonald. Each has produced different results. The analysis from 23andMe stated I had 7 percent Asian ancestry. Now this could be significant or it could be noise. Neither FTDNA's Population Finder nor McDonald's findings gave 7 percent Asian ancestry. The bigger question is which one is correct? Population Finder is a BETA test. So I can assume that its findings are approximate. Can the same be said for 23andMe's Ancestry Painting results or Dr McDonald's BGA findings? The truth is that at this time - it's impossible to tell which one is correct or incorrect.

     The most important point to take from this tutorial is that a BGA test can yield valuable information, but not necessarily definitive information. Technically, the only fact-based information that can be produced from a BGA test is that a person has DNA markers (AIMs) that match a reference population.

Well that's it for BGA Analysis. If anyone has questions, please feel free to ask.

Thanks
Steve