Wednesday, May 29, 2013

Understanding Correlations and Debunking Misconceptions In DNA Genealogy

     Good day, everyone - how is everyone doing? Fine, I hope. In this tutorial, we are going to get a firm grasp on the basics of genetics as they relate to DNA Genealogy. The reason for a return to the basics is to debunk certain misconceptions about DNA Genealogy that are currently floating around in the general public. Debunking a misconception is important because when a consumer purchases a product, he or she has expectations that the product should fulfill. When those expectations are fueled by misconceptions, the consumer may develop unrealistic expectations that naturally don't get fulfilled. It's sad and unfortunate, but it's a common practice to use a misconception to sell a product. In science, misconceptions are a common occurrence and are easily rectified with a return to the basics.

With that in mind - let's begin our discussion.

DNA - The Basics
DNA Genealogy is very popular today. In an age where technology allows a person to send in a small DNA sample and get results quickly, many expectations are formed. This is of course logical. If you pay for a product, you expect results. However, no matter how popular DNA Genealogy is, it is still based on a science. That science is Genetics. In genetics - the pot of gold is DNA.

DNA stands for deoxyribonucleic acid. DNA sits inside nearly all the cells of a living thing. DNA carries and transmits biological information from parents to offspring. This is the fundamental principle behind the science of genetics. In fact - in genetics there is an equation called the Central Dogma Of Life:

                             DNA -> RNA -> Protein
                             (Central Dogma Of Life)

The above equation is how all life proceeds. For as long as life has been on the planet, this is how life, biologically speaking, has worked. When dealing with genetics, a feature, concept, or any defined entity must somehow fit into the above equation. Technically, in genetics, something must be an attribute of a genetic mutation for it to have a basis in genetics.

For example, sickle-cell anemia is an inheritable disorder that can be passed along from a parent to its offspring. The reason is that a single mutation in a gene causes this disorder. In other words - sickle-cell anemia is an attribute of a genetic mutation. Sickle-cell anemia is an inheritable trait and it fits into the fundamental principle of genetics.
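To make the DNA -> RNA -> Protein flow concrete, here is a minimal Python sketch. It is only an illustration under a few assumptions: the function names (`transcribe`, `translate`) and the tiny codon table are made up for this example, and the short sequence is modelled on the start of the beta-globin gene, so treat the exact letters as illustrative rather than as a reference sequence. What it shows is that changing a single DNA letter (GAG to GTG, the change associated with sickle-cell anemia) changes one amino acid in the protein.

```python
# Partial codon table: only the codons that appear in this example.
CODON_TABLE = {
    "AUG": "Met", "GUG": "Val", "CAU": "His", "CUG": "Leu",
    "ACU": "Thr", "CCU": "Pro", "GAG": "Glu", "AAG": "Lys",
}

def transcribe(dna):
    """DNA (coding strand) -> mRNA: the same letters, with T replaced by U."""
    return dna.replace("T", "U")

def translate(mrna):
    """mRNA -> list of amino acids, reading three letters (one codon) at a time."""
    return [CODON_TABLE[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3)]

# Illustrative sequences: they differ by one letter in the seventh codon.
normal_dna = "ATGGTGCATCTGACTCCTGAGGAGAAG"
sickle_dna = "ATGGTGCATCTGACTCCTGTGGAGAAG"

normal_protein = translate(transcribe(normal_dna))
sickle_protein = translate(transcribe(sickle_dna))

# Collect the positions where the two proteins differ.
changes = [
    (position, before, after)
    for position, (before, after) in enumerate(zip(normal_protein, sickle_protein), start=1)
    if before != after
]
print(normal_protein)
print(sickle_protein)
print(changes)
```

The point of the sketch is the direction of the flow: the change starts in the DNA and shows up in the protein, which is what it means for a trait to be an attribute of a genetic mutation.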

It's at this point where misconceptions can arise. What are those misconceptions? Let's take a look.

Misconceptions In DNA Genealogy
Misconceptions in DNA Genealogy generally stem from a misunderstanding of the basics of genetics. Many of the present misconceptions take the form of some defined construct that's NOT inheritable and NOT an attribute of a genetic mutation. The most popular misconceptions in the general public treat concepts such as race, religion, nationality, ethnicity, and geographical point of origin as having some genetic basis. For this tutorial - let's focus on ethnicity.

Ethnicity is a socially defined category based on common culture or nationality. For example - the term "African-American" typically refers to a group of people whose ancestors were a part of the West African slave trade during the 1700s. You can expand the definition of ethnicity as you see fit, but no matter how you look at it, ethnicity is socially defined and self-determined. Ethnicity is a social construct devised by humans. If a person wants to place themselves in a different ethnic designation overnight, then he or she can do that.

However, ethnicity is NOT an inheritable trait. In other words - ethnicity is not an attribute of a genetic mutation. Your ethnicity is NOT reflected in your DNA. There is simply no way that it can be reflected in your DNA. Long before humans came along and devised social constructs, life evolved the ability to transmit features from parent to offspring within the DNA molecule, not to transmit social designations within the DNA molecule. You can change your ethnic or racial designation, name, location, and religious affiliation. However, you can't change your DNA.

Another popular misconception deals with the word "Ancestry". Like so many words, it can have different meanings in different contexts. In genetics, ancestry means common descent: two or more individuals share a unique feature that's derived from a common ancestor. The problem occurs when the term ancestry is given a social tone and then used in a scientific and objective arena.

The misconceptions mentioned above are very popular within the general public. Why the popularity? Well, that's where the term correlation comes in. Let's take a look.

Understanding Correlations: Beauty And The Beast
If you have taken statistics before, then you probably have come across the term "correlation". Correlations are like a double-edged sword. On one hand, a correlation can be useful. On the other hand, a correlation can be dangerous and misleading.

Simply put, a correlation is an association between two or more variables that tend to vary together. Correlations are a major reason for the popularity of many of the misconceptions that exist in genetics and thus DNA Genealogy. Many of the so-called BGA (biogeographical ancestry), Admixture, or Ethnic Population tests on the market are based on correlations. That's why those tests are very convincing.

A simple example of a correlation is between time and highway traffic. In many major metropolitan areas across the US, heavy highway traffic tends to occur at specific times of the day. For example, in Chicago, Illinois, Interstate 94 is a major highway that leads in and out of the downtown area. Interstate 94 experiences a consistent and heavy amount of traffic between 6am-9am in the morning and 4pm-7pm in the evening. This happens so regularly at those times that the term "rush hour traffic" is used to label the phenomenon.

Rush hour traffic is a simple example of a correlation. Here we have a strong association between time (one variable) and traffic (a second variable). Correlations can be useful in certain situations. Let's read on and find out why.

Understanding Correlations: The Beauty
A correlation can be useful because it can have strong predictive power in certain circumstances. In our simple example above, if heavy highway traffic consistently occurs at 6am-9am every morning, then one can logically predict that the highway will be congested at those times and plan to avoid it. Many of us subconsciously use correlations on a daily basis to make predictions and adjust our behavior accordingly.
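For readers who like to see the numbers, here is a small Python sketch of the rush hour example. The hourly vehicle counts are made up purely for illustration (they are not real Interstate 94 data), and the names `pearson` and `should_avoid_highway` are just labels chosen for this sketch. It measures how strongly "is it rush hour?" is associated with traffic volume, then uses that pattern the way most of us do: as a prediction.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical vehicles per hour for hours 0..23 (made-up numbers).
traffic = [2000] * 6 + [9000, 9500, 9000] + [4000] * 7 + [9000, 9500, 9000] + [3000] * 5

# 1 during the 6am-9am and 4pm-7pm windows from the text, 0 otherwise.
is_rush_hour = [1 if 6 <= hour < 9 or 16 <= hour < 19 else 0 for hour in range(24)]

r = pearson(is_rush_hour, traffic)
print(f"correlation between rush hour and traffic: {r:.2f}")

def should_avoid_highway(hour):
    """Prediction built on the correlation: expect congestion in the rush windows."""
    return 6 <= hour < 9 or 16 <= hour < 19

print(should_avoid_highway(17), should_avoid_highway(12))
```

With these made-up counts the correlation comes out close to 1, which is why the prediction works so well. Notice that nothing in the code says why the traffic is there; it only records that the two variables move together.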

However, correlations can have a dangerous side as well. Let's see why!

Understanding Correlations: The Beast
Correlations can be very dangerous and misleading. This is especially true in science. If you have ever heard the phrase "Correlation Does Not Imply Causation", then you know why correlations can be dangerous. The danger arises when an association between variables is perceived as a direct, cause-and-effect relationship.

Here is an example of some dangerous logic -> "The heavy rush hour traffic is caused by it being between 4pm-6pm."

Going back to our simple traffic correlation example, if heavy highway traffic consistently occurs at a specific time, then one may actually believe that time causes the traffic. This of course is not true. Time does NOT cause the traffic. The traffic is caused by the fact that most people have work hours that end sometime between 4pm and 6pm. The result is that many people head to the highway during those hours, which is what actually causes the congestion.
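The same point can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the vehicle counts are invented, and the functions `commuters_on_road` and `highway_traffic` are hypothetical names. In this model, traffic depends only on how many commuters are on the road. The clock shows up only because commuters happen to leave work at certain hours.

```python
BASELINE_VEHICLES = 2000   # hypothetical off-peak vehicles per hour
EVENING_COMMUTERS = 7000   # hypothetical extra vehicles when people leave work

def commuters_on_road(hour, is_workday):
    """Commuters head to the highway in the evening window, but only on workdays."""
    if is_workday and 16 <= hour < 19:
        return EVENING_COMMUTERS
    return 0

def highway_traffic(hour, is_workday):
    """Traffic is caused by the commuters, not by the hour on the clock."""
    return BASELINE_VEHICLES + commuters_on_road(hour, is_workday)

workday_5pm = highway_traffic(17, is_workday=True)
holiday_5pm = highway_traffic(17, is_workday=False)
print(workday_5pm, holiday_5pm)
```

It is 5pm in both cases, yet the traffic is completely different. If time itself caused the traffic, the holiday highway would be jammed too.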

This is why correlations can be quite dangerous. A consistently confirmed prediction from a correlation can lead to a false belief that one variable is the result of another variable. When dealing with correlations, what you want is to identify the cause of the association between the variables. That's the key. It's important to understand that there is a big difference between a merely associative relationship and a direct, cause-and-effect relationship. For example, the relationship between salt and high blood pressure is generally regarded as direct. The two are not just correlated: high salt intake actually contributes to raising blood pressure.

Now that we have a solid understanding of correlations, let's turn our attention back to DNA Genealogy.

Correlations In DNA Genealogy
If you are wondering whether correlations exist in DNA Genealogy, they do. In fact, correlations in genetics are a major reason for the spread and popularity of many of the misconceptions mentioned in this tutorial. In genetics, correlations take the form of known and studied DNA markers that are strongly associated with certain defined ethnic populations. This association is the basis for many of the so-called Admixture or Ethnic Population Tests, such as Dodecad Admixture, and for companies such as DNATribes, African-Ancestry, etc. to market a product.

A good example of a correlation in DNA Genealogy is between a haplogroup and an ethnic population. A haplogroup is a population of people that share a unique set of DNA markers on either the mtDNA or the Y-chromosome. For example - the Y-DNA haplogroup known as Q-M3 has a strong association with the ethnic group known as Native Americans. In fact - the association is so strong that it can be used as a strong predictor in certain cases.

Another example of a correlation is between an AIM and a geographic region. AIM stands for Ancestry Informative Marker, which is basically a DNA marker that's present at a high frequency in one population and at a much lower frequency in others. Certain AIMs are strongly associated with populations that have a known geographic origin. For instance - the Duffy Null allele is an AIM that has nearly a 100% frequency in Sub-Saharan Africans. AIMs are the basis for BGA tests such as Population Finder or Ancestry Composition.

With such powerful correlations in genetics, can someone's ethnicity, race, religion, or geographical point of origin be determined from their genetics?

Dangers Of Correlations in DNA Genealogy
The answer to the question above is a simple no. A correlation may seem very persuasive, but it's nevertheless still a correlation. No matter how strong a correlation is - an associative relationship is NOT a direct relationship. Simply put, your ethnicity, race, religion, etc. is NOT a product of your genetics. In genetics, the golden rule is that a defined entity must be an attribute of a genetic mutation. If something doesn't satisfy the golden rule, then it doesn't hold water in genetics.

It's understandable why it can be hard for someone to separate their ethnicity or any social construct from their genetics. A correlation can generate the illusion of a direct relationship when there actually is NO such direct relationship. The situation is made worse when you have various companies advertising such fallacies. For example, the term "genetic ethnicity" is used by certain organizations in order to sell a product. However, one solution when dealing with a correlation is to step back and understand the reason for the associative relationship.

In this case, why is there a strong association between certain genetic markers and certain ethnic populations?

The reason has to do with an ethnic population's marital and reproductive patterns, not their genetics. Let's assume a unique genetic marker arises in a population via a mutation. If the members of that population reproduce only with members of the same population for an extended amount of time, then the marker stays concentrated in that population. This is how a genetic marker can become associated with a population, and the result is a correlation. This is especially true if the population retains a small size over time. An example of this is the ethnic group known as Native Americans. The Y-DNA haplogroup known as Q-M3 has a high frequency and strong association among Native American males. This is due to strict marital practices maintained over a long period of time. Many Native American males mated only with Native American women and simply never deviated from that practice.
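The reasoning above can be sketched with a very simple Python model. This is a deliberately stripped-down illustration, not a real population genetics simulation: it assumes two equally sized populations, ignores chance effects and selection, and uses made-up starting frequencies. The function name `simulate` and the `migration_rate` parameter (the fraction of each generation's parents that come from the other population) are assumptions of this sketch.

```python
def simulate(freq_a, freq_b, migration_rate, generations):
    """Track a marker's frequency in populations A and B over the generations.

    Each generation, a fraction `migration_rate` of each population's parents
    comes from the other population (equal population sizes assumed).
    """
    for _ in range(generations):
        freq_a, freq_b = (
            (1 - migration_rate) * freq_a + migration_rate * freq_b,
            (1 - migration_rate) * freq_b + migration_rate * freq_a,
        )
    return freq_a, freq_b

# Hypothetical start: the marker is common in A (90%) and absent in B.
isolated = simulate(0.9, 0.0, migration_rate=0.0, generations=20)
mixing = simulate(0.9, 0.0, migration_rate=0.05, generations=20)

print("no intermarriage:  ", isolated)
print("some intermarriage:", mixing)
```

With no intermarriage, the marker stays tied to population A indefinitely. With even a modest amount of intermarriage, the gap between the two populations shrinks every generation. The marker itself never changed; only the reproductive pattern did, and that pattern is what creates or erases the association.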

Another way to expose a correlation is at the prediction level. For example - the Y-DNA haplogroup known as E1b1a has a high frequency and strong association among African-American males. Going on correlation logic - a male who identifies himself as African-American should possess the E1b1a haplogroup. My 2nd cousin is Lewis Lamar. His ethnic designation is African-American, and yet his Y-DNA haplogroup is R1b1a. The prediction made from the correlation fails in his case.
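Here is a rough Python sketch of why predictions built on a correlation miss in both directions. All of the numbers are hypothetical and chosen only for illustration (they are not measured haplogroup frequencies), and "group A" and "group B" are placeholder labels. The calculation is plain Bayes' rule, wrapped in a function I've called `probability_group_a_given_marker`.

```python
def probability_group_a_given_marker(freq_in_a, freq_in_b, share_a):
    """Bayes' rule: of all marker carriers, what fraction belongs to group A?"""
    share_b = 1 - share_a
    carriers_in_a = freq_in_a * share_a
    carriers_in_b = freq_in_b * share_b
    return carriers_in_a / (carriers_in_a + carriers_in_b)

# Hypothetical inputs, for illustration only.
FREQ_IN_A = 0.60   # marker frequency among males in group A
FREQ_IN_B = 0.05   # marker frequency among males in group B
SHARE_A = 0.13     # group A's share of the combined population

# Predicting "every member of group A carries the marker" is wrong this often:
miss_rate = 1 - FREQ_IN_A

# Predicting "every carrier of the marker belongs to group A" is right only this often:
p_group_a = probability_group_a_given_marker(FREQ_IN_A, FREQ_IN_B, SHARE_A)

print(f"members of A without the marker: {miss_rate:.0%}")
print(f"marker carriers who are in A:    {p_group_a:.0%}")
```

Even with a marker that is twelve times more common in one group than in the other, a large share of that group doesn't carry it, and a good share of carriers belong to the other group. A strong association is still not a rule.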

With that, we will end our discussion on correlations and misconceptions in DNA Genealogy. I hope this tutorial has shed some light on the dangers of correlations in certain circumstances as well as demystifying some prevalent misconceptions. So the next time you purchase a genetic ethnicity test, tell them you want your money back LOL!

Take Care
Steve Handy