eupator
Download and install RStudio for Windows (R itself, available from CRAN, must be installed first). https://www.rstudio.com/products/rstudio/
Download and install Rtools for Windows (Rtools 4.2 at the time of writing; use the version that matches your R version). https://cran.r-project.org/bin/windows/Rtools/
Once the installation is complete, open RStudio and install ADMIXTOOLS 2 (the R package admixtools) and its dependencies by following the Installation section of its GitHub page:
https://uqrmaie1.github.io/admixtools/index.html
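If you'd rather do the installation from the console, here is a minimal sketch of the devtools route described on that page. The helper names missing_packages() and install_admixtools_deps() are my own (hypothetical), and it assumes the package is still hosted in the uqrmaie1/admixtools GitHub repository.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
# requireNamespace() returns FALSE (instead of an error) for absent packages.
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Hypothetical helper: install only what is missing.
install_admixtools_deps <- function() {
  # CRAN packages used in this guide
  cran <- missing_packages(c("devtools", "tidyverse"))
  if (length(cran) > 0) install.packages(cran)
  # ADMIXTOOLS 2 itself comes from GitHub (assumed repo: uqrmaie1/admixtools)
  if (length(missing_packages("admixtools")) > 0) {
    devtools::install_github("uqrmaie1/admixtools")
  }
  # Whatever is still missing afterwards (ideally nothing)
  invisible(missing_packages(c("devtools", "tidyverse", "admixtools")))
}
```

Run install_admixtools_deps() once; the library() calls further down then load the packages in each new session.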
Finally, go to the Reich lab dataset page and download the dataset files (.geno, .snp, .ind and .anno). https://reichdata.hms.harvard.edu/pub/datasets/amh_repo/curated_releases/
The 1240K+HO dataset has more modern samples but fewer SNPs (about 600K) than the 1240K dataset. You can find a description of each sample in the .anno files. For convenience you can rename populations in the .ind file (for example, I renamed Kotias to CHG, Iran_Ganj_Dareh to Iran_N, etc.), as long as you don't change the order/position of any sample. Do not, however, pool .DG or .SG samples under the same label as samples without those suffixes: the former are shotgun-sequenced.
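Relabeling can also be done from R instead of a text editor. This is a minimal sketch, assuming the .ind file has the usual three whitespace-separated columns (sample ID, sex, population label); relabel_pops() is a hypothetical helper, and the old labels in any mapping you pass are placeholders until you check the exact labels in your own .ind/.anno files. It only changes the third column and never reorders rows.

```r
# Hypothetical helper: rename population labels (3rd column of an .ind file)
# while keeping every row in place, so the .ind stays aligned with the .geno.
relabel_pops <- function(ind_in, ind_out, mapping) {
  # Read everything as text so a sex column of only "F" is not turned into FALSE
  ind <- read.table(ind_in, header = FALSE, colClasses = "character",
                    col.names = c("id", "sex", "pop"))
  # Only rows whose label is a name in `mapping` are changed
  hit <- ind$pop %in% names(mapping)
  ind$pop[hit] <- unname(mapping[ind$pop[hit]])
  write.table(ind, ind_out, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(ind)
}
```

For example, relabel_pops(old_ind, new_ind, c(Kotias = "CHG", Iran_GanjDareh_N = "Iran_N")). Keep a backup of the original .ind before overwriting it.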
In the RStudio console, run the following commands (every time you start the program from scratch):
prefix = "C:/Users/eupator/Downloads/eupator_qpAdm_files/HO/v50.0_1240K_public"
my_f2_dir = "C:/Users/eupator/Downloads/eupator_qpAdm_files/my_f2_dir_eupator"
library(admixtools)
library(tidyverse)
prefix must be the path to the dataset you are using (1240K or HO), including the file name (or the name you gave it, if you renamed it), but without the .geno extension at the end; the .geno, .snp and .ind files must all share this prefix. Be careful and precise here, because a lot of people get stuck at this step. On Windows, use forward slashes as above (or doubled backslashes).
my_f2_dir must be a directory where the generated f2 statistics will be stored; I named mine my_f2_dir_eupator.
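A quick way to catch a wrong prefix before extract_f2() fails: the sketch below (missing_dataset_files() is a hypothetical helper, not part of admixtools) lists which of the .geno, .snp and .ind files do not exist for the given prefix. An empty result means the path is right.

```r
# Hypothetical helper: which of the three genotype files are missing for `prefix`?
missing_dataset_files <- function(prefix) {
  files <- paste0(prefix, c(".geno", ".snp", ".ind"))
  files[!file.exists(files)]
}
```

Running missing_dataset_files(prefix) after setting prefix should return character(0).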
Now you are ready to do some runs. You need:
a) a left list, made up of your target at the top and its source components below (up to 4-5 of them). In admixtools2 the target is passed separately, as in the example below.
b) a right list of reference (outgroup) populations that the target and its sources are compared against. There is an ongoing debate about how the right list should be built: some suggest using a few very distant populations, others insist on using recent populations, and you can use either approach or both (I use a mix). Some people suggest using no more than 15 populations, but you can probably use up to 30.
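The size of the right list matters because it sets the degrees of freedom of the qpAdm test. With n sources (target not counted) and m right populations, the f4 matrix has n rows and m - 1 columns and is tested for rank n - 1, which leaves (n - (n - 1)) * ((m - 1) - (n - 1)) = m - n degrees of freedom. So you need more right populations than sources. The function below is just that arithmetic (qpadm_dof() is a hypothetical name, not an admixtools function); for the example that follows it gives 21 - 3 = 18, matching the dof of the full model in the popdrop output.

```r
# Degrees of freedom of the qpAdm rank test for a model with `n_sources`
# sources (target not counted) and `n_right` right populations.
# f4 matrix: n_sources rows x (n_right - 1) columns, tested for rank n_sources - 1.
qpadm_dof <- function(n_sources, n_right) {
  rank <- n_sources - 1
  dof <- (n_sources - rank) * ((n_right - 1) - rank)  # simplifies to n_right - n_sources
  if (dof < 1) stop("need more right populations than sources")
  dof
}
```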
Example:
I want to model Lazaridis' modern Greek reference samples as a 3-way mixture, to estimate their ancient Greek (Iron Age) component (empuries2), their Slavic component (represented by Polish.DG) and their West Asian component (represented by Armenian.DG).
I renamed all 3 empuries_2 samples (I8215, I8208, I8205) to "Greek_Emporion".
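The same kind of renaming can be scripted by sample ID. A minimal sketch, assuming the IDs appear in the first column of the .ind exactly as written above; assign_samples_to_pop() is a hypothetical helper. It stops with an error if any ID is not found, so a typo can't silently leave a sample under its old label.

```r
# Hypothetical helper: give the listed sample IDs (1st .ind column) a new
# population label, leaving all rows in their original order.
assign_samples_to_pop <- function(ind_in, ind_out, ids, new_pop) {
  ind <- read.table(ind_in, header = FALSE, colClasses = "character",
                    col.names = c("id", "sex", "pop"))
  absent <- setdiff(ids, ind$id)
  if (length(absent) > 0) stop("IDs not found: ", paste(absent, collapse = ", "))
  ind$pop[ind$id %in% ids] <- new_pop
  write.table(ind, ind_out, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(ind)
}
```

For example, after backing up the original: assign_samples_to_pop(paste0(prefix, ".ind"), paste0(prefix, ".ind"), c("I8215", "I8208", "I8205"), "Greek_Emporion").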
First, I will set my target and left with the following commands:
target = c('Greek')
left = c('Greek_Emporion','Polish.DG','Armenian.DG')
For my right list, I use as a template the Lazaridis et al. (2017) right list, modified with some more recent populations (Mbuti.DG always first):
right = c('Mbuti.DG', 'Ethiopia_4500BP_published.SG', 'Russia_Ust_Ishim.DG', 'Czech_Vestonice16', 'Belgium_UP_GoyetQ116_1_published', 'Russia_Kostenki14.SG', 'Russia_AfontovaGora3', 'Italy_North_Villabruna_HG', 'Han.DG', 'Papuan.DG', 'Karitiana.DG', 'Georgia_Satsurblia.SG', 'Iran_GanjDareh_N', 'Turkey_Epipaleolithic', 'Morocco_Iberomaurusian', 'Jordan_PPNB', 'Russia_HG_Karelia.SG', 'Russia_Samara_EBA_Yamnaya', 'Czech_Bohemia_CordedWare', 'Armenia_LBA.SG', 'ONG.SG')
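Before extracting f2 statistics it is worth checking the three lists against each other. This sketch (check_pop_lists() is a hypothetical helper) stops if a population is listed twice or appears in more than one list, or if there are not more right populations than sources; otherwise it returns the degrees of freedom of the full model (21 - 3 = 18 here).

```r
# Hypothetical sanity check for the target, left and right lists.
check_pop_lists <- function(target, left, right) {
  all_pops <- c(right, target, left)
  # A population must not appear twice, within or across lists
  dups <- unique(all_pops[duplicated(all_pops)])
  if (length(dups) > 0) stop("listed more than once: ", paste(dups, collapse = ", "))
  # Right populations must outnumber the sources for dof > 0
  if (length(right) <= length(left)) stop("need more right populations than sources")
  invisible(length(right) - length(left))  # dof of the full model
}
```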
Before I run the model, I need to generate f2 statistics for all of my target, left and right populations combined:
mypops = c(right, target, left)  # the 21 right, 1 target and 3 left populations, in that order
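# maxmiss = 1 keeps SNPs no matter how many populations are missing them
extract_f2(prefix, my_f2_dir, pops = mypops, overwrite = TRUE, maxmiss = 1)
# afprod = TRUE: use allele-frequency products instead of f2 estimates (meant for data with much missingness)
f2_blocks = f2_from_precomp(my_f2_dir, pops = mypops, afprod = TRUE)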
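Now I can run the model using the following commands (qpadm() is given prefix here, so it reads the genotype files directly; to use the precomputed statistics instead, pass f2_blocks as the first argument):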
results = qpadm(prefix, left, right, target, allsnps = TRUE)
results$weights
results$popdrop
The model passes: its p-value of 0.0892 (pattern 000 in popdrop) is above 5%, and its standard errors are low (around or below 5%).
> results$weights
# A tibble: 3 × 5
target left weight se z
1 Greek Greek_Emporion 0.416 0.0612 6.79
2 Greek Polish.DG 0.371 0.0391 9.48
3 Greek Armenian.DG 0.213 0.0458 4.66
> results$popdrop
# A tibble: 7 × 14
pat wt dof chisq p f4rank Greek_Emporion Polish.DG
1 000 0 18 26.5 8.92e- 2 2 0.416 0.371
2 001 1 19 48.6 2.07e- 4 1 0.630 0.370
3 010 1 19 93.0 9.83e-12 1 1.01 NA
4 100 1 19 76.6 7.17e- 9 1 NA 0.598
5 011 2 20 95.8 7.08e-12 0 1 NA
6 101 2 20 131. 2.74e-18 0 NA 1
7 110 2 20 313. 2.00e-54 0 NA NA
# … with 6 more variables: Armenian.DG, feasible,
# best, dofdiff, chisqdiff, p_nested
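The p column can be reproduced from chisq and dof: it is the upper tail of a chi-squared distribution. A minimal check with base R's pchisq(), using the rounded values printed above (so expect small differences in the last digits); popdrop_p() is just a hypothetical wrapper name.

```r
# Upper-tail chi-squared p-value for a rank-test statistic
popdrop_p <- function(chisq, dof) pchisq(chisq, df = dof, lower.tail = FALSE)

# Values copied from the (rounded) popdrop table
full   <- popdrop_p(26.5, 18)  # pattern 000: all three sources
no_arm <- popdrop_p(48.6, 19)  # pattern 001: Armenian.DG dropped
```

full comes out close to the printed 0.0892, while dropping Armenian.DG gives a p-value far below 0.05, as do the other reduced models in the table.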
The Greek reference can be successfully modeled as 41.6% Greek Emporion, 37.1% Polish.DG and 21.3% Armenian.DG.
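The acceptance criteria used here can be wrapped into a small summary. This is a sketch: summarise_model() is hypothetical, the 0.05 threshold is the one used above, and with a real run you could pass results$weights directly together with the p-value of the 000 row of results$popdrop. Here the numbers are copied from the output above.

```r
# Hypothetical summary: p above a threshold, all weights non-negative
# (they sum to 1, so each is then <= 1), and z = weight / se per source.
summarise_model <- function(weights, p, p_min = 0.05) {
  z <- weights$weight / weights$se
  list(
    passes_p   = p > p_min,
    feasible   = all(weights$weight >= 0),
    weight_sum = sum(weights$weight),
    z          = setNames(z, weights$left)
  )
}

# Numbers copied from results$weights and the 000 row of results$popdrop
w <- data.frame(
  left   = c("Greek_Emporion", "Polish.DG", "Armenian.DG"),
  weight = c(0.416, 0.371, 0.213),
  se     = c(0.0612, 0.0391, 0.0458)
)
summary_greek <- summarise_model(w, p = 0.0892)
```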