For quite some months I've spent countless hours experimenting with G25 to see how well it can replicate admixture results from formal studies that utilise qpAdm, ADMIXTURE, etc. I agree that G25 can be hit and miss. The problem is that there is no way to validate the results except by having pre-existing knowledge.
Check out this thread:
G25 - g25 VS qpadm/admixtools2 comparison.
Had some time to kill so here goes. These are my best ‘mixed mode’ results for each period for G25 Illustrative DNA. The raw dna file that was used to produce these results was an extract from my Whole Genome Sequence Nebula Genomics file, that has 99.9% SNP coverage with eurogenes’ template...
www.eupedia.com
In addition to @eupator there are a few other members who could help you out.
There's times when it does replicate qpAdm, and times when it doesn't. Frankly, you should endeavor to use admixtools for optimal analysis. It is the suite of tools used by academics for peer-reviewed studies. Once you get it set up, it becomes second nature. I use AI to help me figure it out.
With eigenstrat you can take advantage of other tools, notably smartpca, which is the same PCA that is used in many studies.
Anyway I've decided to learn ADMIXTURE for the time being and once I've got the hang of it I might move on to learn qpAdm. BTW I've noticed in this thread there's a few mentions of converting Eigenstrat format files to Plink format. Is there any benefit of this since qpAdm supports Eigenstrat files, doesn't it?
Thanks! Yeah, I've downloaded the Reich dataset (the HO one), which of course is in Eigenstrat. Since ADMIXTURE also supports Eigenstrat format, I don't see much of a reason to convert the dataset to Plink. I just need to learn to create subsets of the Reich dataset. I notice, however, that the ADMIXTURE tutorials I've found tend to use Plink files, possibly because they were using an older version of ADMIXTURE that didn't support Eigenstrat, and/or because some of the operations in the tutorials rely on Plink.
However, I myself have tried converting back from PLINK to see where I plot. But I run into an issue where the file becomes massive for some reason. I haven't been able to figure it out, and I've asked around, but haven't found the answer. Like yourself, I am self-taught, and still learning.
Welcome to the forum btw!
There is an easy way to avoid this.
You have to filter the sample being merged into the large dataset down to only those SNPs that already exist in the large dataset. Otherwise, the extraneous SNPs in the sample that don't occur in the large dataset force it to add no-calls for every sample in it, swelling the dataset.
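# R script that calls PLINK through system(). Run it from the folder that holds the files.
#
# Step 1: gather the SNPs in the primary dataset and write them out.
# Step 2: use that list to filter the sample, in this case my sample.
# Step 3: merge the new filtered sample dataset with the primary like normal.
#
# The same idea as plain PLINK commands, starting from a 23andMe-format raw file:
# plink --allow-no-sex --bfile v52.2_1240K_public --write-snplist --out v52.2_1240K_clean
# plink --23file PLg.txt --extract v52.2_1240K_clean.snplist --make-bed --out PLg_v54p1_genome
getwd()  # confirm the working directory
# Step 1 (commented out here; uncomment if the .snplist has not been written yet)
# system("plink --allow-no-sex --bfile v52.2_1240K_public --write-snplist --out v52.2_1240K_clean ")
# Step 2: keep only the SNPs that exist in the primary dataset
system("plink --bfile S2949 --extract v52.2_1240K_clean.snplist --make-bed --out S2949_filtered")
# Step 3: merge the filtered sample with the primary dataset
system("plink --bfile S2949_filtered --bmerge v52.2_1240K_public.bed v52.2_1240K_public.bim v52.2_1240K_public.fam --out Hut")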
--write-snplist is the option that creates the clean SNP list. Only the SNPs in this list will be used for the merge; these are the SNPs in the large dataset. This matters because the new file you are merging may contain other SNPs, or the SNP names may be different, which also causes some issues. I noticed that some files may have different names for the SNPs, depending on the format.
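To see why the filtering step matters, here is a minimal Python sketch of the bookkeeping. It is not PLINK itself, just a toy model that assumes a merge keeping the union of SNPs, as described above. The SNP IDs, the sample count and the function names are made up for illustration.

```python
# Toy model of a union merge: every SNP present in either dataset ends up
# in the merged file, and samples lacking a SNP get a no-call for it.
# All IDs and counts below are hypothetical.

def merge_stats(reference_snps, n_reference_samples, sample_snps):
    """Return (merged SNP count, no-calls added to the reference samples)."""
    extra = set(sample_snps) - set(reference_snps)  # SNPs only the new sample has
    merged = len(set(reference_snps) | set(sample_snps))
    added_no_calls = len(extra) * n_reference_samples
    return merged, added_no_calls

def extract(sample_snps, snplist):
    """Keep only SNPs named in snplist (the role --extract plays above)."""
    allowed = set(snplist)
    return [s for s in sample_snps if s in allowed]

reference = ["rs1", "rs2", "rs3"]
my_sample = ["rs2", "rs3", "rs4", "rs5"]

unfiltered = merge_stats(reference, 1000, my_sample)
filtered = merge_stats(reference, 1000, extract(my_sample, reference))
print("unfiltered:", unfiltered)
print("filtered:  ", filtered)
```

In this toy case the two extra SNPs cost 2,000 no-calls across 1,000 reference samples, while the filtered merge adds none.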
Thanks! I did this when merging previously.
What I meant is that once it is in PLINK format the size is good, and comparable to the previous eigenstrat format. When converting back to eigenstrat from PLINK it becomes something like 10x bigger than the original eigenstrat file, despite only having 1 sample added (with no extra SNPs, just the ones native to the original eigenstrat).
At any rate, thanks for that information, because it will be useful when I re-merge my sample when the new updates for AADR come out.
I think I have seen that also.
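One possible explanation for the size jump, which nobody in the thread confirms, is the output format rather than the data. PLINK .bed and PACKEDANCESTRYMAP .geno files store each genotype in 2 bits, while a plain-text EIGENSTRAT .geno file stores one character per genotype. Here is a rough size estimate in Python, assuming those layouts and made-up sample and SNP counts:

```python
import math

def bed_bytes(n_samples, n_snps):
    # PLINK .bed: 3 header bytes, then each SNP packs 4 genotypes per byte.
    return 3 + n_snps * math.ceil(n_samples / 4)

def eigenstrat_text_bytes(n_samples, n_snps):
    # Plain-text EIGENSTRAT .geno: one character per genotype plus a newline per SNP.
    return n_snps * (n_samples + 1)

# Hypothetical counts, chosen only to be roughly the size of a large public dataset.
n_samples, n_snps = 20000, 1200000
packed = bed_bytes(n_samples, n_snps)
text = eigenstrat_text_bytes(n_samples, n_snps)
print(f"2-bit packed: {packed / 1e9:.1f} GB")
print(f"plain text:   {text / 1e9:.1f} GB")
print(f"ratio:        {text / packed:.1f}x")
```

That gives roughly a 4x difference, so it may not account for the whole 10x. If the conversion was done with convertf, setting outputformat: PACKEDANCESTRYMAP instead of EIGENSTRAT in the parameter file would be the thing to try.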
What's the most accurate thing for mobile users? I use G25, but I'm reading it's not reliable at all.
For mobile you are limited to PCA-based admixture calculators and oracles such as G25. To run anything like ADMIXTURE or qpAdm, which are the academic standard, you need a Linux/Unix-based environment on a desktop or laptop.
Hi Jovialis,
I currently have the 1240K+HO version, which I converted to PLINK format.
If you want to add your own DNA, you first have to make sure the raw data is in 23andMe format and aligned to hg19, the reference build that AADR uses.
You then have to use PLINK to convert the AADR files from EIGENSTRAT format to PLINK format. Then convert your 23andMe raw data to PLINK format and do a one-sided merge in PLINK, making sure to only include SNPs native to AADR.
It is a bit of a complex process; I used ChatGPT 4.0 to help me out.
This is how you merge with PLINK:
My raw DNA file in this example is eupator.txt, in 23andMe format. If yours is not in that format, convert it with DNA Kit Studio or similar.
Code:
plink --23file eupator.txt --make-bed --out eupator
That will create eupator.bed, along with eupator.bim and eupator.fam. Use an editor to rename your sample in eupator.fam to your liking.
Then use your converted .bed dataset to merge. In this example it's v54_HO_public.
Code:
plink --allow-no-sex --bfile v54_HO_public --bmerge eupator --out v54_HO_public_eupator
The merged file is v54_HO_public_eupator.
If you get strand inconsistency problems you can solve them, as they appear, with the following:
Code:
plink --allow-no-sex --bfile eupator --flip v54_HO_public_eupator-merge.missnp --make-bed --out eupator_flipped
plink --allow-no-sex --bfile v54_HO_public --bmerge eupator_flipped --make-bed --out v54_HO_public_eupator2
plink --allow-no-sex --bfile eupator_flipped --exclude v54_HO_public_eupator2-merge.missnp --make-bed --out eupator_filtered
plink --allow-no-sex --bfile v54_HO_public --bmerge eupator_filtered --make-bed --out v54_HO_public_final
Your new merged dataset is v54_HO_public_final. You can use it straight away with at2 (admixtools2), as at2 reads PLINK files.
Thanks. I'd previously only seen the examples in the online PLINK 1.9 instructions. Yours was a bit different in that you did one flip and one exclude. I had been wondering what the benefit of merging in PLINK is when MERGEIT automatically checks and fixes strand incompatibilities during the merge, without having to remedy the problem with more commands. Perhaps it's to do with what the chart below says. PLINK doesn't shed the amount of SNPs that MERGEIT does. I might experiment and compare.
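For anyone wondering what the flip and exclude steps are doing, here is a rough Python sketch of the logic. It is not PLINK's actual code, and the allele pairs and function names are invented. The idea is that a SNP reported on the opposite strand matches once its alleles are complemented, and anything that still disagrees gets excluded.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def flip(alleles):
    """Complement each allele, i.e. report the SNP on the other strand."""
    return {COMPLEMENT[a] for a in alleles}

def classify(reference_alleles, sample_alleles):
    """Return 'ok', 'flip', or 'exclude' for one SNP."""
    ref, smp = set(reference_alleles), set(sample_alleles)
    if smp <= ref:
        return "ok"       # sample alleles already match the big dataset
    if flip(smp) <= ref:
        return "flip"     # matches once the strand is flipped
    return "exclude"      # still inconsistent, so drop the SNP

# Hypothetical SNPs: (alleles in the big dataset, alleles in the sample)
examples = {
    "rs1": ("AG", "AG"),
    "rs2": ("AG", "TC"),
    "rs3": ("AG", "AC"),
}
for snp, (ref, smp) in examples.items():
    print(snp, classify(ref, smp))
```

One caveat: A/T and C/G SNPs look the same on both strands, so a check like this cannot tell which strand they are on.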
BTW do you know how to pseudo-haploidize the data? (preferably in PLINK or EIGENSTRAT format)
No, sorry.
No probs, I just found out in the meantime that the majority of ancient samples in the AADR dataset have already been pseudo-haploidized. I even double-checked by running a .freqx report on a subset of ancient samples in PLINK, and they are all homozygous calls.
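For reference, the basic idea of pseudo-haploidizing genotypes that have already been called as diploid can be sketched in a few lines of Python. This is only an illustration, assuming EIGENSTRAT-style genotype codes (0, 1 or 2 copies of the reference allele, 9 for missing). The function names are made up, it is not a PLINK or EIGENSOFT feature, and for ancient samples the calls are normally made from the sequencing reads rather than from diploid genotypes.

```python
import random

def pseudo_haploidize(genotypes, seed=None):
    """Replace each heterozygous call (1) with a randomly chosen homozygous call (0 or 2)."""
    rng = random.Random(seed)
    out = []
    for g in genotypes:
        if g == 1:
            out.append(rng.choice((0, 2)))  # pick one of the two alleles at random
        else:
            out.append(g)                   # homozygous (0, 2) and missing (9) stay as they are
    return out

def is_pseudo_haploid(genotypes):
    """True if there are no heterozygous calls, which is what the .freqx check looks for."""
    return all(g != 1 for g in genotypes)

diploid = [0, 1, 2, 9, 1, 1, 0]  # made-up genotype codes
haploid = pseudo_haploidize(diploid, seed=1)
print(is_pseudo_haploid(diploid), is_pseudo_haploid(haploid))
```

The second function mirrors the check described above: a sample with no heterozygous calls at all is consistent with pseudo-haploid data.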
The Allen Ancient DNA Resource (AADR): A curated compendium of ancient human genomes
The Allen Ancient DNA Resource (AADR) seeks to provide a publicly available, uniformly curated dataset that is maximally useful for scientists carr...
dataverse.harvard.edu
Database has been updated.
Yesssss! Thanks for that