Admixtools admixtools2 TUTORIAL for WINDOWS.

** R
** data
*** moving datasets to lazyload DB
** inst
** byte-compile and prepare package for lazy loading
Note: break used in wrong context: no loop is visible
** help
*** installing help indices
*** copying figures
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (admixtools)

Got it to work in the R console. Within RStudio itself I was getting errors.
 
Check out this thread:


In addition to @eupator, there are a few other members who could help you out.

There are times when it does replicate qpAdm, and times when it doesn't. Frankly, you should endeavor to use admixtools for optimal analysis. It is the suite of tools used by academics for peer-reviewed studies. Once you get it set up, it becomes second nature. I use AI to help me figure it out.
For quite some months I've spent countless hours experimenting with G25 to see how well it can replicate admixture results from formal studies that utilise qpAdm and ADMIXTURE etc. I agree that G25 can be hit and miss. The problem is there is no way to validate the results except by having pre-existing knowledge.

Anyway, I've decided to learn ADMIXTURE for the time being, and once I've got the hang of it I might move on to qpAdm. BTW, I've noticed a few mentions in this thread of converting Eigenstrat format files to Plink format. Is there any benefit to this, given that qpAdm supports Eigenstrat files?
 
With eigenstrat you can take advantage of other tools, notably smartpca, which is the same PCA that is used in many studies.

However, I myself have tried converting back from PLINK to see where I plot. But I run into an issue where the file becomes massive for some reason. I haven't been able to figure it out, and I've asked around, but haven't found the answer. Like yourself, I am self-taught, and still learning.

Welcome to the forum btw!
 
Thanks! Yeah, I've downloaded the Reich dataset (the HO one), which of course is in Eigenstrat. Since ADMIXTURE also supports Eigenstrat format, I don't see much of a reason to convert the dataset to Plink. I just need to learn to create subsets of the Reich dataset. I notice, however, that the ADMIXTURE tutorials I've found tend to use Plink files, possibly because they were using an older version of ADMIXTURE which didn't support Eigenstrat, and/or because some of the operations in the tutorials rely on Plink.
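
One way to start on subsetting is to pick sample IDs by population label from the .ind file. This is only a minimal sketch: it assumes the usual three-column .ind layout (sample ID, sex, population), and the file names, sample IDs and population labels below are made up for illustration.

```shell
# Toy .ind file: sample ID, sex, population label (all values invented).
workdir=$(mktemp -d)
cat > "$workdir/toy.ind" <<'EOF'
S001 M PopA
S002 F PopB
S003 M PopA
S004 U PopC
EOF

# Populations to keep, one label per line (hypothetical labels).
printf 'PopA\nPopC\n' > "$workdir/keep_pops.txt"

# First file: remember the wanted labels. Second file: print the ID of
# every sample whose third column is one of those labels.
awk 'NR == FNR { want[$1] = 1; next } ($3 in want) { print $1 }' \
    "$workdir/keep_pops.txt" "$workdir/toy.ind" > "$workdir/keep_samples.txt"

cat "$workdir/keep_samples.txt"
```

The label list itself is the kind of file that convertf's poplistname parameter expects, so the same keep_pops.txt could probably drive the actual subsetting; check the EIGENSOFT README before relying on that.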

Btw, when you converted back from the Plink file, did you convert to Eigenstrat or VCF? (I read that VCF files are considerably larger than the other formats.)
 
There is an easy way to avoid this.

You have to filter the sample being merged so that it contains only SNPs that already exist in the large dataset. Otherwise, every extra SNP in the sample that doesn't occur in the large dataset forces PLINK to add no-calls for that SNP to every sample in it, which swells the dataset.
#
# This example would gather the SNPs in the primary and write them out. Then you use it to filter the sample, in this case my sample.
#
# You'd then merge this new filtered sample dataset with the primary like normal.


# plink --allow-no-sex --bfile v52.2_1240K_public --write-snplist --out v52.2_1240K_clean
# plink --23file PLg.txt --extract v52.2_1240K_clean.snplist --make-bed --out PLg_v54p1_genome
getwd()
# system("plink --allow-no-sex --bfile v52.2_1240K_public --write-snplist --out v52.2_1240K_clean ")
system("plink --bfile S2949 --extract v52.2_1240K_clean.snplist --make-bed --out S2949_filtered")

system("plink --bfile S2949_filtered --bmerge v52.2_1240K_public.bed v52.2_1240K_public.bim v52.2_1240K_public.fam --out HO_out")


--write-snplist is the option that creates the clean SNP list. Only the SNPs in this list, i.e. the SNPs in the large dataset, will be used for the merge. This matters because the new file you are merging may contain other SNPs, or may name the same SNPs differently, which also causes issues. I noticed that some files use different SNP names depending on the format.
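
Before filtering, it can help to see how many SNPs in the sample are missing from the large dataset. A rough sketch using only awk on the .bim files (six columns: chromosome, SNP ID, genetic distance, position, allele 1, allele 2); the toy files and SNP IDs are invented, and matching is by SNP ID only, so differently named SNPs at the same position will show up as "extra":

```shell
workdir=$(mktemp -d)

# Toy reference and sample .bim files (invented SNPs).
printf '1\trs1\t0\t1000\tA\tG\n1\trs2\t0\t2000\tC\tT\n1\trs3\t0\t3000\tG\tA\n' \
    > "$workdir/ref.bim"
printf '1\trs2\t0\t2000\tC\tT\n1\trs3\t0\t3000\tG\tA\n1\trs9\t0\t9000\tT\tC\n1\ti5000\t0\t9500\tA\tC\n' \
    > "$workdir/sample.bim"

# Load the reference SNP IDs, then print sample SNP IDs not among them.
awk 'NR == FNR { ref[$2] = 1; next } !($2 in ref) { print $2 }' \
    "$workdir/ref.bim" "$workdir/sample.bim" > "$workdir/extra.txt"

extra_count=$(wc -l < "$workdir/extra.txt" | tr -d ' ')
echo "SNPs in sample but not in reference: $extra_count"
```

If that count is large, the --extract step above is doing real work; if it is zero, the merge should not add any no-call columns.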
 
Thanks! I did this when merging previously.

What I meant is that once it is in PLINK format the size is fine, and comparable to the previous eigenstrat format. Converting back to eigenstrat from PLINK, it becomes something like 10x bigger than the original eigenstrat file, despite only having 1 sample added (with no extra SNPs, just the ones native to the original eigenstrat).

At any rate, thanks for that information, because it will be useful when I re-merge my sample when the new updates for AADR come out.
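
One possible contributor to the size jump, offered as a guess rather than a confirmed diagnosis: convertf can write either a packed binary .geno (PACKEDANCESTRYMAP, roughly 2 bits per genotype) or a plain-text one (EIGENSTRAT, one character per genotype plus a newline per SNP). If the original was packed and the converted file is text, the back-of-envelope arithmetic looks like this. The SNP and sample counts are placeholders, and headers and padding are ignored:

```shell
# Placeholder dataset dimensions (not the real AADR numbers).
snps=1000
inds=400

# Packed: 4 genotypes per byte, rounded up per SNP row.
packed_bytes=$(( snps * ( (inds + 3) / 4 ) ))

# Text: one character per genotype plus a newline per SNP row.
text_bytes=$(( snps * (inds + 1) ))

ratio=$(( text_bytes / packed_bytes ))
echo "packed: $packed_bytes bytes, text: $text_bytes bytes, about ${ratio}x"
```

That only accounts for roughly a fourfold difference, so it may be part of the story at most. Setting outputformat to PACKEDANCESTRYMAP in the convertf parameter file would be the thing to try.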
 
I think I have seen that also.
 
What's the most accurate thing for mobile users? I use G25, but I'm reading it's not reliable at all.
For mobile you are limited to PCA-based admixture calculators and oracles such as G25. To run anything like ADMIXTURE or qpAdm, which are the academic standard, you need a Linux/Unix-based environment on a desktop or laptop.
 
I currently have the 1240K+HO version which I converted to PLINK format.

If you want to add your own DNA, you first have to make sure the raw data is in 23andMe format and aligned to the hg19 (GRCh37) build, which is the build AADR uses.

You then have to convert the AADR files from eigenstrat format to PLINK format, convert your 23andMe raw data to PLINK format, and do a one-sided merge in PLINK, making sure to only include SNPs native to AADR.

It is a bit of a complex process; I used ChatGPT 4.0 to help me out.
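
A quick sanity check on the raw file before converting might look like the sketch below. It assumes the 23andMe layout of '#' comment lines followed by four tab-separated columns (rsid, chromosome, position, genotype). That the header mentions "build 37" is an assumption about how the export is worded, and the toy file contents are invented:

```shell
workdir=$(mktemp -d)
raw="$workdir/raw.txt"

# Toy raw-data file (header wording and genotypes invented).
printf '# toy header line mentioning build 37\n' > "$raw"
printf '# rsid\tchromosome\tposition\tgenotype\n' >> "$raw"
printf 'rs1\t1\t1000\tAG\nrs2\t1\t2000\tCC\nrs3\tX\t3000\t--\n' >> "$raw"

# Count data rows that do not have exactly four tab-separated fields.
bad_rows=$(awk -F '\t' '!/^#/ && NF != 4 { n++ } END { print n + 0 }' "$raw")

# Count comment lines that mention build 37 (grep exits non-zero on 0 hits).
build_lines=$(grep -c '^#.*build 37' "$raw" || true)

echo "malformed rows: $bad_rows, header lines mentioning build 37: $build_lines"
```

Zero malformed rows and at least one build 37 mention is what you would hope to see; anything else suggests the file needs converting or lifting over first.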
Hi Jovialis,
What do you mean by a one-sided merge in Plink? Also is there an issue with merging snps that aren't already in the AADR dataset?

So far I've avoided Plink for merging because it always fails, and I'd prefer not to go through the convoluted process of dealing with the strand-flipping. I merge the datasets in eigenstrat format using mergeit, as it automatically sorts out those issues during the merge. I then convert back to Plink for QC filtering, LD pruning etc. But if merging in Plink has a particular benefit that mergeit lacks, I might try the Plink way, however complicated.
 
This is how you merge with PLINK:

My raw DNA file in this example is eupator.txt, in 23andMe format. If yours is not in that format, convert it with DNA Kit Studio or similar.

Code:
plink --23file eupator.txt --make-bed --out eupator

That will create the fileset eupator.bed, eupator.bim and eupator.fam. Use a text editor to rename your sample in eupator.fam to your liking.
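
If you would rather script the rename than open an editor, the sample IDs are the first two columns of the .fam file (family ID and individual ID). A minimal sketch on a toy one-line .fam; the placeholder IDs and the new names are made up, and it writes to a new file so the original stays untouched:

```shell
workdir=$(mktemp -d)

# Toy .fam: FID, IID, father, mother, sex, phenotype (values invented).
printf 'FAM001 ID001 0 0 0 -9\n' > "$workdir/toy.fam"

# Hypothetical new labels for the sample.
new_fid="MyPop"
new_iid="MySample"

# Overwrite columns 1 and 2, keep the rest of the line as is.
awk -v fid="$new_fid" -v iid="$new_iid" '{ $1 = fid; $2 = iid; print }' \
    "$workdir/toy.fam" > "$workdir/toy.renamed.fam"

cat "$workdir/toy.renamed.fam"
```

This only makes sense for a single-sample .fam; with several lines every sample would get the same name.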

Then use your converted .bed dataset to merge. In this example it's v54_HO_public.

Code:
plink --allow-no-sex --bfile v54_HO_public --bmerge eupator --out v54_HO_public_eupator

The merged fileset is v54_HO_public_eupator (.bed/.bim/.fam).

If you get strand inconsistency problems you can solve them, as they appear, with the following:

Code:
plink --allow-no-sex --bfile eupator --flip v54_HO_public_eupator-merge.missnp --make-bed --out eupator_flipped

plink --allow-no-sex --bfile v54_HO_public --bmerge eupator_flipped --make-bed --out v54_HO_public_eupator2

plink --allow-no-sex --bfile eupator_flipped --exclude v54_HO_public_eupator2-merge.missnp --make-bed --out eupator_filtered

plink --allow-no-sex --bfile v54_HO_public --bmerge eupator_filtered --make-bed --out v54_HO_public_final

Your new merged dataset is v54_HO_public_final. You can use it straight away with at2 (admixtools2), as at2 reads PLINK files.
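
One caveat with the flip-then-exclude approach: SNPs whose two alleles are A/T or C/G look identical on either strand, so a strand mismatch at those sites never shows up in the .missnp file. A hedged sketch for listing them from a .bim file so you can decide whether to drop them; the toy SNPs are invented and the output file name is hypothetical:

```shell
workdir=$(mktemp -d)

# Toy .bim: chromosome, SNP ID, genetic distance, position, allele 1, allele 2.
printf '1\trs1\t0\t1000\tA\tG\n1\trs2\t0\t2000\tA\tT\n1\trs3\t0\t3000\tC\tG\n1\trs4\t0\t4000\tT\tC\n' \
    > "$workdir/toy.bim"

# Print the ID of every SNP whose allele pair is its own complement.
awk '{
    pair = toupper($5) toupper($6)
    if (pair == "AT" || pair == "TA" || pair == "CG" || pair == "GC") print $2
}' "$workdir/toy.bim" > "$workdir/ambiguous.snplist"

cat "$workdir/ambiguous.snplist"
```

The resulting list is in the one-ID-per-line form that --exclude takes, as in the commands above.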
 
Thanks. I'd previously only seen the examples in the online PLINK 1.9 instructions. Yours was a bit different in that you did one flip and one exclude. I had been wondering what the benefit of merging in PLINK is when MERGEIT automatically checks and fixes strand incompatibilities during the merge, without having to remedy the problem with more commands. Perhaps it has to do with what the chart below says: PLINK doesn't shed as many SNPs as MERGEIT does. I might experiment and compare.
 

Attachments

  • 126028934-333c7083-3ac2-48a1-bf52-b354cfaaf1a0.png
BTW do you know how to pseudo-haploidize the data? (preferably in PLINK or EIGENSTRAT format)
 
No, sorry.
No probs. I just found out in the meantime that the majority of ancient samples in the AADR dataset have already been pseudo-haploidized. I even double-checked by running a .frqx report on a subset of ancient samples in PLINK, and they are all homozygous calls.
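
The same check can be sketched directly on an unpacked, text-format EIGENSTRAT .geno file, where each row is a SNP, each column is a sample (in .ind order), and the characters mean 0/1/2 copies of the reference allele or 9 for missing. A pseudo-haploid sample should have no 1s at all. The toy genotypes are invented, and this will not work on a packed binary .geno, which would need converting to text first:

```shell
workdir=$(mktemp -d)

# Toy text .geno: 3 SNPs (rows) x 4 samples (columns), values invented.
printf '0129\n2209\n1020\n' > "$workdir/toy.geno"

# Count heterozygous calls ("1") per sample column.
awk '{
    n = length($0)
    if (n > max) max = n
    for (i = 1; i <= n; i++)
        if (substr($0, i, 1) == "1") het[i]++
}
END {
    for (i = 1; i <= max; i++) print i, het[i] + 0
}' "$workdir/toy.geno" > "$workdir/het_counts.txt"

cat "$workdir/het_counts.txt"
```

Each output line is the sample's column number followed by its heterozygous count, so any sample with a non-zero second column is carrying diploid calls.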
 