Admixtools PCA with PLINK & ChatGPT (Python Samurai)

Jovialis

rjjUyUs.png


I have been experimenting with creating PCAs using PLINK and ChatGPT's ability to run Python scripts. I use a dedicated GPT called Python Samurai, but the regular GPT 4.0 can do it as well. I wanted to replicate the PCAs from studies exactly; however, most of them use smartpca from EIGENSOFT, which works with EIGENSTRAT-format files. I took West Eurasian samples, modern and ancient, but the result doesn't project in a very visually discernible way. Nevertheless, if you ask the AI to zoom in on specific samples, it looks a lot better. In fact, it made me aware that "Italian_North.HO" includes Tuscan samples. I would like to eventually convert my merged AADR dataset to EIGENSTRAT format and try it out in smartpca. But in the meantime, this has been very interesting to experiment with. Also, since ChatGPT includes Python functionality, it is much easier to just tell it what you want in plain English than to write out all the specifics in code for the Ubuntu terminal.

This is the command I used to generate the files I upload to the AI (with --pca, PLINK writes a .eigenvec and a .eigenval file under the --out name):
Code:
plink --allow-extra-chr --allow-no-sex --bfile /mnt/d/Bioinformatics/01_Admixtools_Dataset/v54.1.p1_HO_Jovialis_Plink/PCA/v54.1.p1_HO_Jovialis --keep /mnt/d/Bioinformatics/01_Admixtools_Dataset/v54.1.p1_HO_Jovialis_Plink/PCA/aDNA_Modern.FAM --out output_file_name --pca --set-hh-missing

In ChatGPT, just paste in the paths to your own files and ask it to modify the command so that it uses them.

Be sure to include these flags (they're all in the code above):

--allow-extra-chr --allow-no-sex --bfile --keep --out --pca --set-hh-missing
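If you would rather not edit the command by hand each time, here is a minimal Python sketch that assembles the same flags from your own paths. The function name and the example paths are made up for illustration; only the flags come from the command above.

```python
import shlex

def build_plink_pca_command(bfile_prefix, keep_file, out_prefix):
    """Return the PLINK PCA command as a list of arguments.

    bfile_prefix: path to the .bed/.bim/.fam set, without extension
    keep_file:    file listing the samples to keep (family ID, sample ID)
    out_prefix:   prefix for the files PLINK writes
    """
    return [
        "plink",
        "--allow-extra-chr",
        "--allow-no-sex",
        "--bfile", bfile_prefix,
        "--keep", keep_file,
        "--out", out_prefix,
        "--pca",
        "--set-hh-missing",
    ]

# Hypothetical paths, replace them with your own.
cmd = build_plink_pca_command("data/my_dataset", "data/subset.fam", "pca_subset")

# Print a copy-pasteable command line for the terminal.
print(shlex.join(cmd))
```

The printed line can be pasted into the Ubuntu terminal, or the list can be passed to subprocess.run(cmd, check=True) if PLINK is on your PATH.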

But before you do that, choose the samples you want to project. Unless you have a supercomputer with a ton of RAM, I wouldn't try running the PCA on the entire FAM. Instead, you can copy and paste the lines for the samples you want into a new FAM file and give its path to --keep, while --bfile still points to the full dataset's BED, BIM and FAM files.
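Copying lines by hand gets tedious with a long sample list, so here is a rough Python sketch of the same step. It assumes the sample ID is in the second column of each FAM line, which is the standard PLINK layout; the sample names and the function name are invented for the example.

```python
def select_fam_lines(fam_lines, wanted_ids):
    """Keep only the FAM lines whose sample ID (column 2) is in wanted_ids."""
    kept = []
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if fields[1] in wanted_ids:
            kept.append(line.rstrip("\n"))
    return kept

# Made-up FAM content: family ID, sample ID, father, mother, sex, phenotype.
example_fam = [
    "1 SampleA 0 0 1 1",
    "2 SampleB 0 0 2 1",
    "3 SampleC 0 0 0 1",
]

subset = select_fam_lines(example_fam, {"SampleA", "SampleC"})
print("\n".join(subset))
```

With a real dataset you would read the lines from the full FAM file, write the kept lines to a new file, and pass that file to --keep. PLINK only reads the first two columns of a --keep file, so a trimmed FAM works as-is.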

The PCA at the top uses PC -2 and PC -1, which looks the way it does in the vast majority of studies.
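For anyone who wants to do the axis flip outside ChatGPT, here is a small Python sketch. It rests on two assumptions of mine: that "PC -2 and PC -1" means PC2 and PC1 with their signs reversed, plotted as x and y in that order, and that the .eigenvec file is in the default PLINK 1.9 layout (no header; family ID, sample ID, then PC1, PC2 and so on). The sample values are invented.

```python
def read_eigenvec(lines):
    """Parse PLINK .eigenvec lines into (sample_id, pc1, pc2) tuples.

    Assumes no header row: family ID, sample ID, PC1, PC2, ...
    """
    samples = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        samples.append((fields[1], float(fields[2]), float(fields[3])))
    return samples

def flipped_axes(samples):
    """Return (sample_id, x, y) with x = -PC2 and y = -PC1.

    The axis assignment is an assumption, not something the post states.
    """
    return [(iid, -pc2, -pc1) for iid, pc1, pc2 in samples]

# Invented example rows in .eigenvec layout.
example_eigenvec = [
    "1 SampleA 0.01 -0.02 0.003",
    "2 SampleB -0.03 0.04 0.001",
]

points = flipped_axes(read_eigenvec(example_eigenvec))
for iid, x, y in points:
    print(iid, x, y)
```

The sign of a principal component is arbitrary, so flipping an axis only mirrors the plot and does not change how the samples cluster. The (x, y) pairs can then be handed to whatever plotting library you like.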
 
