
North Italians & Iberians closer to C6 than Latini

Jovialis

Comparison of Genetic Distances of C6 Central Mediterranean and Latini vs Moderns. By FST point estimates, nearly all tested modern populations are closer to C6 than to Latini, including Northern Italians. The two runs retained slightly different SNP sets (~460k vs ~534k after filtering), but this difference should have minimal practical impact on the overall pattern; it mainly affects only borderline cases, where the distances are extremely close.

FST was run with ADMIXTOOLS 2; the graphics were made in RStudio.
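For readers unfamiliar with how an FST point estimate comes out of allele frequencies, here is a minimal sketch of Hudson's estimator computed as a ratio of sums over SNPs. It is only an illustration: the function name and inputs are hypothetical, and it is not claimed to be what ADMIXTOOLS 2 does internally.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's FST, computed as a ratio of sums over SNPs.

    p1, p2: per-SNP allele frequencies in population 1 and 2.
    n1, n2: per-SNP number of sampled chromosomes in each population.
    All names here are illustrative, not part of any library.
    """
    num_sum = 0.0
    den_sum = 0.0
    for a, b, na, nb in zip(p1, p2, n1, n2):
        if na < 2 or nb < 2:
            # The sample-size correction divides by (n - 1), so a population
            # with a single sampled chromosome at a SNP cannot be used.
            raise ValueError("need at least 2 sampled chromosomes per population")
        # Squared frequency difference, corrected for finite sample size.
        num_sum += (a - b) ** 2 - a * (1 - a) / (na - 1) - b * (1 - b) / (nb - 1)
        # Probability that two alleles drawn from different populations differ.
        den_sum += a * (1 - b) + b * (1 - a)
    return num_sum / den_sum
```

The `(n - 1)` terms grow large when a group has very few samples, which is one reason small groups give noisy estimates.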

The Latini in this run are R851, R1016, and R1021, which are the specific Latini samples listed in Antonio et al. 2019.

1767623616877.png
1767623628674.png
 
Can you please run FST distances to MBG004, MBG016, and HOC004 used as a single group (excluding the _d version of each sample), and also to Broion_BA.SG?
I also suggest removing the following from Italian_North.HO before doing the calculations: ALP277.HO (Domodossola, highly Germanic-admixed, likely from Walser people), ALP481.HO (Sappada, Bavarian), HGDP01161.HO, HGDP01162.HO, HGDP01163.HO, HGDP01164.HO, HGDP01166.HO, HGDP01167.HO, HGDP01168.HO, HGDP01169.HO (Tuscans), and maybe also the Bergamo HGDP samples, or most of them (HGDP01147.HO, HGDP01151.HO, HGDP01152.HO, HGDP01153.HO, HGDP01155.HO, HGDP01156.HO, HGDP01157.HO, HGDP01171.HO, HGDP01172.HO, HGDP01173.HO, HGDP01174.HO, HGDP01177.HO).
Otherwise you will not get a good representative of the average. You can see on a PCA that they form separate clusters (certainly the Tuscans, and the majority of the Bergamo samples).
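One way such an exclusion could be done is by relabeling the listed samples in the EIGENSTRAT .ind file (three whitespace-separated columns: sample ID, sex, population label), so that they no longer belong to Italian_North.HO. The sketch below is only an assumption about the workflow: the function name and the "Excluded" label are hypothetical, and it assumes the IDs in the .ind file are written exactly as above.

```python
def relabel_excluded(ind_lines, exclude_ids, new_label="Excluded"):
    """Return .ind lines with excluded samples moved to a different label.

    ind_lines:   lines of an EIGENSTRAT .ind file (ID, sex, population).
    exclude_ids: sample IDs to take out of their current population.
    new_label:   hypothetical label given to the excluded samples.
    """
    exclude = set(exclude_ids)
    out = []
    for line in ind_lines:
        fields = line.split()
        if len(fields) >= 3 and fields[0] in exclude:
            fields[2] = new_label  # only the population column changes
            out.append(" ".join(fields))
        else:
            out.append(line.rstrip("\n"))  # keep every other line as it was
    return out
```

For example, `relabel_excluded(open("data.ind"), ["ALP277.HO", "ALP481.HO"])` would return the edited lines, where `data.ind` is a placeholder file name. The genotype and SNP files are untouched, so the original grouping can be restored at any time.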
 
The HGDP Tuscans are from the province of Grosseto, if I'm not wrong. The database lacks a proper northwestern Tuscan sample to give a clear idea, let alone a Tuscan/Emilian Apennines sample, which would be even closer to the North Italian average if we exclude the Alps and Pre-Alps.
It is like considering Romagnoli as a proxy for the whole of Emilia, I guess, with the difference that Romagnoli plot a bit further east and south than Tuscans.
 
So, are Northern Italians the closest modern population to IA Latins? Have you considered other modern populations beyond those in your graph?
 
The Bergamese are prototypical Northern Italians; in my opinion it would be a bit odd to exclude them from a Northern Italian average.
 
No, you can clearly see on a PCA that they form a separate, west-shifted cluster: they have more EEF ancestry derived from their local IA source. They are certainly among the populations closest to IA Cisalpine groups and retain the highest amount of that ancestry, but this pulls them away from the average, which has more of both Germanic and Imperial Roman ancestry. If the goal is to evaluate FST distances to the average of present-day populations, it does not make sense to use a group in which more than half of the samples fall outside the main cluster. Also, Greek.HO is really messy, and it would be more meaningful to use the Athens or Thessaloniki subset. English.HO is a mix of Cornish and English from Kent.
 
We agree. Accordingly, they are as “purely” Northern Italian as one can be. The fact that being quintessentially Northern Italian would nonetheless pull them away from a supposed “Northern Italian average” is, indeed, a paradox. If resolving that paradox were to require the exclusion of the Bergamaschi, the resulting outcome would in any event have to be assessed with a very significant caveat (at least in my view): namely, that the “Northern Italian average” would have been established without taking into account one of the modern populations that is most authentically Northern Italian.
 
west shifted, they have more EEF ancestry derived from their local IA source. They are certainly among the populations closest to IA Cisalpine groups and retain the highest amount of that ancestry, but this pulls them away from the average, which has more of both Germanic and Imperial Roman ancestry.
This explains the Bergamo cluster well.
 
I think it's incorrect to exclude them; it would be more correct to use averages weighted by Bergamo's numeric representation in northern Italy. I would also caution that you may well get results that are far less EEF-heavy depending on which village or town the samples come from. It's also rather presumptuous to assume the average is further east because of Germanic admixture rather than a normal distribution of steppe ancestry that was already local to Italy.
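To make the weighting idea concrete, here is a minimal sketch of how per-subgroup allele frequencies could be combined into one weighted frequency per SNP before any distance is computed. This is only one possible reading of the suggestion: the function name is hypothetical, and the weights (for example, each area's share of the northern Italian population) would have to be supplied by the user.

```python
def weighted_allele_freq(freqs_by_group, weights):
    """Combine per-subgroup allele frequencies into weighted averages.

    freqs_by_group: mapping subgroup name -> list of per-SNP frequencies.
    weights:        mapping subgroup name -> non-negative weight, e.g. the
                    subgroup's share of the regional population (assumed input).
    """
    groups = list(freqs_by_group)
    total = sum(weights[g] for g in groups)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    n_snps = len(freqs_by_group[groups[0]])
    # Weighted mean frequency at each SNP position.
    return [
        sum(weights[g] * freqs_by_group[g][i] for g in groups) / total
        for i in range(n_snps)
    ]
```

With this approach an over-sampled subgroup no longer dominates the pooled frequencies just because it contributed more individuals to the dataset.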
 
In this PCA (built with smartpca), Bergamo does indeed appear slightly more shifted toward Etruscan-related populations. The reference population panel follows the standard West Eurasian set used in Lazaridis et al. 2014.

Note: the previous image link expired.


533772048-16faa4d5-8ff4-4ed8-b501-c0308d74297a.png
image.png
 
Always good to see people using the academic tools.

PCA-based distances and FST, which is computed from allele frequencies, can differ somewhat.

At any rate, here's another population that demonstrates more of the same pattern:

1768239810439.png


Roman Bivio (Umbria) is indistinguishable from C6, which essentially means it is a C6 population.

1768239869197.png
 
Yes, the use of academic tools is important. I agree that PCA-based distances and allele-frequency-based FST can differ somewhat. The PCA I showed earlier was meant to demonstrate the position of Bergamo rather than to compare with the FST analysis. My main concern with using this FST analysis to conclude that Northern Italians are closer to C6 than to Latini is that the confidence interval of the FST between Latini and North Italians is too wide, so I think this cannot be said with certainty.

However, an interesting point is that in my PCA North Italians drift towards lower values on the PC2 axis in comparison to IA Central Italians, just like the Italy Imperial samples (which also have lower PC2 values). This could reflect complex patterns of shared ancestry that also lower the FST between North Italians and Central Italy Imperial samples. I am aware that, as you said, PCA is not directly comparable to FST, but both can capture the same population movements, so sometimes the pattern observed in both analyses can be the same.
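On the confidence interval point, a standard error for a ratio statistic such as FST is commonly obtained with a block jackknife over genomic blocks. The sketch below shows the simple unweighted delete-one-block version, assuming the per-block numerator and denominator sums are already available. The names are hypothetical, and ADMIXTOOLS uses a weighted variant, so the numbers will not match its output exactly.

```python
import math

def block_jackknife(num_blocks, den_blocks):
    """Unweighted delete-one-block jackknife for a ratio of sums.

    num_blocks[i], den_blocks[i]: numerator and denominator summed over
    the SNPs of genomic block i. Returns (estimate, standard_error).
    """
    g = len(num_blocks)
    if g < 2:
        raise ValueError("need at least 2 blocks")
    total_num = sum(num_blocks)
    total_den = sum(den_blocks)
    estimate = total_num / total_den
    # Recompute the statistic with each block left out in turn.
    leave_one_out = [
        (total_num - n) / (total_den - d)
        for n, d in zip(num_blocks, den_blocks)
    ]
    mean_loo = sum(leave_one_out) / g
    variance = (g - 1) / g * sum((x - mean_loo) ** 2 for x in leave_one_out)
    return estimate, math.sqrt(variance)
```

An approximate 95% interval is then the estimate plus or minus 1.96 standard errors. If the intervals for FST(moderns, C6) and FST(moderns, Latini) overlap substantially, the ordering of the two point estimates should not be treated as settled.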
 
The problem is, you cannot run singletons in FST. The samples have to form a robust enough grouping.
 
I like the way you have aggregates represented by the stars which is a great touch for visualization.

One thing that could improve accuracy, if you haven't done it already: did you anchor it to a full set of West Eurasians?

I followed this tutorial to produce my PCAs; it is a great resource:

 
The problem is, you cannot not run singletons against FST. They have to be as a robust enough grouping.
Sorry, I didn't explain that well: I suggested using them together as a group of 3 samples, like the Latini. Their coverage is also good. The study also indicated a fourth sample (MBG017), but it actually seems closer to the rest of Germany Hallstatt by PCA and by ANF/WHG/Steppe breakdown.
"We detect for all four of these samples (MBG004, MBG016, MBG017 and HOC004) a putative transalpine origin in northern Italy".
As Elrele pointed out on the other forum, the three samples preferred Austria_IA_laTene to Verona_IA. Anyway, they are all in the same area of the PCA, which is the one between Italics/Etruscans/Broion_BA and Gauls.
Please, if you run them, remove at least the Tuscans from Italian_North.HO.
 
Such a close (relatively) genetic distance for Bulgarians is unexpected and could indicate that our assumptions about them having significant Slavic ancestry (I refer to "pure", proto-Slavic and not the later Slavic speakers heavily admixed with Balkan-like & other populations) are incorrect. They appear to be closer to Albanians who have very low Slavic ancestry.

South Italy & Sicily also show strong signals despite their increased Iran/CHG component, possibly showing that by that time these components were already significant (and not just 2-3%) in Roman Italy. Mycenaean proxy perhaps?
 
If by the full set of West Eurasians you mean using them as the reference populations to build the PCA, I am showing my reference populations here: https://github.com/elrele/pca_img/blob/main/README.md, which are the same ones used in Lazaridis et al. 2014.

My reference population list is a little bit different from the list in your link, but they are still broadly similar, so I presume the results would not differ by much. When I have time in the future, I might redo the plot with your list, but I do not have a few of those populations in my .geno file.

About the stars, they were done with a very simple Python script (which can be easily built with AI today) to plot the data from the pca.evec file.
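For anyone who wants to reproduce the stars, a script along these lines would do it. This is a sketch, not my actual script: it assumes the usual smartpca .evec layout (a header line starting with `#eigvals:`, then one line per sample with the ID, the PC values, and the population label last), and the function names and output file name are made up.

```python
from collections import defaultdict

def read_evec(lines):
    """Parse smartpca .evec lines into (sample_id, pc_values, population)."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the '#eigvals:' header
        rows.append((fields[0], [float(x) for x in fields[1:-1]], fields[-1]))
    return rows

def population_centroids(rows, pcs=(0, 1)):
    """Mean position of each population on the chosen PCs (0-based)."""
    sums = defaultdict(lambda: [0.0] * len(pcs))
    counts = defaultdict(int)
    for _, coords, pop in rows:
        for k, idx in enumerate(pcs):
            sums[pop][k] += coords[idx]
        counts[pop] += 1
    return {pop: tuple(s / counts[pop] for s in sums[pop]) for pop in sums}

def plot_with_stars(rows, out_png="pca_stars.png"):
    """Scatter PC1 vs PC2 and mark each population centroid with a star."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    by_pop = defaultdict(list)
    for _, coords, pop in rows:
        by_pop[pop].append((coords[0], coords[1]))
    centroids = population_centroids(rows)

    fig, ax = plt.subplots()
    for i, (pop, pts) in enumerate(sorted(by_pop.items())):
        color = f"C{i % 10}"  # cycle through matplotlib's default colors
        xs, ys = zip(*pts)
        ax.scatter(xs, ys, s=10, alpha=0.5, color=color, label=pop)
        cx, cy = centroids[pop]
        ax.scatter([cx], [cy], marker="*", s=200, color=color, edgecolors="black")
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.legend(fontsize="small")
    fig.savefig(out_png, dpi=150)
```

Usage would be something like `plot_with_stars(read_evec(open("pca.evec")))`. The star is simply the arithmetic mean of each group's samples, so a group with strong internal structure can have a star that sits between its clusters.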
 