North Italians & Iberians closer to C6 than Latini

Jovialis · Jan 5, 2026

Comparison of genetic distances of C6 Central Mediterranean and Latini vs. moderns. By FST point estimates, nearly all tested modern populations are closer to C6 than to Latini, including Northern Italians. The two runs retained slightly different SNP sets (~460k vs ~534k after filtering), but this difference should have minimal practical impact on the overall pattern; it mainly affects borderline cases, where the distances are extremely close.

FST was run with ADMIXTOOLS 2; the graphics were made in RStudio.

Latini in this run are R851, R1016, and R1021, which are the specific Latini listed in Antonio et al. 2019.
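
For readers who want to see what an FST point estimate involves, here is a minimal sketch of Hudson's estimator in the ratio-of-sums form described by Bhatia et al. 2013. It is an illustration only: whether it matches the estimator used in the run above is an assumption, and the function name and the (count, total) input layout are hypothetical.

```python
def hudson_fst(counts1, counts2):
    """Hudson FST as a ratio of sums over SNPs.

    counts1, counts2: one (alt_allele_count, total_allele_count) pair per SNP
    for each population. This input layout is an assumption for illustration.
    """
    num = 0.0
    den = 0.0
    for (a1, n1), (a2, n2) in zip(counts1, counts2):
        # The within-population correction needs at least 2 sampled chromosomes.
        if n1 < 2 or n2 < 2:
            continue
        p1 = a1 / n1
        p2 = a2 / n2
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        den += p1 * (1 - p2) + p2 * (1 - p1)
    # No usable SNPs (or no variation at all) leaves the ratio undefined.
    return num / den if den > 0 else float("nan")
```

A population with fewer than two sampled chromosomes at a SNP contributes nothing here, because the within-population correction term is undefined for it.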

Stefano · Jan 6, 2026

Can you please run FST distances to MBG004, MBG016, and HOC004 used as a single group (excluding the _d version of each sample), and also to Broion_BA.SG?
I also suggest removing the following from Italian_North.HO before doing the calculations: ALP277.HO (Domodossola, highly Germanic-admixed, likely from Walser people), ALP481.HO (Sappada, Bavarian), HGDP01161.HO, HGDP01162.HO, HGDP01163.HO, HGDP01164.HO, HGDP01166.HO, HGDP01167.HO, HGDP01168.HO, HGDP01169.HO (Tuscans), and maybe also the Bergamo HGDP samples, or most of them (HGDP01147.HO, HGDP01151.HO, HGDP01152.HO, HGDP01153.HO, HGDP01155.HO, HGDP01156.HO, HGDP01157.HO, HGDP01171.HO, HGDP01172.HO, HGDP01173.HO, HGDP01174.HO, HGDP01177.HO).
Otherwise you will not get a good representation of the average: you can see on a PCA that they form separate clusters (certainly the Tuscans, and the majority of the Bergamo samples).
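
One way to apply an exclusion list like this without touching the genotype data is to relabel the samples in the EIGENSTRAT .ind file (three whitespace-separated columns: sample ID, sex, population label), so that they no longer fall under Italian_North.HO. The sketch below assumes that layout; the function name and the replacement label are hypothetical, and only the Alpine and Tuscan IDs from the post are listed.

```python
# Sample IDs taken from the post above (Alpine outliers and HGDP Tuscans).
EXCLUDE = {
    "ALP277.HO", "ALP481.HO",
    "HGDP01161.HO", "HGDP01162.HO", "HGDP01163.HO", "HGDP01164.HO",
    "HGDP01166.HO", "HGDP01167.HO", "HGDP01168.HO", "HGDP01169.HO",
}


def relabel_samples(ind_lines, sample_ids, new_label):
    """Return .ind lines with the population label replaced for chosen samples.

    ind_lines: iterable of lines, each 'sample_id sex population'.
    new_label: hypothetical label that keeps the samples out of the group.
    """
    out = []
    for line in ind_lines:
        fields = line.split()
        if len(fields) >= 3 and fields[0] in sample_ids:
            fields[2] = new_label
            out.append(" ".join(fields))
        else:
            # Leave every other line exactly as it was.
            out.append(line.rstrip("\n"))
    return out
```

The same function could give MBG004, MBG016, and HOC004 one shared label so that they are treated as a single group.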

thejoker · Jan 6, 2026

The HGDP Tuscans are from the province of Grosseto, if I'm not wrong. The database lacks a proper northwestern Tuscan sample to give a clear idea, let alone a Tuscan/Emilian Apennines sample, which would be even closer to the North Italian average if we exclude the Alps and Pre-Alps.
It is like considering Romagnoli a proxy for the whole of Emilia, I guess, with the difference that Romagnoli plot a bit more eastern and southern compared to Tuscans.

HereToLearn · Jan 7, 2026

So, are Northern Italians the closest modern population to IA Latins? Have you considered other modern populations beyond those in your graph?

HereToLearn · Jan 7, 2026

Stefano said:
Otherwise you will not get a good representation of the average: you can see on a PCA that they form separate clusters (certainly the Tuscans, and the majority of the Bergamo samples).

The Bergamese are prototypical Northern Italians, it would be a bit weird in my opinion to exclude them from a Northern Italian average.

Stefano · Jan 7, 2026

HereToLearn said:
The Bergamese are prototypical Northern Italians, it would be a bit weird in my opinion to exclude them from a Northern Italian average.

No, you can clearly see on a PCA that they form a separate, west-shifted cluster: they have more EEF, derived from their local IA source. They are certainly among those closest to IA Cisalpine and retaining the most of it, but this pulls them away from the average, which has more of both Germanic and Imperial Roman ancestry. If the goal is to evaluate FST distances to the average of present-day populations, it does not make sense to have more than half of the samples fall outside the main cluster. Also, Greek.HO is really messy, and it would be more meaningful to use the Athens or Thessaloniki subset. English.HO is a mix of Cornish and English from Kent.

HereToLearn · Jan 7, 2026

Stefano said:
No, you can clearly see on a PCA that they form a separate, west-shifted cluster: they have more EEF, derived from their local IA source. They are certainly among those closest to IA Cisalpine and retaining the most of it, but this pulls them away from the average, which has more of both Germanic and Imperial Roman ancestry.

We agree. Accordingly, they are as “purely” Northern Italian as one can be. The fact that being quintessentially Northern Italian would nonetheless pull them away from a supposed “Northern Italian average” is, indeed, a paradox. If resolving that paradox were to require the exclusion of the Bergamaschi, the resulting outcome would in any event have to be assessed with a very significant caveat (at least in my view): namely, that the “Northern Italian average” would have been established without taking into account one of the modern populations that is most authentically Northern Italian.

Vallicanus · Jan 7, 2026

Stefano said:
west-shifted cluster: they have more EEF, derived from their local IA source. They are certainly among those closest to IA Cisalpine and retaining the most of it, but this pulls them away from the average, which has more of both Germanic and Imperial Roman ancestry.

This explains the Bergamo cluster well.

Vitruvius · Jan 9, 2026

HereToLearn said:
We agree. Accordingly, they are as “purely” Northern Italian as one can be. The fact that being quintessentially Northern Italian would nonetheless pull them away from a supposed “Northern Italian average” is, indeed, a paradox. If resolving that paradox were to require the exclusion of the Bergamaschi, the resulting outcome would in any event have to be assessed with a very significant caveat (at least in my view): namely, that the “Northern Italian average” would have been established without taking into account one of the modern populations that is most authentically Northern Italian.

Stefano said:
No, you can clearly see on a PCA that they form a separate, west-shifted cluster: they have more EEF, derived from their local IA source. They are certainly among those closest to IA Cisalpine and retaining the most of it, but this pulls them away from the average, which has more of both Germanic and Imperial Roman ancestry. If the goal is to evaluate FST distances to the average of present-day populations, it does not make sense to have more than half of the samples fall outside the main cluster. Also, Greek.HO is really messy, and it would be more meaningful to use the Athens or Thessaloniki subset. English.HO is a mix of Cornish and English from Kent.

I think it's incorrect to exclude them; it would be more correct to use weighted averages in proportion to Bergamo's numeric representation in northern Italy. I would also caution that you may well get results that are far less EEF-heavy depending on which village or town the samples come from. It's also rather presumptuous to assume the average is further east because of Germanic admixture rather than just a normal distribution of steppe ancestry that was already local to Italy.
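
To make the weighting idea concrete, here is a minimal sketch of a weighted allele frequency for a pooled group at one SNP, with each subgroup weighted by its share of the regional population rather than by how many samples it happens to have in the dataset. The function name is hypothetical, and any weights passed in would have to come from real census figures, which are not given in this thread.

```python
def weighted_frequency(subgroup_freqs, weights):
    """Weighted mean allele frequency across subgroups at one SNP.

    subgroup_freqs: {subgroup_name: allele_frequency}
    weights: {subgroup_name: relative weight, e.g. population share}
    """
    total = sum(weights[name] for name in subgroup_freqs)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(subgroup_freqs[name] * weights[name] for name in subgroup_freqs) / total
```

With made-up numbers, a subgroup at frequency 0.2 with weight 1 and the rest at 0.6 with weight 9 gives a pooled frequency of about 0.56, whereas an unweighted mean of the two would give 0.4.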

Elrele · Jan 9, 2026

In this PCA (built with smartpca), Bergamo does indeed appear slightly more shifted toward Etruscan-related populations. The reference population panel follows the standard West Eurasian set used in Lazaridis et al. 2014.

Note: the previous image link expired.

Jovialis · Jan 12, 2026

Always good to see people using the academic tools.

PCA-based distances and FST, which is based on allele frequencies, can be a bit different.

At any rate, here's another population that demonstrates more of the same pattern:

Roman Bivio (Umbria) is indistinguishable from C6, which essentially means it is a C6 population.

Elrele · Jan 24, 2026

Yes, the use of academic tools is important. I agree that PCA-based distances and allele-frequency-based FST can be a bit different. The PCA I showed earlier was more to demonstrate the position of Bergamo than to compare with the FST analysis. My main concern with using this FST analysis to conclude that Northern Italians are closer to C6 than to Latini is that the confidence interval of the FST between Latini and North Italians is too wide, so I think this cannot be said with certainty.
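
The confidence-interval concern can be illustrated with a simplified delete-one block jackknife over genomic blocks. This is a sketch under assumptions: it uses equal block weights (the weighted version used by standard tools differs in detail), and the function names and the per-block (numerator, denominator) input are hypothetical.

```python
import math


def jackknife_fst(blocks):
    """Point estimate and jackknife standard error for a ratio-of-sums FST.

    blocks: one (numerator_sum, denominator_sum) pair per genomic block.
    """
    g = len(blocks)
    if g < 2:
        raise ValueError("need at least two blocks")
    tot_n = sum(n for n, _ in blocks)
    tot_d = sum(d for _, d in blocks)
    estimate = tot_n / tot_d
    # Recompute the ratio with each block left out in turn.
    loo = [(tot_n - n) / (tot_d - d) for n, d in blocks]
    mean_loo = sum(loo) / g
    se = math.sqrt((g - 1) / g * sum((x - mean_loo) ** 2 for x in loo))
    return estimate, se


def ci95(estimate, se):
    """Approximate 95% interval under a normal approximation."""
    return estimate - 1.96 * se, estimate + 1.96 * se


def intervals_overlap(a, b):
    """True if two (low, high) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]
```

If the interval for FST(North Italians, Latini) overlaps the one for FST(North Italians, C6), the ranking of the two point estimates is not well supported on its own. Overlap is a conservative check, though; jackknifing the difference between the two FST values directly would be the sharper test.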

However, an interesting point is that in my PCA North Italians drift towards lower values on the PC2 axis compared with IA Central Italians, just like the Italy Imperial samples (which also have lower PC2 values). This could reflect complex patterns of shared ancestry that also lower the FST between North Italians and Central Italy Imperial samples. I am aware that, as you said, PCA analysis is not directly comparable to FST, but both can capture the same population movements, so sometimes the pattern observed in both analyses can be the same.

Jovialis · Jan 26, 2026

Stefano said:
Can you please run FST distances to MBG004, MBG016, and HOC004 used as a single group (excluding the _d version of each sample), and also to Broion_BA.SG?
I also suggest removing the following from Italian_North.HO before doing the calculations: ALP277.HO (Domodossola, highly Germanic-admixed, likely from Walser people), ALP481.HO (Sappada, Bavarian), HGDP01161.HO, HGDP01162.HO, HGDP01163.HO, HGDP01164.HO, HGDP01166.HO, HGDP01167.HO, HGDP01168.HO, HGDP01169.HO (Tuscans), and maybe also the Bergamo HGDP samples, or most of them (HGDP01147.HO, HGDP01151.HO, HGDP01152.HO, HGDP01153.HO, HGDP01155.HO, HGDP01156.HO, HGDP01157.HO, HGDP01171.HO, HGDP01172.HO, HGDP01173.HO, HGDP01174.HO, HGDP01177.HO).
Otherwise you will not get a good representation of the average: you can see on a PCA that they form separate clusters (certainly the Tuscans, and the majority of the Bergamo samples).

The problem is, you cannot run FST on singletons. They have to be in a robust enough grouping.

Jovialis · Jan 26, 2026

Elrele said:
Yes, the use of academic tools is important. I agree that PCA-based distances and allele-frequency-based FST can be a bit different. The PCA I showed earlier was more to demonstrate the position of Bergamo than to compare with the FST analysis. My main concern with using this FST analysis to conclude that Northern Italians are closer to C6 than to Latini is that the confidence interval of the FST between Latini and North Italians is too wide, so I think this cannot be said with certainty.

However, an interesting point is that in my PCA North Italians drift towards lower values on the PC2 axis compared with IA Central Italians, just like the Italy Imperial samples (which also have lower PC2 values). This could reflect complex patterns of shared ancestry that also lower the FST between North Italians and Central Italy Imperial samples. I am aware that, as you said, PCA analysis is not directly comparable to FST, but both can capture the same population movements, so sometimes the pattern observed in both analyses can be the same.

I like the way you have aggregates represented by the stars, which is a great touch for visualization.

One thing that could improve accuracy, if you haven't done it already: did you anchor it to a full set of West Eurasians?

I followed this tutorial to produce my PCAs, it is a great resource:

Projecting ancient samples

This Vignette provides an example of how to project ancient DNA onto modern data using the smartsnp package.

christianhuber.github.io

Stefano · Jan 26, 2026

Alessio said:
The problem is, you cannot run FST on singletons. They have to be in a robust enough grouping.

Sorry, I didn't explain well: I suggested using them together as a group, 3 samples like the Latini. Their coverage is also good. The study also indicated a fourth sample (MBG017), but it actually seems closer to the rest of Germany Hallstatt by PCA and ANF/WHG/Steppe breakdown.
"We detect for all four of these samples (MBG004, MBG016, MBG017 and HOC004) a putative transalpine origin in northern Italy".
As Elrele pointed out on the other forum, the three samples preferred Austria_IA_laTene to Verona_IA. Anyway, they are all in the same area of the PCA, the one between Italics/Etruscans/Broion_BA and Gauls.
Please, if you run them, remove at least the Tuscans from Italian_North.HO.

arxaiogenetiki.blogspot · Jan 26, 2026

Such a close (relatively) genetic distance for Bulgarians is unexpected and could indicate that our assumptions about them having significant Slavic ancestry (I refer to "pure", proto-Slavic and not the later Slavic speakers heavily admixed with Balkan-like & other populations) are incorrect. They appear to be closer to Albanians who have very low Slavic ancestry.

South Italy & Sicily also show strong signals despite their increased Iran/CHG component, possibly showing that by that time these components were already significant (and not just 2-3%) in Roman Italy. Mycenaean proxy perhaps?

Elrele · Jan 27, 2026

Alessio said:
I like the way you have aggregates represented by the stars, which is a great touch for visualization.

One thing that could improve accuracy, if you haven't done it already: did you anchor it to a full set of West Eurasians?

I followed this tutorial to produce my PCAs, it is a great resource:

Projecting ancient samples

This Vignette provides an example of how to project ancient DNA onto modern data using the smartsnp package.

christianhuber.github.io

If by the full set of West Eurasians you mean using them as the reference populations to build the PCA, I am showing my reference populations here: https://github.com/elrele/pca_img/blob/main/README.md, which are the same ones used in Lazaridis et al. 2014.

My reference population list is a little bit different from the list in your link, but they are still broadly similar, so I presume the results would not differ by much. When I have time in the future, I might redo the plot with your list, but I do not have a few of those populations in my .geno file.

About the stars, they were done with a very simple Python script (which can be easily built with AI today) to plot the data from the pca.evec file.
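
The star overlay described above can be sketched roughly as follows. This is not the script actually used for the plot; it is a minimal sketch assuming the usual smartpca .evec layout (a leading #eigvals line, then sample ID, PC coordinates, and population label on each row). The function names and the output path are hypothetical, and the plotting step assumes matplotlib is installed.

```python
from collections import defaultdict


def read_evec(lines):
    """Parse .evec rows into (sample_id, pc1, pc2, population) tuples."""
    rows = []
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and the '#eigvals:' header.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        rows.append((fields[0], float(fields[1]), float(fields[2]), fields[-1]))
    return rows


def population_centroids(rows):
    """Mean (PC1, PC2) per population label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for _, pc1, pc2, pop in rows:
        acc = sums[pop]
        acc[0] += pc1
        acc[1] += pc2
        acc[2] += 1
    return {pop: (acc[0] / acc[2], acc[1] / acc[2]) for pop, acc in sums.items()}


def plot_with_stars(rows, out_path="pca_stars.png"):
    """Scatter the samples and mark each population mean with a star."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for pop, (cx, cy) in population_centroids(rows).items():
        xs = [r[1] for r in rows if r[3] == pop]
        ys = [r[2] for r in rows if r[3] == pop]
        points = ax.scatter(xs, ys, s=12, alpha=0.6, label=pop)
        # Reuse the sample colour for the star so each pair is easy to match.
        colour = tuple(points.get_facecolor()[0])
        ax.scatter([cx], [cy], marker="*", s=220, color=colour, edgecolors="black")
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.legend()
    fig.savefig(out_path, dpi=150)
```

Typical use would be rows = read_evec(open("pca.evec")) followed by plot_with_stars(rows), where pca.evec stands in for whatever file smartpca wrote.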
