Tautalus
An interesting paper on how to infer local ancestry through the use of an optimised version of ancestral recombination graphs (ARGs).
This work addresses a central challenge in genetics: determining which parts of modern human DNA come from different ancient populations. As the admixture events become more distant in time, the inherited DNA segments become smaller and more fragmented, making it increasingly difficult for traditional methods to accurately assign their origins. Earlier approaches typically relied on comparing segments of DNA to reference populations based on overall similarity, which works well for recent admixture but becomes unreliable for ancient events and can be biased by differences in total ancestry proportions between populations.
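To put rough numbers on how quickly segments shrink, here is a small sketch using a common rule of thumb: after a single admixture pulse, the mean tract length is about 100/T centimorgans after T generations. That approximation, the function names and the 29-year generation time are my own assumptions for illustration, not figures from the paper.

```python
def expected_tract_length_cm(generations):
    """Rule-of-thumb mean ancestry tract length (cM) after a single admixture pulse.

    Assumes roughly one crossover per Morgan per generation and ignores
    admixture proportions, drift and selection, so treat it as an order of
    magnitude only.
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 100.0 / generations


def tract_length_after_years(years, years_per_generation=29.0):
    """Same estimate, starting from years; the generation time is an assumed value."""
    return expected_tract_length_cm(years / years_per_generation)


if __name__ == "__main__":
    for years in (300, 2000, 8000):
        print(f"{years:>5} years ago: ~{tract_length_after_years(years):.2f} cM")
```

Under these assumptions, admixture about 10 generations back leaves tracts of about 10 cM, while an event 8,000 years ago leaves tracts well under half a centimorgan, which is why similarity-based methods run out of signal for ancient events.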
The paper introduces a new method, ARGMix, which reframes the problem by modelling genetic data as a network of relationships rather than a simple sequence. Using a graph transformer, a form of deep learning designed to analyse structured data, it leverages ancestral recombination graphs to capture how DNA segments are related through evolutionary history. By incorporating information about when lineages share common ancestors and using ancient DNA samples as references, the method can more accurately trace the origin of even very small DNA fragments. This represents a shift from surface-level pattern matching to reasoning over genealogical structure, which the authors report improves accuracy and robustness compared to previous approaches, even when the assumed demographic model is misspecified.
A key innovation of the method is its ability to perform ancestry-specific analyses by “masking” the genome. In practice, this means that DNA segments not belonging to a chosen ancestry are temporarily hidden, allowing comparisons to be made using only a single ancestral component. This avoids a major limitation of earlier methods, where populations could appear similar simply because they share higher proportions of a given ancestry, rather than because that ancestry itself is more closely related.
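The masking idea can be illustrated with a toy example. This is only a sketch of the general principle, not ARGMix's code: the haplotypes, the per-site ancestry labels ("F" for farmer, "S" for steppe) and the function names are all made up, and a simple mismatch rate stands in for the PCA used in the paper.

```python
def mask_to_ancestry(alleles, ancestry, target):
    """Hide every site whose local ancestry label is not `target` (replaced by None)."""
    return [a if label == target else None for a, label in zip(alleles, ancestry)]


def mismatch_rate(x, y):
    """Fraction of differing alleles over sites that are visible in both haplotypes."""
    shared = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not shared:
        return None  # nothing left to compare after masking
    return sum(a != b for a, b in shared) / len(shared)


# Hypothetical data: an ancient all-farmer query and two present-day haplotypes.
query   = [1, 0, 1, 1, 0, 1, 0, 0]
query_l = ["F"] * 8

pop_a   = [1, 1, 1, 0, 0, 1, 0, 0]   # mostly farmer, but a more distant farmer source
pop_a_l = ["F", "F", "F", "F", "F", "F", "S", "S"]

pop_b   = [1, 0, 1, 0, 1, 0, 1, 0]   # less farmer ancestry, but a closely related source
pop_b_l = ["F", "F", "F", "S", "S", "S", "S", "S"]

# Whole-genome comparison: A looks closer because it simply carries more farmer ancestry.
unmasked_a = mismatch_rate(query, pop_a)   # 0.25
unmasked_b = mismatch_rate(query, pop_b)   # 0.5

# Farmer-only comparison: B is closer once the non-farmer segments are hidden.
q_f = mask_to_ancestry(query, query_l, "F")
masked_a = mismatch_rate(q_f, mask_to_ancestry(pop_a, pop_a_l, "F"))   # 2/6
masked_b = mismatch_rate(q_f, mask_to_ancestry(pop_b, pop_b_l, "F"))   # 0.0

if __name__ == "__main__":
    print("unmasked:", unmasked_a, unmasked_b)
    print("masked:  ", masked_a, masked_b)
```

In this toy case the ranking flips once only the farmer component is compared, which is the same kind of effect the paper describes for Ötzi, Sardinians and northern Italians.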
Applying this framework to ancient and modern European genomes reveals new insights into population history. In the case of Ötzi the Iceman, previous studies consistently found that he clustered most closely with Sardinians, an observation driven by their high proportion of early farmer ancestry. However, when the analysis is restricted to only the Anatolian farmer component, the paper shows that Ötzi’s ancestry aligns more closely with present-day populations from northern Italy, particularly around Bergamo. This finding suggests a degree of local genetic continuity in the Alpine region that had been obscured by later admixture events.
Abstract
Local ancestry inference classifies segments of DNA in admixed individuals by their originating population. However, as the date of admixture becomes older, these segments become shorter and determining their ancestry becomes increasingly difficult. This limits many existing segment-based methods to relatively recent historical admixture events and more highly diverged populations. The rapidly expanding availability of ancient DNA offers a promising opportunity to use these ancient samples as references for local ancestry inference. A recent approach integrates ancient samples into the ancestral recombination graph (ARG) for local ancestry inference. Here, we introduce recent advances in deep learning for graphs into this ARG framework to create ARGMix, a graph transformer that infers local ancestry using the coalescent trees of the inferred ARG. Our approach employs ancient samples as references in the marginal trees to predict local ancestry. We train ARGMix on data reflecting the well-understood ancient European demography and demonstrate improved accuracy and robustness even under demographic misspecification. We then apply ARGMix to an ARG of ancient and present-day European samples for ancestry-specific analyses, finding evidence of continuity between Ötzi the Iceman and present-day individuals from nearby regions.
Population structure of present-day Europeans and the Iceman. (A) Principal component analysis (PCA) of the Iceman and European populations from the Human Genome Diversity Project and the 1000 Genomes Project. (B) Anatolian-specific PCA of the same populations generated by masking non-Anatolian ancestry.