Genetics

Work Package 1

CoSTREAM's first Work Package aimed to identify genetic factors shared by stroke and Alzheimer's disease, to uncover mechanisms and pathways common to both diseases, and to link these with metabolic, imaging, and clinical findings. It set out to determine the genetic overlap between stroke and Alzheimer's disease, as well as their subtypes, and to estimate the genetic correlation between the two. Furthermore, this Work Package sought to pinpoint specific genes or genomic regions that mediate risk of stroke or stroke subtypes, relevant MRI markers and Alzheimer's disease.

Both stroke and Alzheimer’s disease (AD) have been identified as diseases with a substantial heritable component. From epidemiological studies, it is known that stroke often precedes dementia and it is thus important to identify shared pathways between the diseases, specifically ones that are modulated by genetic risk factors. In recent years, computational methods have been developed to collectively study these shared influences, with increased power using readily available summary statistics in contrast to former methods where individual-level data were required. An important first step to identify these shared mechanisms is to correctly quantify the univariate heritability of the respective disease. Further, the bivariate shared heritability between the phenotypes can be estimated.

Therefore, we split this into two tasks:

  • Investigating univariate genetic heritability of stroke, stroke subtypes, intermediate MRI markers and AD.
  • Pinpointing specific genes or genomic regions that mediate risk of stroke or stroke subtypes, relevant MRI markers and AD.

We first computed and compared the estimated univariate heritabilities using LD Score regression and previously published results (Traylor et al. 2016). Second, we wanted to quantify the genetic overlap and bivariate heritabilities between stroke and AD and subsequently compare to previously published data. We used summary statistics of the largest available datasets for stroke (METASTROKE) and AD (IGAP).
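As an illustration of the idea behind LD Score regression, the sketch below regresses per-SNP association statistics on LD scores and rescales the slope into a heritability or genetic covariance estimate. It is a minimal, unweighted version: the published method additionally uses regression weights and a block jackknife for standard errors, and all function names and inputs here are hypothetical rather than taken from the analysis described above.

```python
import math
from statistics import fmean


def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def univariate_h2(chi2, ld_scores, n_samples, n_snps):
    """Model: E[chi2_j] = (N * h2 / M) * l_j + intercept.

    Returns (h2, intercept). Unweighted, so only illustrative.
    """
    slope, intercept = ols_slope_intercept(ld_scores, chi2)
    return slope * n_snps / n_samples, intercept


def genetic_covariance(z1, z2, ld_scores, n1, n2, n_snps):
    """Model: E[z1_j * z2_j] = (sqrt(N1 * N2) * rho_g / M) * l_j + intercept."""
    products = [a * b for a, b in zip(z1, z2)]
    slope, intercept = ols_slope_intercept(ld_scores, products)
    return slope * n_snps / math.sqrt(n1 * n2), intercept


def genetic_correlation(h2_1, h2_2, gcov):
    """Genetic correlation: covariance scaled by both heritabilities."""
    return gcov / math.sqrt(h2_1 * h2_2)
```

A negative genetic covariance between two traits with positive heritabilities yields a negative genetic correlation, which is the form of the cardioembolic stroke result reported below.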

Genetic correlation analysis between ischemic stroke and subtypes and AD showed mostly non-significant genetic overlap between the two diseases. The only significant genetic overlap was found between AD and cardioembolic stroke, where there was a strong negative genetic correlation. This is in contrast with previously published results, where with a smaller dataset, a significant positive correlation between AD and small vessel stroke was reported while all other etiological subtypes and ischemic stroke were non-significant.

As the results from this bivariate analysis and published results differ from each other, it is paramount to confirm that indeed a bigger dataset gives a more realistic measure of the genetic overlap between stroke and AD regardless of methodology.

We built on this analysis to pinpoint specific genes or genomic regions that mediate risk of stroke or stroke subtypes, relevant MRI markers and AD, and to create polygenic risk scores from clinical case/control data, with validation in prospective population-based cohorts and vice versa, using the largest available datasets.

To fulfil these objectives, we used the newest available genetic dataset for any stroke, any ischemic stroke and ischemic stroke subtypes. MEGASTROKE had since replaced METASTROKE, NINDS-SiGN and CHARGE as the largest and most comprehensive dataset for stroke and stroke subtypes. As primary investigators of this dataset, LMU were granted early access to the summary statistics, comprising 67,162 cases and 454,450 controls of European and non-European ancestry, for any stroke (ischemic + haemorrhagic), any ischemic stroke and ischemic stroke subtypes. For AD, the largest published available dataset remained IGAP.

Using these data, we re-visited and confirmed the univariate heritability of stroke and stroke subtypes as being in the same estimated range compared to the METASTROKE data.

Building on our results from the univariate and bivariate analyses of heritability of stroke and AD, we set out to identify specific genomic regions and variants (SNPs) that harbour a signal influencing risk for both stroke and AD. These specific regions, genes and variants should be carried forward to other work packages to provide testable hypotheses (e.g. risk prediction). Although we identified a non-significant overlap between the two diseases (ischemic stroke and its subtypes and AD) on a genome-wide level using LD score regression, there is potential that specific regions will display a shared genetic susceptibility, independently of the genome-wide level.

GWAS-PW had recently been proposed as a method to determine overlap of specific genomic regions between two phenotypes using Bayesian statistics. GWAS-PW estimates the probability that a given genomic region either (model 1) contains a genetic variant that influences the first trait, (model 2) contains a genetic variant that influences the second trait, (model 3) contains a genetic variant that influences both traits, or (model 4) contains both a genetic variant that influences the first trait and a separate genetic variant that influences the second trait.
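The four models can be made concrete with a simplified sketch of how regional posterior probabilities might be computed from per-SNP Bayes factors. This is not the GWAS-PW implementation: the prior variance, the fixed equal model priors (GWAS-PW estimates priors from the data genome-wide), the assumption of independent cohorts and all function names are assumptions made for illustration only.

```python
import math


def wakefield_abf(z, se, prior_var=0.1):
    """Approximate Bayes factor for association vs. no association.

    prior_var (W) is a hypothetical prior variance of the effect size.
    """
    shrink = prior_var / (se * se + prior_var)
    return math.sqrt(1.0 - shrink) * math.exp(0.5 * z * z * shrink)


def region_posteriors(bf1, bf2, priors=(0.2, 0.2, 0.2, 0.2, 0.2)):
    """Posterior of models 0-4 for one region with at most one causal
    SNP per trait. bf1/bf2 hold per-SNP Bayes factors for trait 1/2.

    Index 0 is the null model; indices 1-4 match models 1-4 in the text.
    """
    k = len(bf1)
    s1, s2 = sum(bf1), sum(bf2)
    s12 = sum(a * b for a, b in zip(bf1, bf2))
    rbf = [
        1.0,                 # model 0: no causal variant
        s1 / k,              # model 1: variant for trait 1 only
        s2 / k,              # model 2: variant for trait 2 only
        s12 / k,             # model 3: one variant, both traits
        # model 4: two distinct variants (all pairs i != j)
        (s1 * s2 - s12) / (k * (k - 1)) if k > 1 else 0.0,
    ]
    weights = [p * b for p, b in zip(priors, rbf)]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, `region_posteriors([1, 1000, 1], [1, 1000, 1])` puts almost all posterior mass on model 3, whereas moving the two signals to different SNPs shifts it to model 4.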

We used summary statistics from the European part of MEGASTROKE to be able to use GWAS-PW with the pre-defined LD block information for European populations. Additionally, the IGAP data used for AD is a European-only sample, making the analysis consistent.

A posterior probability of model 3 > 0.9 was deemed to be significant for a shared genomic region. For each region of interest, SNPs were plotted in a locuszoom plot with the posterior probability for each variant on the y-axis and the genomic position on the x-axis. The SNP with the highest posterior probability is the most likely causal SNP shared between both phenotypes.
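The SNP-level step can be sketched as follows, assuming that under model 3 each SNP's posterior is proportional to the product of its Bayes factors for the two traits. This proportionality, the SNP identifiers and the function names are assumptions for illustration and are not drawn from the GWAS-PW output format.

```python
def snp_posteriors_model3(bf1, bf2):
    """Per-SNP posterior of being the shared causal variant,
    conditional on model 3 holding for the region."""
    joint = [a * b for a, b in zip(bf1, bf2)]
    total = sum(joint)
    return [j / total for j in joint]


def top_shared_snp(snp_ids, bf1, bf2):
    """Return the SNP with the highest model-3 posterior and its value."""
    post = snp_posteriors_model3(bf1, bf2)
    best = max(range(len(post)), key=post.__getitem__)
    return snp_ids[best], post[best]


def is_shared_region(model3_posterior, threshold=0.9):
    """Apply the regional significance cut-off used in the text."""
    return model3_posterior > threshold
```

The values returned by `snp_posteriors_model3` are what would go on the y-axis of the per-region plot, with genomic position on the x-axis.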

To benchmark the method and parameters used, we first conducted a GWAS-PW analysis of closely related phenotypes (coronary artery disease and low density lipoprotein levels), which have been shown to share a large number of genetic risk regions. Here, we found seven independent regions displaying a posterior probability > 0.9, thereby confirming our approach and settings.

We conducted GWAS-PW analyses of any stroke, any ischemic stroke and each ischemic stroke subtype against AD. In none of these comparisons could we identify a genomic region shared between stroke and AD with a posterior probability of model 3 > 0.9. However, there were some regions of interest.

For small vessel stroke and AD, we identified a region on chromosome 2 (chr2:202,819,643-205,799,152) with a posterior probability of 0.56 for model 3. This region harbours the genes NBEAL1, ICA1L, CARF and WDR12 which have been implicated in influencing the volume of white matter hyperintensities in the general population and in stroke cases. Further we found a region on chromosome 11 (chr11:47,008,125-49,865,178) with a posterior probability of 0.69 for model 3, including no known risk genes for either phenotype.

For large artery stroke (LAS) and AD, we found a region on chromosome 3 (chr3:104,581,842-106,982,535) with a posterior probability of 0.79 for model 3. SNP-wise analysis showed one SNP (rs7647426) to be associated with both diseases with a posterior probability of 0.46. This SNP is located in the promoter region of ALCAM.

Polygenic risk score from clinical case/control data with validation in prospective population-based cohorts and vice-versa

For the polygenic-risk score (PRS) analysis, we again used summary statistics from the European part of MEGASTROKE to be able to construct PRS with the pre-defined LD block information for European populations. Additionally, the IGAP data used for AD is a European-only sample, making the analyses consistent. Modelling a transethnic analysis using the full power of MEGASTROKE would be difficult due to differences in LD structure between populations and transethnic PRS have only been reported using full individual level data.

We constructed PRS for three different p-value cut-offs (p<1E-4, p<0.05 and p<0.5) representing a high, intermediate and low degree of association with the training phenotype, following recent guidelines. This resulted in 15 PRS (3 p-values x 5 phenotypes) being tested in the replication sample of AD.
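A minimal sketch of this set-up is given below. It enumerates the 15 threshold/phenotype combinations and tests a score using summary statistics only, with an inverse-variance weighted estimator. The estimator assumes independent (LD-clumped) SNPs and is one common choice, not necessarily the method used here. The phenotype labels (in particular "AIS" for any ischemic stroke), dictionary keys and function names are hypothetical.

```python
import math
from itertools import product

P_THRESHOLDS = (1e-4, 0.05, 0.5)
# Hypothetical labels: any stroke, any ischemic stroke, large artery,
# cardioembolic and small vessel stroke.
PHENOTYPES = ("AS", "AIS", "LAS", "CES", "SVS")


def select_snps(sumstats, p_cutoff):
    """Keep training-GWAS SNPs below the p-value cut-off.

    sumstats: list of dicts with keys 'snp', 'beta', 'p'.
    """
    return [s for s in sumstats if s["p"] < p_cutoff]


def score_effect(weights, target_betas, target_ses):
    """Effect of the weighted score on the target trait, using only the
    target GWAS effect sizes and standard errors of the score SNPs.

    Assumes the SNPs are independent of each other.
    """
    denom = sum(w * w / (s * s) for w, s in zip(weights, target_ses))
    num = sum(w * b / (s * s)
              for w, b, s in zip(weights, target_betas, target_ses))
    return num / denom, math.sqrt(1.0 / denom)


def all_score_definitions():
    """Every (p-value cut-off, training phenotype) pair to be tested."""
    return list(product(P_THRESHOLDS, PHENOTYPES))
```

Here `weights` would be the stroke effect sizes of the selected SNPs and `target_betas`/`target_ses` their AD effect sizes and standard errors. Dividing the estimate by its standard error gives a z-statistic for the association of the score with AD.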

At the p-value cut-off of p<1E-4, we found no significant results for any combination of stroke phenotypes and AD. When considering the p-value thresholds of p<0.05 and p<0.5, we find significant results for stroke and all stroke subtypes and AD, however with varying degrees of association and variance explained. Both LAS and CES consistently show the highest degree of variance explained (0.16%

Our interpretation of the results is that independent risk SNPs associated with stroke at a level of p<1E-4 are not associated with AD from this combined score. We also could not find convincing evidence that risk regions are shared between stroke and AD. Hence, we have to conclude that at this high level of association, there is no overlap between stroke and AD. The discordant effect directions between AS/SVS and AD are surprising, but should not be over-interpreted due to non-significant findings. The discordant effect directions could simply be explained by chance.

At a lower level of association with stroke (p<0.05 and p<0.5), we find convincing evidence of association between all stroke types and AD. However, the variance explained in AD by the stroke PRS is very small (0.01%-0.32%). In similar studies, the variance explained ranged between 1% and 3%, depending on p-value cut-off.

From this experiment, we can conclude that PRS derived from a lower association level in stroke can to some degree predict AD in the replication sample. However, a large number of independent SNPs (>20,000) is needed to achieve such an association. It seems clear from this analysis and others that stroke and AD do not share variants or genetic regions that are highly relevant for both diseases. Rather, it seems that multiple variants associated with both phenotypes at a lower level of association form the basis of the shared genetic signal.

For a PRS consisting of such a large number of variants to be used in clinical practice, patients would have to undergo genome-wide genotyping so that all the genetic information needed to stratify them into low- and high-risk individuals is available. While this was unfeasible a couple of years ago, elements of potential clinical implementation can now be foreseen. For example, genome-wide array genotyping has a one-time cost (approximately US$50 at current prices) and can be used to calculate updated genomic risk scores for stroke as further, more powerful association data emerge.

We extended multiple aspects of our past research to strengthen the previous results. Genetic research in stroke and dementia is an ever-moving target, with new datasets and methods becoming available almost monthly. We therefore repeated the previous analysis, using new datasets and algorithms to increase the power to detect significant associations.

First, a new genome-wide association study on clinically defined late-onset AD was released by Kunkle et al., which revealed four novel loci associated with AD, implicating amyloid beta, tau, immunity and lipid processing. This dataset represents a new generation of IGAP results. Using this dataset, we tried to confirm the results obtained before.

The UK Biobank offered two new distinct phenotypes for analysis of AD: Classical ICD9 and ICD10 hospital-based coding to determine algorithmically derived AD outcomes and a novel phenotyping method where parental history of dementia is used in a liability model to assign a continuous phenotype (0-2) to each individual. This strategy showed improved power to detect genetic associations with AD. In contrast to the summary statistics-based method shown above, this allows us to test a derived PRS on individual level data. Testing a PRS on individual level data is preferable because statistics can be computed for each individual, rather than broad association statistics for the whole population.

We derived PRS from the Europeans-only meta-analysis results of the MEGASTROKE dataset for two stroke phenotypes, any stroke (AS) and small vessel stroke (SVS). We tested each of these PRS in the following datasets:

  • Clinically defined late-onset AD
  • UKB algorithmically defined AD outcome
  • UKB proxy phenotype definition

For each PRS tested, we provide the following metrics:

  • The optimal p-value threshold describing the most predictive PRS
  • The R2 value of this optimal model
  • Decile plots of UKB individuals stratified by their stroke PRS value, in relation to the odds ratio of the outcome phenotype (AD proxy). This can only be performed on the continuous AD proxy phenotype.
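The decile stratification can be sketched on individual-level data as follows. For simplicity, this version reports the mean proxy phenotype per PRS decile rather than an odds ratio, so it is a simplification of the plots described above. The function names and input layout are hypothetical.

```python
def individual_prs(dosages, weights):
    """PRS of one individual: weighted sum of risk-allele dosages (0-2)."""
    return sum(d * w for d, w in zip(dosages, weights))


def decile_means(prs, phenotype):
    """Mean phenotype in each PRS decile (index 0 = lowest scores).

    prs and phenotype are parallel lists, one entry per individual.
    """
    order = sorted(range(len(prs)), key=prs.__getitem__)
    n = len(order)
    sums = [0.0] * 10
    counts = [0] * 10
    for rank, idx in enumerate(order):
        decile = rank * 10 // n  # integer bin 0..9
        sums[decile] += phenotype[idx]
        counts[decile] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]
```

A steadily increasing sequence of decile means would correspond to the linear trend across PRS deciles that such a plot is meant to reveal.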

Using the Kunkle et al. summary statistics, we confirmed the association of a stroke-derived PRS with AD status at multiple p-value thresholds. We confirmed significance at all p-value thresholds of p<0.05 and above for AS (p=1.4E-13) and SVS (p=0.0090). Using a different method (pseudo-R2), we also confirmed the proportion of variance explained, at 9.7E-4 for AS and 1.3E-4 for SVS.

In the UKB, we found only borderline associations between AD and AS. While the association of the AS PRS with AD was significant (p=0.002), the variance explained was still low. The SVS PRS did not show significant results. However, it is too early to exclude such an association. A larger AD outcome dataset may be required to show it.

We found strong associations of the PRS derived from AS and from SVS with the AD proxy phenotype. Our analysis suggests that more than 10,000 SNPs are included in the PRS in both cases. While the associations were highly significant, the variance explained is still low in both cases. However, when dividing all individuals into deciles of PRS, we clearly see a linear trend for AS, but not for SVS, highlighting a potential predictive ability of a PRS derived from AS in AD.

Taken together, there is highly significant evidence of genetic overlap between stroke phenotypes and AD phenotypes. However, the variance explained, and thus the predictive value, may be exceedingly small and only be relevant for those at the extremes. At this point and with the data at hand, we cannot find any clinical predictive value in combining the genetics of stroke and AD, although there is evidence for shared biology between the traits. As genetic research progresses and becomes more precise in determining causal relations, the explained variance may increase (substantially). Our findings suggest that vascular interventions targeting common genetic pathways may prevent AD in a yet unknown proportion of patients.

Summarizing, CoSTREAM’s Work Package 1

  • Established non-significant bivariate heritability for stroke, stroke subtypes and AD.
  • Updated the analysis of univariate heritability of stroke and AD in larger datasets.
  • Tried to pinpoint specific genes or genomic regions that mediate risk of stroke or AD. While one region of interest remains (NBEAL1-ICA1L-CARF), the statistical support remains low.
  • Created and replicated a polygenic risk score based on genes implicated in stroke that predicts AD. While the associations were highly significant, the variance explained is still low in both cases.