PropertyValue
is nif:broaderContext of
nif:broaderContext
is schema:hasPart of
schema:isPartOf
nif:isString
  • The study was reviewed and approved by the South Cambridgeshire Research Ethics Committee (12/EE/0172). All participants provided written informed consent prior to inclusion. SCOOP, STILTS and UKHLS cohorts were used for the heritability, genetic correlation, genetic risk score and association analyses with established BMI loci, and as the discovery cohorts in the genome-wide association study (GWAS) and gene-based tests. UK Biobank samples were used for genetic correlation analysis and in the replication stages of the GWAS and gene-based tests. ALSPAC was used as an additional control dataset to UKHLS for comparison against SCOOP in the established BMI loci analysis. The aim was to recruit a new cohort of UK European people who are thin (defined as a body mass index < 18 kg/m²) and well. After ethical committee approval (12/EE/0172), we worked with the NIHR Primary Care Research Network (PCRN) to collaborate with 601 GP practices in England. Each practice searched their electronic health records using our inclusion criteria (age 18–65 years, BMI < 18 kg/m²) and exclusion criteria (medical conditions that could potentially affect weight: chronic renal, liver and gastrointestinal problems, metabolic and psychiatric disease, and known eating disorders). A small number of individuals (n = 43) with a BMI of 19.0 kg/m² were included as they had a strong family history of thinness. The case notes of each potential participant were reviewed by the GP or a senior nurse with clinical knowledge of the participant to exclude other potential causes of low body weight in discussion with the study team. Through this approach we identified 25,000 individuals who fitted our criteria for inclusion in the study. These individuals were invited to participate in the study; approximately 12% (2,900) replied consenting to take part. 
We obtained a detailed medical and medication history and screened for eating disorders using a questionnaire (SCOFF) that has been validated against more formal clinical assessment [50]. We excluded all participants who stated that they exercised every day or more than 3 times a week, or whose reported activity exceeded 6 metabolic equivalents (METs) for any duration or frequency (http://www.who.int/dietphysicalactivity/physical_activity_intensity/en/). With these rather strict criteria, we sought to limit the contribution of exercise to the thinness of participants in the STILTS cohort. We excluded people who were thin only at a certain point in their lives (often as young adults) to focus on those who were persistently thin throughout life, as we hypothesised that this group would be enriched for genetic factors contributing to their thinness. We asked a specific question to identify these individuals: “have you always been thin?” Only those who answered positively were included. Questionnaires were manually checked by senior clinical staff for these parameters and for reported ethnicity (non-European ancestry excluded). DNA was extracted from salivary samples obtained from these individuals using the Oragene 500 kit according to manufacturer’s instructions (S1 Table). With ethical committee approval (MREC 97/5/21), we have recruited 7,000 individuals with severe early-onset obesity (BMI standard deviation score (SDS) > 3; onset of obesity before the age of 10 years) to the Genetics of Obesity Study (GOOS) [51]. The Severe Childhood Onset Obesity Project (SCOOP) cohort [31] is a sub-cohort of GOOS comprising ~4,800 British individuals of European ancestry (S1 Table). 
SCOOP individuals likely to have congenital leptin deficiency, a treatable cause of severe obesity, were excluded by measurement of serum leptin, and individuals with mutations in the melanocortin 4 receptor gene (MC4R) (the most common genetic form of penetrant obesity) were excluded by prior Sanger sequencing. Understanding Society (UKHLS) is a longitudinal household study designed to capture economic, social and health information from UK individuals [52]. A subset of 10,484 individuals was selected for genome-wide array genotyping. This cohort was used as a control dataset with SCOOP and STILTS cases (S1 Table). This study includes 487,411 participants with genetic data released (including ~50,000 from the UKBiLEVE cohort [53]) of the total 502,648 individuals from UK Biobank (UKBB). UKBB samples were genotyped on the UK Biobank Axiom array at the Affymetrix Research Services Laboratory in Santa Clara, California, USA and imputed to the Haplotype Reference Consortium (HRC) panel [54]. UKBiLEVE samples were genotyped on the UK BiLEVE array, which is a previous version of the UK Biobank Axiom array sharing over 95% of the markers. To date, 487,411 samples with directly genotyped and imputed data are available, and data was downloaded using tools provided by UK Biobank. Extensive data from health and lifestyle questionnaires is currently available, as well as linked clinical records. BMI and other physical measurements were taken on attendance at the recruitment centre. Severely obese participants in the available data were defined as those with BMI ≥ 40 kg/m² (N = 9,706) and thin individuals were defined as those with BMI ≤ 19 kg/m² (N = 4,538). 
Given that it has been previously shown that the type I error rate for variants with a low minor allele count (MAC) is inadequately controlled in very unbalanced case-control scenarios [55], we randomly subsampled 35,000 individuals from the original 487,411 genotyped individuals and removed those with BMI ≤ 19 or BMI ≥ 30, to generate an independent control set. The 25,856 participants remaining after exclusion of the BMI tails formed a non-extreme set of individuals kept as putative controls (S2 Fig). The other 452,411 genotyped samples were kept as the BMI dataset for downstream analyses (S11 Table, S2 Fig). An interim release consisting of a subset of 152,249 individuals from UKBB was released in May 2015. This interim release was imputed to a combined UK10K and 1000G Phase 3 reference panel and contains several variants which are not currently present in the HRC panel; as such, it was used in some of the analyses described. The Avon Longitudinal Study of Parents and Children (ALSPAC) [27,56], also known as Children of the 90s, is a prospective population-based British birth cohort study. Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Please note that the study website contains details of all the data that is available through a fully searchable data dictionary (http://www.bris.ac.uk/alspac/researchers/data-access/data-dictionary/). Further information about this cohort, including details of the genotyping and imputation procedures, can be found in S2 Appendix. This analysis was restricted to a subset of unrelated (identity-by-state < 0.05 [57]) children with genetic data and BMI measured between the ages of 12 and 17 years (n = 4,964, 48.5% male). The mean age of the children was 14 years and the mean BMI 20.5. For the SCOOP cohort, DNA was extracted from whole blood as previously described [31]. 
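The construction of the UKBB control set described above (random subsampling followed by removal of the BMI tails) can be sketched as follows. This is a minimal illustration on toy data, not the authors' pipeline; the function name, the seed and the toy BMI values are assumptions introduced here.

```python
import random


def build_control_set(bmi_by_id, n_subsample, low=19.0, high=30.0, seed=0):
    """Randomly subsample individuals, then drop the BMI tails.

    Returns (controls, remainder):
      controls  - subsampled individuals with low < BMI < high
      remainder - individuals not subsampled (kept as the BMI dataset)
    The seed is a hypothetical choice made only for reproducibility.
    """
    rng = random.Random(seed)
    ids = sorted(bmi_by_id)
    sampled = set(rng.sample(ids, n_subsample))
    # Remove BMI <= low and BMI >= high from the subsample.
    controls = [i for i in ids if i in sampled and low < bmi_by_id[i] < high]
    remainder = [i for i in ids if i not in sampled]
    return controls, remainder


# Toy data: 1,000 hypothetical individuals with BMI values from 15 to 44.
toy_bmi = {f"id{k}": 15.0 + (k % 30) for k in range(1000)}
controls, remainder = build_control_set(toy_bmi, n_subsample=100)
```

With the numbers in the text, `n_subsample` would be 35,000 out of 487,411, leaving 452,411 individuals in the remainder.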
For the STILTS cohort, DNA was extracted from saliva using the Oragene saliva DNA kits (online protocol) and quantified using Qubit. All samples from SCOOP, STILTS and UKHLS were typed across 30 SNPs on the Sequenom platform (Sequenom Inc., California, USA) for sample quality control. Of the 3,607 SCOOP and STILTS samples submitted for Sequenom genotyping, 3,280 passed quality control filters (90.9% pass rate). Of the 10,433 UKHLS samples, 9,965 passed Sequenom sample quality control (95.5% pass rate). Subsequently, UKHLS controls were genotyped on the Illumina HumanCoreExome-12v1-0 Beadchip. The 3,280 SCOOP and STILTS samples, and 48 overlapping UKHLS samples (to test for possible array version effects), were genotyped on the Illumina HumanCoreExome-12v1-1 Beadchip by the Genotyping Facility at the Wellcome Sanger Institute (WSI). Genotype calling was performed centrally for all batches at the WSI using GenCall. Criteria for excluding samples were as follows: i) concordance against Sequenom genotypes <90%; ii) for each pair of sample duplicates, the one with the highest missingness was excluded; iii) sex inferred from genetic data different from stated sex; iv) sample call rate <95%; v) sample autosome heterozygosity rate >3 SDs from the mean, assessed separately for low (<1%) and high (>1%) MAF bins; vi) magnitude of intensity signal in both channels <90%; and vii) for each pair of related individuals (proportion of IBD (PI_HAT) >0.05), the individual with the lowest call rate was excluded. We performed SNP QC using PLINK v1.07 [58]. Criteria for excluding SNPs were: i) Hardy-Weinberg equilibrium (HWE) p < 1×10⁻⁶; ii) call rate <95% for MAF ≥5%, call rate <97% for 1% ≤ MAF <5%, and call rate <99% for MAF <1%. SMARTPCA v10210 [59] was used for principal component analysis (PCA). To verify the absence of array version effects we used PCA on the subset of shared controls genotyped on both versions of the array. 
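The MAF-dependent SNP exclusion criteria listed above can be written as a single predicate. The sketch below only restates the thresholds from the text; the function name and argument layout are assumptions, and the actual QC was run in PLINK.

```python
def snp_passes_qc(maf, call_rate, hwe_p):
    """Return True if a SNP survives the exclusion criteria in the text.

    maf       - minor allele frequency as a fraction (0.05 means 5%)
    call_rate - fraction of samples with a called genotype
    hwe_p     - Hardy-Weinberg equilibrium test p-value
    """
    if hwe_p < 1e-6:           # i) HWE p < 1x10^-6
        return False
    if maf >= 0.05:            # ii) common variants: call rate >= 95%
        return call_rate >= 0.95
    if maf >= 0.01:            # 1% <= MAF < 5%: call rate >= 97%
        return call_rate >= 0.97
    return call_rate >= 0.99   # MAF < 1%: call rate >= 99%
```

For example, a variant with MAF 3% and call rate 96% is excluded, whereas the same call rate is acceptable for a variant with MAF 10%.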
Cut-offs for samples that diverged from the European cluster were chosen manually after inspecting the PCA plot. SNPs with discordant MAFs in the different versions of the array were excluded. After removal of non-European samples and 13 samples due to cryptic relatedness, 1,456 SCOOP and 1,471 STILTS samples remained for analysis. For UKHLS, 82 samples were removed after applying a strict European filter and 680 related samples were removed after applying a “3rd degree” kinship filter in KING [60]. A total of 9,203 samples remained, of which 6,460 had a BMI >19 and <30 (“controls”). For UKBB, sample QC was performed using all 487,411 samples. Criteria for excluding samples were as follows: i) supplied and genetically inferred sex mismatches; ii) heterozygosity and missingness outliers according to centrally provided sample QC files; iii) samples not used in kinship estimation by UKBB; iv) individuals that did not identify as “white british” or did not cluster with other “white british” in PCA; v) samples that withdrew consent; and vi) for each pair of related individuals (KING kinship estimate >0.0442), we randomly selected one individual, preferentially keeping the case if the other related individual was a control. After sample QC, thirteen individuals with underlying health conditions that could influence their BMI were also removed: twelve had BMI <14 and one had BMI >74. In the end, 7,526 obese, 3,532 thin and 20,720 non-extreme controls remained for case-control analyses. In addition, 387,164 samples remained for analysis of BMI as a continuous trait. There is an overlap of 10,282 samples (~2.6% of the BMI dataset) with obese and thin cases (S2 Fig). The same procedure was performed on the interim release of 152,249 UKBB samples to produce a set of 2,799 obese, 1,212 thin, 8,193 controls and 127,672 individuals for the independent BMI dataset. 
All subsequent analyses on UKBB were also performed on this subset to query variants that are not currently available in the full UKBB release. Imputation and genome-wide association analyses: SCOOP, STILTS and UKHLS single-variant association analysis: Genotypes from SCOOP, STILTS and UKHLS controls were phased together with SHAPEITv2 [61], and subsequently imputed with IMPUTE2 [62,63] to the merged UK10K and 1000G Phase 3 reference panel [64], containing ~91.3 million autosomal and chromosome X sites, from 6,285 samples. More than 98% of variants with MAF ≥0.5% had an imputation quality score of r² ≥ 0.4; however, variants with MAF <0.1% had poor imputation quality, with only 27% of variants reaching r² ≥ 0.4 (S5 Fig). First-pass single-variant association tests were done for all variants irrespective of MAF or imputation quality score (see below). Analyses of 1,456 SCOOP, 1,471 STILTS and 6,460 controls (BMI range 19–30) of European ancestry were based on the frequentist association test, using the EM algorithm, as implemented in SNPTEST v2.5 [65], under an additive model and adjusting for six PCs and sex as covariates. UKBB BMI dataset single-variant association analysis: For the BMI dataset, we used BOLT-LMM [66] to perform an association analysis with BMI using sex, age, 10 PCs and UKBB genotyping array as covariates. Heritability estimates and genetic correlation: Summary statistics from the SCOOP vs. UKHLS, STILTS vs. UKHLS, UKBB obese vs. controls, UKBB thin vs. controls and UKBB BMI analyses were filtered and a subset of 1,197,969 HapMap3 SNPs was kept in each dataset. Using LD score regression [67] we first calculated the heritability of severe childhood obesity (SCOOP vs. UKHLS) and persistent thinness (STILTS vs. UKHLS). For severe childhood obesity, we estimated a prevalence of 0.15% using the BMI centile equivalent to 3 SDS in children [68]. In the case of persistent thinness (BMI ≤ 19), we used a GP-based cohort for our prevalence estimates: CALIBER [69]. 
The CALIBER database consists of 1,173,863 records derived from GP practices. For the heritability analysis, we used a prevalence estimate of 2.8% for BMI ≤ 19 (Claudia Langenberg and Harry Hemingway, personal communication). We also used LD score regression to calculate the genetic correlation of SCOOP with STILTS, SCOOP with UKBB obese, SCOOP with BMI, STILTS with UKBB thin and STILTS with BMI. The genetic correlation of obesity and persistent thinness with anorexia was estimated using the summary statistics from SCOOP vs. UKHLS and STILTS vs. UKHLS, and summary statistics available from the Genetic Consortium for Anorexia Nervosa (GCAN) in LD Hub [70]. The same analysis was repeated for UKBB obese vs. controls and UKBB thin vs. controls. Genetic correlation estimates for BMI vs. Overweight, Obesity Class 1, Obesity Class 2 and Obesity Class 3 were also extracted from LD Hub (S4 Fig). Comparison with established GIANT BMI associated loci: We obtained the list of 97 established BMI associated loci from the publicly available data from the GIANT consortium [24]. We used this list as we wanted to focus on established common variation in Europeans with accurate effect sizes for simulations. In order to test whether there is evidence of enrichment of nominally significant signals with consistent direction of effect, we performed a binomial test using the subset of signals with nominal significance in the SCOOP vs. UKHLS and STILTS vs. UKHLS analyses. Variance explained was calculated using the rms package [71] v4.5.0 in R [72] and Nagelkerke’s R² is reported. Power calculations were performed using Quanto [73]. To calculate ORs and SE from the ALSPAC BMI summary statistics we used genotype counts from SNPTEST output. We then used a z-test to test for significant differences between the OR calculated using genotype counts of SCOOP and ALSPAC and the SCOOP vs. UKHLS OR. 
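The binomial test for enrichment of directionally consistent signals mentioned above can be illustrated with an exact calculation. The text does not state the null probability or whether the test was one- or two-sided; the sketch below assumes a one-sided test against a null of 0.5 (equal chance of either direction), and the function name is hypothetical.

```python
from math import comb


def binomial_enrichment_p(n_consistent, n_total, p_null=0.5):
    """One-sided exact binomial p-value: P(X >= n_consistent).

    n_consistent - signals whose direction of effect matches the reference
    n_total      - nominally significant signals tested
    p_null       - assumed probability of a matching direction under the null
    """
    return sum(
        comb(n_total, k) * p_null**k * (1 - p_null) ** (n_total - k)
        for k in range(n_consistent, n_total + 1)
    )
```

For instance, `binomial_enrichment_p(10, 10)` equals 0.5 to the power of 10, the probability that ten out of ten signals agree in direction by chance alone.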
Simulations under an additive model: We created 10,000 simulations of 1 million individuals for the 97 GIANT BMI loci randomly sampling alleles based on the allele frequency from the sex-combined European dataset reported in Locke et al. [24] using an R script. For each simulated genotype, we simulated phenotypes with DISSECT [74] using the effect size in GIANT and then removed all samples from the lower tail where the phenotype was <3SDs to better reproduce the actual BMI distribution. Afterwards we randomly sampled 1,471 individuals from the bottom 2.8% and 1,456 from top 0.15% and compared against a random set of 6,460 controls from the equivalent percentiles to BMI 19–30. Finally, for each of these loci, we calculated the absolute difference between our observed OR and the mean OR from the simulations and counted how many times we saw an equal or larger absolute difference in the simulated data and assigned a p-value. This was done separately for SCOOP vs UKHLS and STILTS vs UKHLS. The R package GTX (https://cran.r-project.org/web/packages/gtx/index.html) was used to transpose genotype probabilities into dosages, and a combined dosage score, weighted by the effect size from GIANT, for 97 BMI SNPs [24] was calculated and standardised. We checked whether there was an ordinal relationship between the genetic risk score and BMI category (i.e. thin, normal, or obese) using ordinal logistic regression with the clm function in the ordinal R package. While the assumption of equal variance appears to hold (S6 Fig), the proportional odds assumption indicating equal odds between thin, normal, and obese groups is violated for the BMI genetic risk score and some of the principal component covariates (i.e., PC2, PC3, and PC6). As our primary model, we ran a partial proportional odds model adjusting for PC1, PC4, and PC5 and allowing the BMI genetic score, PC2, PC3, and PC6 to vary between BMI category. 
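The weighted genetic risk score described above (genotype probabilities converted to dosages, weighted by the GIANT effect sizes, then standardised) can be sketched in plain Python. The authors used the R package GTX; the code below is only an illustration, and it assumes that the dosage counts the BMI-increasing allele and that standardisation uses the population standard deviation.

```python
from statistics import mean, pstdev


def dosage_from_probs(p_aa, p_ab, p_bb):
    """Expected count of allele B given the three genotype probabilities."""
    return p_ab + 2.0 * p_bb


def weighted_grs(dosages, betas):
    """Standardised weighted risk score for each individual.

    dosages - one row per individual, one dosage per SNP
    betas   - per-SNP effect sizes, in the same SNP order
    """
    raw = [sum(d * b for d, b in zip(row, betas)) for row in dosages]
    mu, sd = mean(raw), pstdev(raw)
    if sd == 0:
        raise ValueError("all scores are identical; cannot standardise")
    return [(r - mu) / sd for r in raw]


# Toy example: three hypothetical individuals, two hypothetical SNPs.
toy_scores = weighted_grs([[0, 1], [1, 1], [2, 1]], betas=[0.5, 0.1])
```

After standardisation the scores have mean 0 and standard deviation 1, so a regression coefficient is interpreted per standard deviation of the score.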
To check for consistency, we ran a partial proportional odds model adjusting for the first six PCs and allowing only the BMI genetic score to vary between BMI groups, and a full proportional odds model allowing all six PCs and the BMI genetic score to vary between BMI groups (S1 Appendix). Using ANOVA, we formally tested the proportional odds assumption for the BMI genetic risk score. A genetic risk score was created and an ordinal logistic regression was run for each of the 10,000 simulations. We compared the observed test statistic testing whether the odds were the same by BMI category to the 10,000 simulation test statistics. We calculated the p-value as the proportion of simulations with a test statistic larger than that observed in the real data. A mean genetic risk score was also calculated for each BMI category (obese, thin and controls) across the 10,000 simulations. A t-test was used to test whether the mean observed GRS in each category was significantly different from the one estimated using the simulations. First-pass single-variant association analysis results were used as discovery datasets for the GWAS. After association analysis, we removed variants with MAF <0.5%, an INFO score <0.4, or HWE p < 1×10⁻⁶, as these highlighted regions of the genome that were problematic, including CNV regions with poor imputation quality. Quantile-quantile plots indicated that the genomic inflation was well controlled in SCOOP-UKHLS (λ = 1.06) and STILTS-UKHLS (λ = 1.04), and slightly higher for SCOOP-STILTS (λ = 1.08, S7 Fig). We used LD score regression [67] to correct for inflation not due to polygenicity. To identify distinct loci, we performed clumping as implemented in PLINK [58] using summary statistics from the association tests and LD information from the imputed data, clumping variants within 250 kb of an index variant and with r² > 0.1. 
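The simulation-based p-value described above compares the observed test statistic with the statistics obtained from the simulated datasets. A minimal sketch, assuming the p-value is the fraction of simulations whose statistic is strictly larger than the observed one (the function name is hypothetical):

```python
def empirical_p(observed, simulated):
    """Fraction of simulated statistics strictly larger than the observed one."""
    if not simulated:
        raise ValueError("at least one simulated statistic is required")
    exceed = sum(1 for s in simulated if s > observed)
    return exceed / len(simulated)
```

With 10,000 simulations the smallest non-zero p-value this approach can report is 1/10,000.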
In order to further identify a set of likely independent signals we performed conditional analysis of the lead SNPs in SNPTEST to take into account long-range LD. A total of 135 autosomal variants with p < 1×10⁻⁵ in any of the three case-control analyses were taken forward for replication in UKBB. All case-control results are reported with the lower BMI group as reference. We tested 1,208,692 SNPs for association under an additive model in SNPTEST using sex, age, 10 PCs and UKBB genotyping array as covariates. Three comparisons were done: obese vs. thin, obese vs. controls and controls vs. thin. Variants with an INFO score <0.4 or HWE p < 1×10⁻⁶ were filtered out from the results. Inflation factors were calculated using HapMap markers. The LD score regression intercepts were 1.0074 in obese vs. thin, 1.0057 in obese vs. controls and 1.009 in thin vs. controls. We used all thin individuals, regardless of health status, as our replication cohort to maximize power. However, using ICD10 codes and self-reported illness data (S12 and S13 Tables) to remove individuals who had a relevant medical diagnosis before the date of attendance at the UKBB recruitment centre yielded 2,518 thin individuals and materially equivalent results (S8 Fig). GIANT, EGG and SCOOP 2013 summary statistics: We obtained summary statistics for the GIANT Extremes obesity meta-analysis [20] from http://portals.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files. Summary statistics for EGG [30] were obtained from http://egg-consortium.org/childhood-obesity.html. We used summary statistics from our previous study of 1,509 early-onset obesity SCOOP cases compared to 5,380 publicly available WTCCC2 controls (SCOOP 2013) [31]. Data for the SCOOP cases is available to download from the European Genome-Phenome Archive (EGA) using accession number EGAD00010000594. The control samples are available to download using accession numbers EGAD00000000021 and EGAD00000000023. 
These replication studies are largely non-overlapping with our discovery datasets and with each other. When a lead variant was not available in a replication cohort, a proxy (r² ≥ 0.8) was used in the meta-analysis. We meta-analysed summary statistics for the 135 variants reaching p < 1×10⁻⁵ in SCOOP/STILTS/UKHLS with the corresponding results from UKBB and study-specific replication cohorts (S5–S7 Tables). For obese vs. thin and obese vs. controls comparisons we used fixed-effects meta-analysis correcting for unknown sample overlap in replication cohorts using METACARPA [75]. For thin vs. controls we used a fixed-effects meta-analysis in METAL [76]. Heterogeneity was assessed using Cochran’s Q-test heterogeneity p-value in METAL. A signal was considered to replicate if it met all the following criteria: i) consistent direction of effect; ii) p < 0.05 in at least one replication cohort; and iii) the meta-analysis p-value reached standard genome-wide significance (p < 5×10⁻⁸). Given that we are querying additional variants on the lower allele frequency spectrum, one could also use a stricter genome-wide significance threshold taking into account the increased number of tests (p ≤ 1.17×10⁻⁸) [77]. In practice, this only affected one previously established signal (SULT1A1, rs3760091) in our obese vs. controls analysis that fell just below this threshold (S6 Table). rs4440960 was later removed from final results (SCOOP vs. UKHLS and STILTS vs. UKHLS) after close examination revealed it was present in a CNV region with poor imputation quality. Comparison of newly established candidate loci and UKBB independent BMI dataset: We identified eleven signals in SCOOP vs. STILTS, nine in SCOOP vs. UKHLS and two in UKHLS vs. STILTS that were nominally significant in the UKBB BMI dataset GWAS, and directionally consistent. A binomial test was used to check for enrichment of signals with consistent direction of effect (S9 Table). 
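The fixed-effects meta-analysis and the three replication criteria above can be sketched as follows. This is a generic inverse-variance-weighted estimator with a two-sided normal p-value, not the METAL or METACARPA implementation (METACARPA additionally corrects for sample overlap, which is not modelled here). The function names are hypothetical, and requiring every replication cohort to agree in direction is an assumed reading of "consistent direction of effect".

```python
from math import erfc, sqrt


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects estimate.

    betas - per-cohort effect estimates (for example, log odds ratios)
    ses   - per-cohort standard errors
    Returns (pooled beta, pooled SE, two-sided p-value).
    """
    weights = [1.0 / se**2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = sqrt(1.0 / sum(weights))
    p = erfc(abs(beta / se) / sqrt(2.0))  # two-sided normal tail
    return beta, se, p


def signal_replicates(discovery_beta, replication, meta_p,
                      alpha=0.05, genome_wide=5e-8):
    """Apply criteria i) to iii) from the text.

    replication - list of (beta, p) tuples, one per replication cohort
    """
    same_direction = all(b * discovery_beta > 0 for b, _ in replication)
    nominal = any(p < alpha for _, p in replication)
    return same_direction and nominal and meta_p < genome_wide
```

Passing `genome_wide=1.17e-8` applies the stricter threshold mentioned in the text instead of the standard one.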
Lookup of previously identified obesity-related signals in our discovery datasets: We took all signals reaching genome-wide significance, or identified for the first time in the GIANT Extremes obesity meta-analysis [20], with either the tails of BMI or obesity classes, and in childhood obesity studies [30,31] and performed look-up of those signals in all three of our discovery analyses (SCOOP vs STILTS, SCOOP vs UKHLS and UKHLS vs STILTS). ORs and p-values from the previous studies and look-up results from our discovery datasets are reported in S10 Table.
rdf:type