nif:isString
|
-
The Anhui experimental rhesus monkey Center, Chengdu pingan animal Breeding and Research Center and Primate experimental animal center of KIZ have provide the provided the blood samples. Blood samples were salvaged from routine health checks for newly captured wild macaques at these breeding centers. We have obtained permission from the centers to analyze the blood samples for this study. These captures are usually as the founder populations of the breeding populations. And These captures of rhesus monkey were authorized by the specific government agency, for example, the Sichuan Department of Forestry, Yunnan Department of Forestry, or An Department of Forestry which is in charge of a certain sampling region. We collected the fecal samples from the Henan Taihang Macaque Nature Reserve, and the Zhangjiajie National Forest Park. We have obtained permission from the above administrative departments to observe the animal behavior of rhesus monkey and collect the stool samples. We also confirm that the behavioral observations did not impact the animals in any way.
We collected 337 wild samples from most of the extant groups of M. mulatta across its entire range in China (Figure 1). Samples used for genetic analysis included feces (n = 78) and blood (n = 259). Blood samples were salvaged from routine health checks for newly captured wild macaques at breeding centers. Fecal samples, collected during direct behavioral observations, were stored in 95% ethanol. In order to avoid resampling of fecal material from the same individuals, we sampled each group once and on a specific day only. Each dropping was identified by size, shape, and color, and only one sample located <1.5 m apart was collected [38]. D-loop sequences from 168 M. mulatta and 56 haplotypes from Sichuan used by Li et al. [29] were downloaded from GenBank. The downloaded data included 76, 21, 13, 109, and 5 sequences originating in China (Guangxi, 20; Sichuan, 56), Nepal, Myanmar, India, and Viet Nam, respectively. The detailed sample data were listed in Table S1.
Figure data removed from full text. Figure identifier and caption: 10.1371/journal.pone.0055315.g001 Map of sampling locations for the rhesus macaque, Macaca mulatta.Dots indicate specific locality data. SC = Sichuan (SCML = Muli; SCDB = Danba), YN = Yunnan (YNJG = Jinggu; YNSM = Simao ( = Pu’er); YNZY = Zhenyuang; YNMJ = Mojiang), HI = Hainan, HA = Henan, HN = Hunan, GZ = Guizhou, GX = Guangxi, FJ = Fujian, and ZJ = Zhejiang.
Blood samples were digested using proteinase K. This was followed by a standard 3-step phenol/chloroform extraction procedure. Fecal samples extracted using the 2CTAB/PCI protocol as modified by Vallet et al. [39]. DNA was then electrophoresed through an agarose gel to assess its quality and quantity. High-quality samples were diluted to ∼20 ng/µL in TE buffer (10 mM Tris-HCl, pH 8.0, 0.1 mM EDTA). To reconstruct the matrilineal genealogy, three fragments from the mitochondrial genome were selected for sequencing. A partial fragment of the D-loop included 510 base pairs (bp) in length. Another two mitochondrial fragments included part of the cytochrome b (Cytb) gene (882 bp) and part of rRNA 16S (861 bp). Partial fragments of the hypervariable D-loop were initially sequenced for all 337 specimens. Subsequently, 16S and Cytb fragments were sequenced for a subset of samples (66 specimens) that represented the major haplogroups identified from the genealogical reconstruction of the D-loop data (Table S1). The data from the 16S and Cytb fragments were used to estimate divergences deeper than those revealed by the D-loop data. PCR cycling conditions for D-loop consisted of an initial denaturation of 5 min at 94°C, followed by 35 cycles (94°C denaturation for 1 min, 50°C annealing for 1 min, and for 1 min extension at 72°C). Final extension at 72°C was conducted for 10 min. We used the same procedure for Cytb and 16S but with annealing at 55°C and 59°C, respectively. Primers sequences were summarized in Table 1.
Table data removed from full text. Table identifier and caption: 10.1371/journal.pone.0055315.t001 Primers used in PCR and sequencing for the rhesus macaque, Macaca mulatta. The amplified DNA fragments were purified via spin columns and sequenced using an ABI PRISM 3730 DNA analyzer following the manufacturer’s protocols. Sequences were determined in both directions for each individual. To preclude the inclusion of NUMTs, all fragments were submitted for BLAST searching [40] in GenBank to ensure that the required sequences had been amplified. Further, protein-coding nucleotide sequences were translated to amino acids using MEGA 4.1 [41] to check for premature stop codons. We chose six previously reported tetranucleotide microsatellite loci (Table 2) to assess biparental gene flow. PCR procedures involved an initial denaturation at 94°C for 4 min followed by 30 cycles of 30 sec at 94°C, 40 sec at 52.1–60.0°C for annealing, 30 sec at 72°C, and a final extension at 72°C for 10 min. PCR products were diluted and mixed with LIZ 500 size standards. Microsatellites were obtained using an ABI PRISM 3730 DNA analyzer (ABI, USA). Allele scoring was performed using genemapper [42].
Table data removed from full text. Table identifier and caption: 10.1371/journal.pone.0055315.t002 Primer sequences used in the microsatellite study. The fluorescent dyes, original references, and annealing temperatures used for rhesus macaque-derived markers, with the number of alleles (NA) and allele size range found in the 265 samples screened in this study are listed.a,b.aThe sequence of primers used to amplify each locus can be obtained from Research Genetics Inc.bThe chromosome number of the STR markers in Macaca mulatta was determined by e-PCR in the UniSTS database from NCBI.
Alignments were first conducted using Clustal W [43] integrated into MEGA v.4.1 [41] while employing default parameters. When required, the initial alignments were subsequently adjusted by eye using MEGA v.4.1. Protein-coding nucleotide sequences were translated to amino acids using MEGA v.4.1 to confirm alignments. Gaps were considered to be a nucleotide variant (fifth state). Identical haplotypes were collapsed using DnaSP, v.5.0 [44].
For phylogenetic analyses, several species of macaques were selected as outgroup taxa. Two mitochondrial datasets were analyzed separately. First, the D-loop fragments alone, including 214 unique haplotypes plus the four outgroup taxa from M. fascicularis, M. thibetana, M. sylvanus, and M. arctoides, were subjected to phylogenetic analyses. The second dataset included all three mtDNA fragments for 67 haplotypes plus 12 outgroup taxa including M. fascicularis, M. thibetana, M. sylvanus, M. arctoides, M. assamensis, M. maura, M. nigra, M. ochreata, M. pagensis, M. radiata, and M. silenus. A heuristic maximum parsimony approach was conducted for both datasets. All characters were unordered and equally weighted. The search was conducted using PAUP* v.4.0b10 [45] with 100 random stepwise additions and TBR branch swapping. Nonparametric bootstrap proportions (BSP) based on 1000 pseudoreplicates were used to infer nodal support. For the first dataset, associations among the 214 D-loop haplotypes were also visualized by a full median-joining network [46], [47] with MP post-processing [48] as implemented in Network v.4.5 (www.fluxus-engineering.com/sharenet.htm). For the second dataset, phylogenetic relationships were also inferred using Bayesian inference as implemented in MrBayes v.3.1.2 [49]. The best-fitting nucleotide substitution model for each gene (16S:GTR+I+G, Cytb: GTR+G, D-loop: HKY+I+G) was selected based on the Akaike Information Criterion as implemented in Modeltest v.3.7 [50] using default parameters. Four Monte Carlo Markov chains (MCMC) were used and the dataset was run for 10 million generations and trees were sampled every 200 generations. The last 25% of the sampled trees were used to estimate the consensus tree and corresponding Bayesian posterior probabilities.
Isolation-by-distance and historical population analyses: Using the first dataset, we tested the null hypothesis of a correlation between genetic and geographical distances, i.e. isolation-by-distance (IBD), for the matrilines using both all ten sampling localities and the six western sites that had specific locality data (Figure 1). Approximate geographic coordinates were determined using Google Earth. Distances between localitied were estimated using the spherical distance between two points. The pairwise genetic differentiation values were assumed to measure the extent of DNA divergence between populations, and the significance was tested using 1,000 permutations with Arlequin v.3.11 [51]. Correlation of geographic and genetic distances was determined using Mantel’s permutation test with 10,000 permutations executed by IBDWS v.3.15 [52]. The strength of this relationship was determined by regressing all pairwise genetic differentiation values, FST, against the corresponding log10 transformed geographical distance. Historical population dynamics of the major haplogroups assigned in the median-joining network were examined with mismatch distribution analysis [53] as implemented in Arlequin v.3.11. This analysis compared the observed frequencies of pairwise differences of haplotypes with those expected under a single sudden expansion model [53]. An expected distribution under a model of sudden demographic expansion was generated with a total of 1000 permutations. Demographic stability has produced multimodal distributions, and unimodal patterns have occurred during sudden population expansion [54]. The raggedness index was expected to have a higher value in relatively stable populations. Tajima’s D [55] and Fu’s Fs statistics [56], calculated using Arlequin v.3.11, were also used to seek evidence of demographic expansions within individual haplogroups and subhaplogroups. Statistical significance was tested with 1000 permutations. The null hypothesis was that of a stable population. Additionally, we calculated the statistics of genetic diversity, neutrality tests, and mismatch distributions for the main populations of Macaca mulatta in China.
Divergence times were estimated using a Bayesian MCMC method implemented in beast v.1.5.3 [57], which employed a relaxed molecular clock approach [58]. A relaxed lognormal clock model of haplogroup variation and a Yule prior for branching rates was assumed. Divergence time estimations were based on the second dataset only. After removing indels in the outgroup taxa, the final alignment for divergence age estimations comprised 2,260 nucleotide positions. The dataset was partitioned and each optimal nucleotide substitution model was selected by using the Akaike Information Criterion as implemented in Modeltest v.3.7 [50]. As a calibration point, we used the divergence time of 5.5 Ma (95% confidence interval = 4.68–6.32 Ma) [10] for matrilines between M. sylvanus and the other Asian macaques. Instead of using hardbounded calibration points, we relied on published dates and specified a normal distribution prior, with a mean of 5.5 Ma and a standard deviation of 0.5 Ma. Two replicates were run for 100 million generations while sampling trees and parameters every 1,000 generations. The adequacy of using a 50% burn-in to generate MCMC trees and for convergence of all parameters was assessed visually using Tracer v.1.4.1 [59]. Subsequently, the sampling distributions of two independent replicates were combined using the software LogCombiner v.1.5.2 [54], and the resulting 100,002 samples were summarized and visualized using TreeAnnotator v.1.5.2 [54] and Fig Tree v.1.2 [60].
All loci were collected as an EXCEL file, and then converted into the computable files using convert [61]. We used Arlequin v.3.11 to check if the microsatellite data departed from Hardy-Weinberg expectations. Tests for linkage disequilibrium were checked by using the GenePop v.4.0 [62]. The total number of alleles (NA), allelic richness (R), gene diversity (GD) and allele frequencies were calculated in Fstat v.2.9.3.2 [63]. The presence of genetic bottlenecks was tested by using the heterozygosity excess method [64] implemented in Bottleneck v.1.2.02 [65]. Following recommendations of Piry et al., we used the two-phase model (TPM) with 95% stepwise mutations and a variance of 12 [65], [66]. We used the Wilcoxon signed-rank test because this test was proven to be more powerful when less than 20 loci are used [65]. The mode-shift test implemented in Bottleneck [65] was also run.
First, we assessed population structure by using pairwise FST values calculated in Arlequin v.3.11 and the Bayesian clustering approach implemented in Structure v.2.3.3 [66], [67], [68]. For analyses using Structure, the admixture model with correlated allele frequencies was chosen and the number of genetic groups K was set from 1 to 8. Each run used five replicates, which consisted of a burn-in period of 1 000 000 MCMC steps followed by 1 000 000 MCMC steps. Plots of the mean value of the log posterior probability of the data [mean of lnP(D)] were examined to check the maximum lnP(D) value [66]. Because spatially explicit Bayesian clustering methods can be powerful when inferring genetic structure [69], [70], particularly at low levels of differentiation [71], [72], we used Geneland v.4.0.3 [72], [73], [74] to search for structure in the rhesus macaque. We specified a Kmax of 8, used the correlated allele frequency model, and allowed for the presence of null alleles. This analysis included samples that had specific locations or provincial localities. The analysis was run for 1,000,000 generations and a thinning of 1,000. To make sure the MCMC was converging and mixing properly, we used 10 independent runs and compared parameter estimates. For parameter inference we selected the run with the highest log posterior probability and specified a burnin of 250. We also used the maximum likelihood and Bayesian inference [75], [76], [77] algorithms in migrate v.3.3.2 [78] to estimate test the null hypothesis of unbiased, equal rates of migration rates and equal effective population sizes between the two largest haplogroups resolved in the haplotype network. We chose the single-step model in the analysis. For the maximum likelihood analysis, we specified three long chains while discarding the first 100,000 trees per chain. For the Bayesian inference, we employed 10 replicates while also discarding the first 100,000 trees per chain as burn-in.
|