PropertyValue
is nif:broaderContext of
nif:broaderContext
is schema:hasPart of
schema:isPartOf
nif:isString
  • The study area is the BTH urban agglomeration, also called the Greater Beijing region (Fig 1). It is located in Northeast China, between 113°04′ and 119°53′ E and between 36°01′ and 42°37′ N. It covers 218,000 km² and had more than 100 million residents as of 2016 (National Bureau of Statistics of China, 2016). The BTH urban agglomeration is the largest urban agglomeration and the most developed economic centre in northern China. Beijing is the political capital and the cultural and information centre of China, and it is one of the largest megacities worldwide, with more than 21 million people and 5.7 million vehicles in 2016 [55]. Severe air pollution, with frequent fog and haze, has been the leading environmental challenge in this important region. Published statistics indicate that the annual mean PM10 and PM2.5 concentrations in Beijing from 2012 to 2015 were 138.5 ± 92.9 μg/m³ and 2.3 ± 54.4 μg/m³, respectively [37].
[Fig 1. Geographic information of the BTH urban agglomeration. Figure identifier: 10.1371/journal.pone.0201364.g001. Figure data removed from full text.]
Prefectural boundary layers at a scale of 1:4,000,000, which describe the urban distribution and prefectural boundaries, are obtained from the National Geomatics Centre of China. PM2.5 concentration data for the BTH urban agglomeration from December 2013 to May 2017 cover 12 cities, namely, Beijing, Tianjin, Tangshan, Zhangjiakou, Baoding, Handan, Chengde, Qinhuangdao, Xingtai, Cangzhou, Langfang and Shijiazhuang (for details, see Fig 2 and S2 Table).
[Fig 2. Geographic allocation of the 12 cities in the BTH urban agglomeration. Figure identifier: 10.1371/journal.pone.0201364.g002. Figure data removed from full text.]
This monitoring dataset is collected at the atmospheric physics sites of the 12 cities by the Ministry of Environmental Protection of China.
Remote sensing data from the Atmospheric Composition Analysis Group (2016) [56] are initially used for the BTH region. In this study, however, spatial interpolation of monitoring data reflects near-ground PM2.5 concentrations more accurately than these remote sensing data [56]. The monitoring data are measured by 80 stations distributed throughout the BTH region (Fig 3). Each monitoring station automatically measures monthly PM2.5 concentrations. Cangzhou, which is a small city, has three air quality monitoring stations. Each of the other cities has more than five stations, distributed from the suburbs to downtown. The annual average PM2.5 concentration of each monitoring station is calculated from the hourly real-time data. Beijing, Tianjin and Hebei have 12, 15 and 53 atmospheric physics observation points, respectively. Table 1 describes the geographic information of several of these observation points; the geographic information of all observation points is given in S1 Table. For data standardisation, the PM2.5 values collected at 2:00, 8:00, 14:00 and 20:00 at the different observation points are averaged to derive the daily and monthly PM2.5 concentrations of a city. The PM2.5 concentrations of the 12 cities are therefore averages over the 80 sites.
[Fig 3. Geographic allocation of 80 atmospheric physics observation points. Figure identifier: 10.1371/journal.pone.0201364.g003. Figure data removed from full text.]
[Table 1. Geographic information on some atmospheric physics observation points. Source: China Meteorological Administration. Table identifier: 10.1371/journal.pone.0201364.t001. Table data removed from full text.]
Table 2 describes an array of PM2.5-related variables summarised from an extensive literature review. These variables are tested to identify the critical impact factors for PM2.5 concentration.
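The data standardisation step described above, in which readings at 2:00, 8:00, 14:00 and 20:00 are averaged into daily and monthly city values, can be sketched roughly as follows. This is only an illustration under stated assumptions: the station names and readings are invented, and the order of averaging (each station first, then across stations) is an assumption, since the text does not specify it.

```python
from statistics import mean

# Hours at which PM2.5 values are collected, as stated in the text.
SAMPLING_HOURS = (2, 8, 14, 20)

# Hypothetical readings (ug/m3) for two stations of one city on one day.
# The station names and values are made up for illustration only.
readings = {
    "station_a": {2: 80.0, 8: 100.0, 14: 60.0, 20: 90.0},
    "station_b": {2: 70.0, 8: 90.0, 14: 50.0, 20: 70.0},
}

def station_daily_mean(obs):
    """Average the four fixed-hour readings of a single station."""
    return mean(obs[hour] for hour in SAMPLING_HOURS)

def city_daily_mean(stations):
    """Average the station-level daily means to get the city's daily value."""
    return mean(station_daily_mean(obs) for obs in stations.values())

def city_monthly_mean(daily_values):
    """Average a city's daily values over the days of one month."""
    return mean(daily_values)

daily = city_daily_mean(readings)
print(daily)
```

With an equal number of readings per station, averaging stations first or pooling all readings gives the same city value; the two orders differ only when some readings are missing.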
Existing studies on the key contributing factors of PM2.5 concentration in China have mainly focused on demographic and economic aspects and on other chemical air pollutants. Yang and Chen (2017) [57] used coal consumption, cement production, automobile volume, population and gross domestic product (GDP) as independent variables. Lu et al. (2017) [58] incorporated population density, annual volume of bus passengers, road freight, proportion of secondary industry in overall GDP, volume of SO2 emissions and volume of industrial soot emissions in their estimation. Ma and Xiao (2017) [59] considered urbanisation, energy consumption structure [60] and construction areas in their investigation. On the basis of an extensive literature review, 12 potential contributing factors for PM2.5 concentrations are identified (Table 2).
[Table 2. Potential critical impact factors for PM2.5 concentration. Table identifier: 10.1371/journal.pone.0201364.t002. Table data removed from full text.]
The possible critical impact factors of PM2.5 concentration are selected (Tables 2 and 3), discussed and included in the estimation model. Because the independent variable data for 2016 and 2017 have not yet been published in official statistics, this study uses the latest available data, those of 2015, in the PLS model to analyse the critical factors for PM2.5 concentration (for details, see S3 Table).
[Table 3. Statistic description (2015). Data source: National Bureau of Statistics of the People’s Republic of China (2016 a, b). Table identifier: 10.1371/journal.pone.0201364.t003. Table data removed from full text.]
A comparison of PM2.5 concentrations with population reveals findings that should be considered. Shijiazhuang, which is the most polluted city, has a relatively high population density of 788.14 persons/km². Handan, the second most polluted city, has the highest population density, at 870.29 persons/km².
Zhangjiakou, which has the best air quality, has a low population density of 127.46 persons/km². Chengde, with a pollution level similar to that of Shijiazhuang, also has a low population density of 96.73 persons/km² (Fig 4). Overall, population density shows a broadly positive relationship with PM2.5 concentration. However, this pattern is affected by various critical impact factors that lead to certain variations. Therefore, a detailed investigation of the critical PM2.5 factors should be conducted for a thorough analysis of such particles.
[Fig 4. PM2.5 concentration and population density per km2 in 2015. (Unit: μg/m3 refers to the left axis, and Persons/km2 refers to the right axis). Figure identifier: 10.1371/journal.pone.0201364.g004. Figure data removed from full text.]
The estimation results reveal that PM2.5 concentration in the BTH urban agglomeration exhibits a distinctive spatial distribution. Related literature shows that coal combustion accounts for 20%–30% of PM2.5 pollution in Chinese cities [20, 22, 27, 28]. In winter, PM2.5 concentration is usually high because coal is the main energy source for heating. In summer, the situation in the BTH urban agglomeration differs significantly. For example, motor vehicles account for 63% of the carbonaceous components of PM2.5 in Beijing, while coal combustion accounts for 30.3% of the PM2.5 composition because it is the major energy source for industrial production in the city [22]. Therefore, the present study uses industrial dust and industrial SO2 emissions as parameters to investigate the air pollution contributions of heating and industrial development. Tangshan shows the highest volume of industrial SO2 emissions, which amounted to 214,723 t in 2016, followed by Shijiazhuang and Handan with 113,652 and 110,193 t, respectively.
Xingtai and Handan are the top two contributors of industrial dust emissions, accounting for 191,713 and 100,738 t, respectively. Fig 5 shows the volume of industrial soot (dust) and sulphur dioxide emissions of the 12 cities in BTH.
[Fig 5. Volume of industrial soot (dust) and sulphur dioxide emissions in 2015 (Unit: Ton). Figure identifier: 10.1371/journal.pone.0201364.g005. Figure data removed from full text.]
Several studies argue that transportation exerts a significant adverse influence on air pollution [29]. On the basis of the available official statistics, ‘passenger and freight volume of highway traffic’ is used as a parameter for measuring PM2.5 pollution from transportation. The data suggest that polluted cities are generally associated with high road freight volume. For example, Shijiazhuang, Tangshan and Handan, which are the most polluted cities, have relatively high freight volumes of 3,695,410,000, 363,580,000 and 387,040,000 t, respectively. The research framework and main research steps are illustrated in Fig 6.
[Fig 6. Research framework. Figure identifier: 10.1371/journal.pone.0201364.g006. Figure data removed from full text.]
PM2.5 concentration is a scalar description of the atmospheric state and is significantly affected by local human activities. Although remote sensing has been improved in recent years by techniques such as regional correlations, several studies indicate that spatial interpolation is a powerful alternative to the inversion method and reflects near-ground PM2.5 concentrations more accurately than remote sensing data [40, 54, 64–67]. To address this limitation of remote sensing, spatial interpolation is employed and the results of the inversion method are used as references. Interpolation methods used at the regional scale include inverse distance weighting (IDW) and the Kriging interpolation method (OKM).
OKM is more widely recognised than IDW as a method for dealing with interpolation points [22]. This study uses OKM to simulate seasonal variations of PM2.5 in the 12 cities of the BTH urban agglomeration. The underlying concept of OKM is that the interpolation result at the target point is the weighted sum of the known attribute values of the samples [68]. In the study area, $x$ denotes a spatial location and $z(x_i)$ ($i = 1, 2, \cdots, n$) denotes the property value at sampling point $x_i$; here, the property value is the annual mean PM2.5 concentration. The interpolation result at target point $x_0$ is then
$$z^*(x_0) = \sum_{i=1}^{n} \lambda_i z(x_i),$$
where $\lambda_i$ ($i = 1, 2, \cdots, n$) are the undetermined weight coefficients. The entire study area is assumed to satisfy the second-order stationarity assumption: the mathematical expectation of $z(x)$ exists and equals a constant, $E[z(x)] = m$, and the covariance function of $z(x)$ exists and depends only on the lag $h$, that is,
$$\mathrm{Cov}[z(x), z(x+h)] = E[z(x)\,z(x+h)] - m^2 = C(h).$$
The estimate must be unbiased, $E[z^*(x_0)] = E[z(x_0)]$, where $z^*(x_0)$ is the OKM estimate of the PM2.5 concentration at point $x_0$ and $z(x_0)$ is the PM2.5 concentration at that point. Because $E[z^*(x_0)] = m\sum_{i=1}^{n}\lambda_i$ must equal $m$, we can conclude that
$$\sum_{i=1}^{n} \lambda_i = 1.$$
For regionalised variables that satisfy the second-order stationarity conditions, the estimation variance can be calculated as
$$\sigma_E^2 = E\{[z^*(x_0) - z(x_0)]^2\} - \{E[z^*(x_0) - z(x_0)]\}^2 = \sum_{i=1}^{n}\sum_{j=1}^{n} \lambda_i \lambda_j C_{i,j} - 2\sum_{i=1}^{n} \lambda_i C_{i,0} + C_{0,0},$$
where $C_{i,j}$ denotes $\mathrm{Cov}(x_i, x_j)$. The minimum-variance estimate under the unbiasedness condition is obtained by minimising
$$\mathrm{Var}[z^*(x_0) - z(x_0)] + 2\mu\left(\sum_{i=1}^{n} \lambda_i - 1\right),$$
where $\mu$ is a Lagrange multiplier. The weight coefficients should therefore satisfy
$$\begin{cases} \sum_{i=1}^{n} \lambda_i \,\mathrm{Cov}(x_i, x_j) + \mu = \mathrm{Cov}(x_0, x_j), & j = 1, 2, \cdots, n, \\ \sum_{i=1}^{n} \lambda_i = 1. \end{cases}$$
Solving this system gives the values of $\lambda_i$ ($i = 1, 2, \cdots, n$) and hence the attribute value $z^*(x_0)$ at target point $x_0$.
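The kriging system above can be solved numerically once a covariance function C(h) is chosen. The sketch below is a minimal illustration rather than the study's implementation: the exponential covariance model, its `sill` and `corr_range` parameters, the station coordinates and the concentrations are all assumptions introduced here, and the linear solver is a plain Gaussian elimination kept in the standard library for self-containment.

```python
import math

def cov(h, sill=1.0, corr_range=50.0):
    """Assumed exponential covariance model C(h); not taken from the source."""
    return sill * math.exp(-h / corr_range)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ordinary_kriging(points, values, target):
    """Return (estimate z*(x0), weights lambda_i, multiplier mu)."""
    n = len(points)
    # Rows j = 1..n: sum_i lambda_i Cov(x_i, x_j) + mu = Cov(x_0, x_j)
    a = [[cov(math.dist(points[i], points[j])) for i in range(n)] + [1.0]
         for j in range(n)]
    # Last row: the unbiasedness constraint sum_i lambda_i = 1
    a.append([1.0] * n + [0.0])
    b = [cov(math.dist(target, pt)) for pt in points] + [1.0]
    solution = solve(a, b)
    weights, mu = solution[:n], solution[n]
    estimate = sum(w * z for w, z in zip(weights, values))
    return estimate, weights, mu

# Hypothetical station coordinates (km) and annual mean PM2.5 values (ug/m3).
stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
pm25 = [80.0, 100.0, 60.0, 90.0]
estimate, weights, mu = ordinary_kriging(stations, pm25, (5.0, 5.0))
print(estimate, weights)
```

Two properties are worth checking on any such sketch: the weights sum to 1, and the estimate at a sampled location reproduces the sampled value.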
The degree of correlation between $t_1$ and $u_1$ should be the maximum. The two conditions can be summarised as follows: $\mathrm{Var}(t_1) \to \max$, $\mathrm{Var}(u_1) \to \max$ and $r(t_1, u_1) \to \max$. After the first principal components $t_1$ and $u_1$ are extracted from X and Y, PLS performs linear regressions of X and Y on $t_1$. In the PLS estimation, components $t_1$ and $u_1$ have typical component characteristics. A significant linear relationship between $t_1$ and $u_1$ indicates that X has a notable correlation with Y, and PLS is appropriate for estimating the contribution of X to Y. The algorithm is terminated when the regression equations reach satisfactory levels. Otherwise, the residuals of X and Y after regression on $t_1$ are used to extract the next principal component. The algorithm iterates until the results reach satisfactory levels. Cross-validation ($Q_h^2$) is used as the criterion to determine whether the regression results reach the satisfactory level. For a given number of extracted principal components $h$, each observation $i$ ($i = 1, 2, \cdots, n$) is left out in turn, and the PLS model is built with the remaining $n - 1$ observations. Observation $i$ is then substituted into the fitted PLS regression equation to obtain the predicted value of $y_j$ ($j = 1, 2, \cdots, q$) at observation $i$, which is recorded as $\hat{y}_{(i)j}(h)$. This calculation is repeated for each $i$ ($i = 1, 2, \cdots, n$). The predicted sum of squared errors for dependent variable $y_j$ when $h$ principal components are extracted is recorded as
$$PRESS_j(h) = \sum_{i=1}^{n} \left(y_{ij} - \hat{y}_{(i)j}(h)\right)^2,$$
and the corresponding sum for $Y = [y_1, y_2, \cdots, y_q]$ is recorded as
$$PRESS(h) = \sum_{j=1}^{q} PRESS_j(h).$$
All observations are likewise used to fit the regression equation with $h$ principal components. In this case, the predicted value for observation $i$ is denoted by $\hat{y}_{ij}(h)$. The sum of squared errors (SSE) for $y_j$ is defined as
$$SS_j(h) = \sum_{i=1}^{n} \left(y_{ij} - \hat{y}_{ij}(h)\right)^2,$$
and the SSE for Y is defined as
$$SS(h) = \sum_{j=1}^{q} SS_j(h).$$
Cross-validation is defined as
$$Q_h^2 = 1 - \frac{PRESS(h)}{SS(h-1)}.$$
Thus, a cross-validation test is performed before the end of each modelling step. The model estimation reaches a satisfactory level of precision and the extraction of components is stopped if $Q_h^2 < 1 - 0.95^2 = 0.0975$ at step $h$. If $Q_h^2 \geq 0.0975$ at step $h$, then the marginal contribution of the extracted principal component $t_h$ is significant, and step $(h+1)$ should be calculated. After $m$ principal components $t_1, t_2, \cdots, t_m$ are finally extracted from X, PLS first performs a regression of $y_k$ on $t_1, t_2, \cdots, t_m$ and then converts it into the regression equation of $y_k$ on $x_1, x_2, \cdots, x_p$. The specific procedures of the PLS algorithm are summarised as follows:
Step 1. To simplify the calculation and eliminate the effects of the different units of the variables, this study first standardises the original data matrices (X and Y); the standardised matrices are denoted by $E_0$ and $F_0$.
Step 2. Let $t_1$ be the first principal component extracted from $E_0$. The regression of $E_0$ and $F_0$ on $t_1$ is performed as follows:
$$E_0 = t_1 p_1' + E_1, \qquad F_0 = t_1 r_1' + F_1,$$
where $p_1$ and $r_1$ are regression coefficient vectors, and $E_1$ and $F_1$ are the corresponding residual matrices. The accuracy of the regression equation is then calculated. The algorithm is terminated when the regression equations reach satisfactory levels. Otherwise, let $E_0 = E_1$ and $F_0 = F_1$, and iterate the component extraction and regression analysis. Cross-validation ($Q_h^2$) is used to evaluate the model until the expected accuracy is obtained.
Step 3. The number of regression components should be selected. The number of regression components included in the PLS model is important because it directly affects the fitting accuracy of the model. It should be carefully selected based on cross-validation ($Q_h^2$). If $Q_h^2$ is higher than or equal to 0.0975, then the marginal contribution of component $t_h$ is significant and contributes to the precision of the estimation results.
Step 4.
The regression equation of $E_0$ and $F_0$ on $t_1, t_2, \cdots, t_m$ is derived if the model extracts $m$ principal components. The regression equation in the original variables is then obtained through inverse transformation.
In the calculation of PLS, the principal component $t_h$ should both represent the variation information of X ($x_j$, $j = 1, 2, \cdots, p$) and explain the information of Y ($y_k$, $k = 1, 2, \cdots, q$) as much as possible. To measure the explanatory power of $t_h$ for interpreting X and Y, we define the following quantities.
The explanatory power of $t_h$ to interpret X: $Rd(X; t_h) = \frac{1}{p}\sum_{j=1}^{p} Rd(x_j; t_h)$.
The cumulative explanatory power of $t_1, t_2, \cdots, t_m$ to interpret X: $Rd(X; t_1, \cdots, t_m) = \sum_{h=1}^{m} Rd(X; t_h)$.
The explanatory power of $t_h$ to interpret Y: $Rd(Y; t_h) = \frac{1}{q}\sum_{k=1}^{q} Rd(y_k; t_h)$.
The cumulative explanatory power of $t_1, t_2, \cdots, t_m$ to interpret Y: $Rd(Y; t_1, \cdots, t_m) = \sum_{h=1}^{m} Rd(Y; t_h)$.
A significant advantage of PLS regression is the reliable choice of variables. When independent variable $x_j$ is used to explain the set of dependent variables Y, the variable importance in projection $VIP_j$ can be used to measure the importance of $x_j$ in interpreting Y [69]. The expression of $VIP_j$ is
$$VIP_j = \sqrt{\frac{p}{Rd(Y; t_1, \cdots, t_m)} \sum_{h=1}^{m} Rd(Y; t_h)\, w_{hj}^2},$$
where $p$ represents the number of independent variables, and $w_{hj}$ is the linear combination coefficient of the principal component $t_h$. For principal component $t_h$, $t_h = w_{h1}x_1 + w_{h2}x_2 + \cdots + w_{hp}x_p$, and for $h = 1, 2, \cdots, m$, $\sum_{j=1}^{p} w_{hj}^2 = 1$. The explanatory power of $x_j$ to Y is transferred by $t_h$. The formula
$$VIP_j^2 = \frac{p \sum_{h=1}^{m} Rd(Y; t_h)\, w_{hj}^2}{\sum_{h=1}^{m} Rd(Y; t_h)}$$
indicates that when the values of $Rd(Y; t_h)$ and $w_{hj}^2$ are large, $VIP_j^2$ is also large. The formula
$$\sum_{j=1}^{p} VIP_j^2 = \sum_{j=1}^{p} \frac{p \sum_{h=1}^{m} Rd(Y; t_h)\, w_{hj}^2}{\sum_{h=1}^{m} Rd(Y; t_h)} = \frac{p}{\sum_{h=1}^{m} Rd(Y; t_h)} \sum_{h=1}^{m} Rd(Y; t_h) \sum_{j=1}^{p} w_{hj}^2 = p$$
indicates that if the $VIP_j$ of all independent variables $x_j$ ($j = 1, 2, \cdots, p$) equal 1, then they all play the same role in interpreting Y. Otherwise, $x_j$ exerts a significant effect on interpreting Y when $VIP_j > 1$.
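The component extraction, the cross-validation criterion $Q_h^2$ and the VIP scores described above can be sketched for a single dependent variable (q = 1) as follows. This is an illustrative sketch under assumptions, not the study's code: the data are synthetic, the helper names (`fit_pls1`, `predict`, `q2`, `vip`) are introduced here, $Rd(Y; t_h)$ is taken to be the squared correlation between y and $t_h$, and the data are standardised once up front rather than within each leave-one-out fold.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def standardise(col):
    """Centre a column and scale it to unit sample standard deviation."""
    mean = sum(col) / len(col)
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (len(col) - 1))
    return [(v - mean) / sd for v in col]

def fit_pls1(X, y, m):
    """Extract m components; return [(w_h, p_h, r_h, t_h)] and the y residual."""
    E, f, comps = [row[:] for row in X], y[:], []
    for _ in range(m):
        cols = list(zip(*E))
        w = [dot(c, f) for c in cols]            # direction E'f
        norm = math.sqrt(dot(w, w))
        w = [v / norm for v in w]                # so that sum_j w_hj^2 = 1
        t = [dot(row, w) for row in E]           # component t_h
        tt = dot(t, t)
        p = [dot(c, t) / tt for c in cols]       # regression of E on t_h
        r = dot(f, t) / tt                       # regression of f on t_h
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [fi - r * ti for fi, ti in zip(f, t)]
        comps.append((w, p, r, t))
    return comps, f

def predict(comps, x):
    """Predict y for one standardised observation by repeated deflation."""
    e, y_hat = list(x), 0.0
    for w, p, r, _ in comps:
        t = dot(e, w)
        y_hat += r * t
        e = [ei - t * pj for ei, pj in zip(e, p)]
    return y_hat

def q2(X, y, h):
    """Q_h^2 = 1 - PRESS(h) / SS(h - 1), with leave-one-out PRESS."""
    press = 0.0
    for i in range(len(y)):
        comps, _ = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:], h)
        press += (y[i] - predict(comps, X[i])) ** 2
    _, resid = fit_pls1(X, y, h - 1)             # SS(0) is the total sum of squares
    return 1.0 - press / dot(resid, resid)

def vip(comps, y):
    """VIP_j = sqrt(p * sum_h Rd(y; t_h) w_hj^2 / sum_h Rd(y; t_h))."""
    p = len(comps[0][0])
    rd = [r * r * dot(t, t) / dot(y, y) for _, _, r, t in comps]
    return [math.sqrt(p * sum(d * c[0][j] ** 2 for d, c in zip(rd, comps)) / sum(rd))
            for j in range(p)]

# Synthetic data: 30 observations, 4 predictors; y depends on the first two only.
random.seed(0)
n_obs, n_vars = 30, 4
raw = [[random.gauss(0, 1) for _ in range(n_vars)] for _ in range(n_obs)]
target = [2 * row[0] - row[1] + random.gauss(0, 0.3) for row in raw]
X = [list(row) for row in zip(*[standardise(list(c)) for c in zip(*raw)])]
y = standardise(target)

comps, _ = fit_pls1(X, y, 2)
scores = vip(comps, y)
q2_first = q2(X, y, 1)
print(scores, q2_first)
```

In use, components would be added one at a time while `q2(X, y, h)` stays at or above 0.0975, and predictors with a VIP score above 1 would be read as the more important ones.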
rdf:type