nif:isString
|
-
The Beijing-Tianjin-Hebei area is usually regarded as an economic region surrounding Beijing, Tianjin, and Hebei. This region encompasses nine cities: Beijing, Tianjin, Baoding, Shijiazhuang, Tangshan, Cangzhou, Langfang, Zhangjiakou, and Chengde. However, to obtain an overview of the intercity correlation of PM2.5 series within this region, this study includes all 17 cities within approximately 360 km of Beijing. It therefore also includes Qinhuangdao, Hengshui, Chifeng, Datong, Yangquan, Dongying, Binzhou, and Dezhou (Fig 2).
Fig 2 (10.1371/journal.pone.0192614.g002). Cities in the Beijing–Tianjin–Hebei region. This map was generated using ArcGIS 10.2.2 (www.esri.com).
The PM2.5 measurement data from the Beijing-Tianjin-Hebei region used in this study were obtained from the national hourly air quality reporting platform (http://113.108.142.147:20035/emcpublish/) run by the China National Environment Protection Agency. These data consist of hourly concentrations of six major pollutants since early 2013: particulate matter with aerodynamic diameters no greater than 2.5 microns (PM2.5), particulate matter with aerodynamic diameters less than 10 microns (PM10), sulfur dioxide (SO2), nitrogen dioxide (NO2), ozone (O3), and carbon monoxide (CO). However, these data are not easily accessible, because the online reporting system lists air quality information only for the current day and historical data are unavailable to the public. Fortunately, third parties such as AQISTUDY.cn (https://www.aqistudy.cn/) and EPMAP.org (http://epmap.org) have been crawling these data from the national hourly air quality reporting platform since late 2013. This study used air quality monitoring data from 1 January 2014 to 31 December 2014 obtained from AQISTUDY.cn and EPMAP.org. Both data sources had missing hourly measurements; therefore, we combined the two datasets to obtain a more complete 24-h PM2.5 measurement dataset for each day in 2014.
A comprehensive quality check of the raw data was conducted to reduce the impact of problematic data points, including duplicated data records, missing measurements recorded as placeholders, and implausible zeros. In addition, data points with extremely high PM2.5 concentrations (>1000 μg/m3) were considered problematic outliers and were removed from the analysis.
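The quality check and the merging of the two data sources described above can be sketched as follows. This is a minimal illustration, not the study's code: the record layout (station, timestamp, value), the function names, and the placeholder values are assumptions, since the text does not state which markers the raw feeds use. Only the 1000 μg/m3 cutoff comes from the text.

```python
import math

# Hypothetical placeholder markers; the source does not list the actual ones.
PLACEHOLDERS = {-999.0, -99.0}
MAX_VALID = 1000.0  # ug/m3, outlier cutoff stated in the text


def is_valid(value):
    """Return True if a PM2.5 reading passes the quality check."""
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    if value in PLACEHOLDERS:
        return False
    if value <= 0:         # implausible zero (negatives rejected too, an assumption)
        return False
    if value > MAX_VALID:  # extreme outlier
        return False
    return True


def clean(records):
    """Drop duplicated and invalid records.

    records: iterable of (station, timestamp, value) tuples.
    Returns a dict keyed by (station, timestamp); the first record seen
    for a key is the one considered, later duplicates are skipped.
    """
    cleaned = {}
    seen = set()
    for station, timestamp, value in records:
        key = (station, timestamp)
        if key in seen:
            continue
        seen.add(key)
        if is_valid(value):
            cleaned[key] = value
    return cleaned


def merge(primary, secondary):
    """Combine two cleaned sources; primary wins where both have a value."""
    merged = dict(secondary)
    merged.update(primary)
    return merged
```

Calling `merge(clean(source_a), clean(source_b))` fills hours that are missing or invalid in one source with valid readings from the other. Which source takes precedence when both report a value is a design choice the text does not specify.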
Following the quality check, the air quality monitoring data were processed to facilitate the city-based cross-correlation analysis. First, for each city, the hourly PM2.5 concentration was calculated by averaging the hourly data from all stations within that city. Then, linear interpolation was used to fill missing data points. Finally, a simple 8-h moving average was calculated to better capture the general trend of the PM2.5 time series and to reduce the impact of noise.
Wind vector data were used to help verify the effects of synoptic meteorological conditions on PM2.5 pollution. This study adopted the Modern-Era Retrospective Analysis for Research and Applications version 2 (MERRA-2) dataset, an atmospheric reanalysis produced by NASA that uses the Goddard Earth Observing System Model (version 5) and is based on atmospheric, land, and ocean observations from satellites, aircraft, and ships [19]. Specifically, this study used a monthly averaged atmospheric diagnostics product, i.e., M2TMNXSLV version 5.12.4 [20]. The key variables in this dataset were the eastward and northward wind speeds at 50 m above the surface, from which wind speed and wind direction were computed. These data are available from the NASA MERRA-2 website (https://disc.gsfc.nasa.gov/datareleases/merra_2_data_release).
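The three processing steps applied to the PM2.5 series (station averaging, linear interpolation, and the 8-h moving average) can be sketched as follows. This is an illustration under stated assumptions: missing values are represented as `None`, leading and trailing gaps are left unfilled, and the moving average is a trailing window. The text says only "simple 8-h moving average", so a centered window would be equally consistent with it.

```python
def city_hourly_mean(station_series):
    """Average aligned station series hour by hour, skipping missing values.

    station_series: list of equal-length lists, one per station,
    with None marking a missing hour.
    """
    n_hours = len(station_series[0])
    means = []
    for h in range(n_hours):
        vals = [s[h] for s in station_series if s[h] is not None]
        means.append(sum(vals) / len(vals) if vals else None)
    return means


def interpolate_linear(series):
    """Fill interior gaps by linear interpolation between known neighbours.

    Gaps at the very start or end have no neighbour on one side and
    are left as None (an assumption; the text does not cover this case).
    """
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def moving_average(series, window=8):
    """Trailing simple moving average.

    Positions without a full window of non-missing values yield None.
    """
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(sum(chunk) / window)
    return out
```

A city series would then be obtained as `moving_average(interpolate_linear(city_hourly_mean(stations)))`.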
This study used a visualization framework to reveal the underlying dynamic interactions of PM2.5 time series in different cities based on the ground-based air quality monitoring data. The main component of the framework is the cross-correlation method, which calculates intercity correlations between PM2.5 time series in different cities. The following sections introduce the cross-correlation method, the threshold guidance and significance test for the coefficients, and the implementation and presentation of the results.
The cross-correlation method is a technique used in signal processing to measure the similarity of two time series as a function of the lag of one relative to the other [21]. Although simple, it has various applications, including speech recognition, microphone-array processing [22], and even genetic studies [23]. This study used the cross-correlation method to determine the time delay between two PM2.5 time series. The calculation consists of two principal steps: computing the correlation coefficients between the two time series at different time lags, and selecting the time lag at which the correlation coefficient reaches its maximum. This maximum occurs at the time shift for which the two time series are best aligned. The calculation can be expressed as
\[ R(\tau) = \mathrm{Corr}\bigl(S_1(t),\, S_2(t-\tau)\bigr), \tag{1} \]
\[ T_{\mathrm{delay}} = \arg\max_{\tau} R(\tau), \tag{2} \]
where S1 and S2 are the two time series being compared, τ is the time lag, R(τ) is the correlation coefficient between S1 and S2 at time lag τ, and argmaxτ denotes the argument (here, the time lag τ) at which R(τ) is maximized. Tdelay denotes the time lag that yields the maximum correlation coefficient, Rmax.
To illustrate the cross-correlation analysis, this study used the Beijing and Qinhuangdao PM2.5 time series in January 2014 as an example.
First, the correlation coefficients were calculated at different time lags; then, the maximum correlation coefficient was identified; and finally, the time lag that produced the maximum correlation coefficient was determined. Fig 1B shows that the maximum correlation coefficient occurs when the PM2.5 time series from Qinhuangdao is shifted to the left by approximately 4 h, so the time lag giving the best alignment was determined to be 4 h. As seen in Fig 1A, the best visual alignment between the two time series is likewise obtained by shifting the Qinhuangdao series to the left by approximately 4 h, which is consistent with the result of the cross-correlation method.
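The two-step calculation in Eqs (1) and (2) can be sketched in plain Python as follows. This is an illustration rather than the study's implementation: the function names and the `max_lag` search range are assumptions (the text does not state which lags were scanned), and both series are assumed to be equally long, hourly, and free of missing values.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def lagged_corr(s1, s2, lag):
    """R(lag) = Corr(S1(t), S2(t - lag)) over the overlapping samples."""
    n = len(s1)
    if lag >= 0:
        x = s1[lag:]       # S1(t) for t = lag .. n-1
        y = s2[:n - lag]   # S2(t - lag)
    else:
        x = s1[:n + lag]   # S1(t) for t = 0 .. n-1+lag
        y = s2[-lag:]      # S2(t - lag), i.e. S2 shifted forward
    return pearson(x, y)


def best_lag(s1, s2, max_lag):
    """Return (T_delay, R_max) over lags in [-max_lag, max_lag]."""
    best = None
    for lag in range(-max_lag, max_lag + 1):
        r = lagged_corr(s1, s2, lag)
        if math.isnan(r):
            continue
        if best is None or r > best[1]:
            best = (lag, r)
    return best
```

With this sign convention, a positive lag means that S1 at time t is best matched by S2 at the earlier time t − τ. Calling `lagged_corr(s1, s2, 0)` gives the zero-lag coefficient R(0).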
Threshold guidance and significance testing of coefficients: The maximum correlation coefficient in the example above is 0.697 (Fig 1B), which suggests a reasonably strong correlation between the PM2.5 time series in Beijing and Qinhuangdao. However, not every pair of cities attained such a strong maximum correlation. In some cases, the maximum correlation coefficient was very low, indicating that the two PM2.5 time series were essentially uncorrelated. Therefore, correlation coefficient thresholds were needed to distinguish correlated relationships from uncorrelated ones. This study used a rule of thumb proposed by Dennis Hinkle and his coauthors [24] to interpret correlation coefficients. Specifically, correlation coefficients of 0.7–0.9 and 0.9–1.0 were considered high and very high correlations, respectively. A coefficient of 0.5–0.7 was considered a moderate correlation. Coefficients of 0.3–0.5 and 0.0–0.3 were regarded as indicating low and little correlation, respectively. In this study, correlation coefficients >0.5 were considered indicative of a probable correlated relationship, and coefficients <0.5 were regarded as indicating uncorrelated relationships.
As shown above, the time lag for each month was determined by finding the maximum correlation coefficient, Rmax, based on the ground-based air quality monitoring data in 2014. A test of significance was needed to examine whether Rmax was significantly larger than the correlation coefficient without a time lag, denoted here as R(0). A value of Rmax significantly larger than R(0) indicates that the difference between the two coefficients is not due to random chance and that it is safe to use the time lag, Tdelay, and the maximum correlation coefficient for further analysis.
To test the difference between two correlations, the correlations were transformed using Fisher’s r-to-z transformation [25]. Details of this transformation and the subsequent calculation are given in [26].
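The threshold rule and the comparison of two coefficients described above can be sketched as follows. Several points are assumptions rather than statements from the text: the function names, the treatment of each band's lower bound as inclusive, and the use of the absolute value when labeling a coefficient. The comparison uses the textbook independent-samples form of the Fisher z test. The text defers the exact calculation to [26], and because Rmax and R(0) are computed from the same pair of series, the procedure actually used may differ.

```python
import math


def interpret(r):
    """Label a coefficient using the rule of thumb cited in the text [24]."""
    a = abs(r)  # assumption: bands apply to the magnitude
    if a >= 0.9:
        return "very high"
    if a >= 0.7:
        return "high"
    if a >= 0.5:
        return "moderate"
    if a >= 0.3:
        return "low"
    return "little"


def is_correlated(r_max, threshold=0.5):
    """Study criterion: coefficients above 0.5 count as probably correlated."""
    return r_max > threshold


def fisher_z(r):
    """Fisher r-to-z transformation: 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def compare_correlations(r1, n1, r2, n2):
    """Test whether r1 exceeds r2 (independent-samples form, an assumption).

    n1 and n2 are the numbers of paired samples behind each coefficient.
    Returns (z, p), where p is the one-sided p-value for r1 > r2.
    """
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of the standard normal
    return z, p
```

For a city pair, `compare_correlations(r_max, n_lagged, r_zero, n_full)` would return the z statistic and a one-sided p-value. Hourly PM2.5 values are serially correlated, so the raw sample counts probably overstate the effective sample size; this sketch does not correct for that.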
The cross-correlation method and the significance tests for the coefficients were implemented using Python 2.7.5 (https://www.python.org). All figures were drawn using Python 2.7.5 and Matplotlib 1.5.0 (https://matplotlib.org/). To provide a clear and intuitive presentation of the interactions between PM2.5 time series in different cities, this study displays the results in two forms: a map presentation and a matrix presentation. The map presentation offers a geographic representation of the dynamic relationships between PM2.5 time series in different cities, while the matrix presentation provides a tabulation of these associations.
|