PropertyValue
is nif:broaderContext of
nif:broaderContext
is schema:hasPart of
schema:isPartOf
nif:isString
  • The study was reviewed and approved by the Weill Cornell Internal Review Board. Patient information was anonymized and de-identified prior to analysis; individual consent was not obtained. The population is a union of health and hospital workers in the Northeast, whose self-insured trust fund administers comprehensive health benefits to the members, their spouses and children. The fund covers all medically necessary hospital, medical, maternity, behavioral health and pharmacy services, and maintains their own claims data repository. There were 226,157 individuals who were eligible for benefits for at least 11 months between January–December 2009; of those, 185,294 (86.2%) remained eligible for at least 11 months from January–December 2010. Overall 18.1% (n = 40,862) did not remain eligible for reasons such as retirement or dependent children reaching the maximum age of eligibility. This analysis focuses on the 181,764 beneficiaries who were consistently eligible for benefits over at least 22 months in 2009 and 2010, who also received DCG codes. (Sightlines DxCG Risk Solutions V 3.0, Verisk Health Inc) Beneficiary age and gender were available. Comorbidity was assessed through the Charlson comorbidity index [19]. Different weights are assigned for specific conditions and the weights are added to find the index for a specific patient (e.g., a patient with depression, COPD and lymphoma would have a weight of 4) (Table 1). Data on use of warfarin was not available. Table data removed from full text. Table identifier and caption: 10.1371/journal.pone.0112479.t001 Different weights assigned for specific conditions in the comorbidity index. The comorbidity index was assessed from claims data for services provided between January and December 2009; each claim had at least one primary ICD-9 diagnosis code and up to three secondary codes. The comorbidity index was assessed using the Deyo strategy [23]. The comorbidity index for the first year was computed for all diagnoses recorded in all of the claims during the first year. 10.2% had no claims in 2009; of those, 63% were adults and 37% children. Among adults, of those who had no claims in 2009, 53% also had none in 2010. Among children, 43% of those who had no claims in 2009 also had none in 2010. Costs were zero for those without claims. The prospective DCG model utilizes claims data and age, sex, diagnoses and their interactions to predict future costs; the 2009 prospective DCG was used to predict 2010 cost [24]. Mental health disorders encompassed the ICD-9 codes 290–316 other than depression (290.13, 296.20–296.36; 296.82; 298.0; 201.11; 301.12; 309.0–309.1 or 311) and dementia (290, 290.4–290.43, 331, 331.19, 331.2 or 331.82). Claims data was used to document the cost and utilization of services among these consistently eligible beneficiaries. The type and place of service were available. The prior year hospitalization was the actual number of hospitalizations. Total amount paid for all services including inpatient, outpatient, emergency room, laboratory tests, behavioral health and prescription drugs for January–December 2010 were evaluated. To compare the costs for those with a given chronic disease versus those with that disease and other comorbid diseases, an adjusted comorbidity index was calculated by subtracting the Charlson comorbidity weight for that disease from the total comorbidity score. For example, patients with congestive heart failure (CHF) who have an adjusted index score of zero have only CHF, while those with an index of one or more have other illnesses as well. Four predictors of subsequent yearly costs (i.e., prior year costs, prior year hospitalization, prior year comorbidity and prior year DCG) were compared using four different analytic approaches: a two part regression modeling strategy often used in econometric analysis; quantile regression to predict the upper 5% and 10% of the cost distribution; logistic regression to predict whether a specific individual would be in the upper 5% or 10% cost strata using positive predictive value; and receiver operating characteristic (ROC) analysis to evaluate sensitivity and specificity. Since cost data was skewed by both high cost patients and by those with zero costs, standard regression could not be used. Age, gender and mental health were controlled for in all regressions. There was no data on race/ethnicity in the claims; however, the beneficiaries are predominately Black and Latino. The data did not include the specific location where the beneficiaries received services. A two part mean regression framework for modeling total health care cost: The first part of the two-part model is a binary outcome model that describes the distinction between non-users (zero cost) and users of services (non-zero cost), while the second part is a linear regression that describes the distribution of total health care cost for patients who used services (see File S1) [25]. As a diagnostic to assess the functional form for current and prior year (log) cost, we fit a nonparametric estimate of the relationship using a local polynomial smoother and found that the linear term was adequate. The higher order terms added little to the model fit statistics, consequently we went with the linear term. Quantile regression was employed to assess the relationship between predictors and the upper tail of the cost distribution, controlling for age, gender and mental health diagnoses [26]; since it focuses on the upper tail, those with zero costs in the lower tail of the distribution do not heavily affect the estimates. The pseudo R2 is the measure of model fit [27]. The methodology for fitting a quantile regression model involves minimizing a weighted sum of absolute deviations. The pseudo-R2 in quantile regression is calculated as 1 – (sum of weighted deviations about estimated quantile)/(sum of weighted deviations about raw quantile) (see Koenker, R. 2005. Quantile Regression. New York: Cambridge University Press). We used this definition of pseudo-R2 for each of the models. The pseudo-R2 is analogous to the classical R2 = 1 – (error sum of squares)/(total sum of squares) from multiple regression. Positive predictive value was assessed through logistic regression which was used to build a model that predicted whether an individual would fall into the top 5% and 10% highest predicted cost groups [28]. A 50% training set was used to generate 1000 new training sets by sampling uniformly and with replacement (the bootstrap samples); models were then fit using the bootstrap samples and combined for classification into the top 5% and 10% groups. This bagged bootstrap sampling was used to generate positive predictive values [29]. Receiver operating characteristic (ROC) analysis evaluates the sensitivity and specificity at various threshold settings of prior year costs, prior year comorbidity, prior year hospitalizations and prior year DCG to predict membership in the top 5% and 10% cost groups [30]. The area under the curve (AUC) used to compare classifiers.
rdf:type