Abstract

Background

A previously developed quantitative structure–activity relationship (QSAR) model has been externally validated as a good predictor of chemical asthma hazard (sensitivity: 79–86%, specificity: 93–99%).

Aims

To develop and validate a second version of this model.

Methods

Learning dataset asthmagenic chemicals with molecular weight (MW) <1kDa were identified from reports published in the peer-reviewed literature before the end of 2012. Control chemicals for which no reported case(s) of occupational asthma had been identified were selected at random from UK and US occupational exposure limit tables. MW banding was used in an attempt to categorically match the control group for MW distribution of the asthmagens. About 10% of chemicals in each MW category were excluded for use as an external validation set. An independent researcher utilized a logistic regression approach to compare the molecular descriptors present in asthmagens and controls. The resulting equation generated a hazard index (HI), with a value between zero and one, as an estimate of the probability that the chemical had asthmagenic potential. The HI was determined for each compound in the external validation set.

Results

The model development sets comprised 99 chemical asthmagens and 204 controls. The external validation showed that using a cut-point HI of 0.39, 9/10 asthmagenic (sensitivity: 90%) and 23/24 non-asthmagenic (specificity: 96%) compounds were correctly predicted. The new QSAR model showed a better receiver operating characteristic plot than the original.

Conclusions

QSAR refinement by iteration has resulted in an improved model for the prediction of chemical asthma hazard.

Introduction

Although new causes of occupational asthma (OA) continue to be identified through systematic health surveillance, as well as through patients presenting symptomatically, the challenge remains to predict asthmagens and therefore protect more workers and prevent ill-health as early as possible. New causes of OA are frequently low-molecular-weight (LMW) chemicals [1] but traditional toxicological methods have failed to lead to the development of a protocol fit for the purpose of screening LMW industrial chemicals for respiratory sensitization hazard [2,3]. Computer-based structure–activity relationship (SAR) models offer an efficient and cost-effective means of assessing whether a LMW chemical has the potential to cause asthma in humans [4–7].

For a chemical asthmagen to initiate the cascade of molecular events leading to the pathophysiological changes that manifest as asthma, the chemical is thought to react with an amino acid side chain of a native protein molecule present in the respiratory tract [8,9]. Chemists are able to describe the reaction between proteins and a putative asthmagen (haptenization), hypothesizing that a key determinant is the electrophilic index of a compound [10]. They are, however, less able to make predictions about immunogenicity. For example, if the chemical is presented on the surface of the protein, it might be a component of the immunogenic epitope. On the other hand, if the chemical asthmagen is shielded within the folds of the protein, an immunogenic epitope might only be developed if the chemical has two reactive groups that cause cross-linking and changes to the tertiary protein structure [11].

These aspects may be better addressed by a statistical or quantitative SAR (QSAR), which makes no a priori assumptions about the mechanism of chemical reactivity and hence biological activity. An existing QSAR model for predicting chemical asthma hazard had been developed by us [6], comparing molecular descriptors of 78 chemicals reported to cause OA in humans prior to 1995 with those of 301 chemicals to which humans have had inhalational exposure in the workplace but for which OA had not been reported. External validations of this model demonstrated its ability to differentiate chemicals with and without asthma hazard with sensitivity 79–86% and specificity 93–99% [6,12]. Clinicians have already used this model, which is freely available on the internet [13], to provide confirmatory evidence that they have correctly attributed a case of OA to a previously unrecognized chemical respiratory sensitizer [14–16]. It also offers a means of helping clinicians to pick out the most likely candidate novel asthmagen(s) from a list of chemicals, to which a worker with possible OA has been exposed, in order to guide the choice of further investigations such as specific bronchial challenge testing. However, an iteration of this model, using updated learning datasets (defined as the sets of control and asthmagenic compounds selected for model development; the model ‘learns’ how to distinguish between compounds with and without asthmagenic potential by comparison of the chemical substructural fragments present in the control and asthmagen learning dataset compounds), which include more recently published novel chemical asthmagens, might be expected to lead to improved predictive performance. Furthermore, advances in computing hardware, software and development tools allow a faster test and validation cycle of the underlying algorithms.

This study aimed to develop and externally validate a second version of a quantitative structure–activity relationship (QSAR) model for predicting the asthma hazard of LMW organic compounds.

Methods

The asthmagenic chemicals selected for the ‘active’ group of the QSAR learning dataset were taken from reports (single cases, case series or epidemiologic studies) identified from a literature search as described by Jarvis et al. [6]. Papers written in English and published before the end of December 2012 were included.

Criteria for inclusion in the asthmagen or ‘active’ group were:

  1. The case(s) of OA had been attributed to an organic compound with a high degree of certainty by the reporting physician.

  2. The mechanism of asthma was thought to be sensitization and not irritation.

  3. The causative chemical had a molecular weight (MW) <1000Da.

  4. The LMW organic compound could be identified unambiguously from the name or CAS Registry Number (CAS RN) given in the case report using one of the databases mentioned below.

The MW of each compound included in the active group was determined using either the Merck Index [17] or one of several chemical databases on the internet [18–21]. Chemicals within the learning dataset asthmagens were grouped according to MW into three categories: 0–149, 150–299 and 300Da or greater. These categories were selected because they divided the learning asthmagen set into three groups of approximately equal number whilst potentially maximizing the number of controls that could be selected for the upper MW category.
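The banding rule can be illustrated with a minimal Python sketch. The function name `mw_band` and the band labels are introduced here for illustration and are not taken from the study software. The text does not state how non-integer MW values falling between two bands were handled, so rounding to the nearest whole dalton before banding is an assumption.

```python
def mw_band(mw: float) -> str:
    """Assign a molecular weight (in Da) to one of the three study bands."""
    if not 0 < mw < 1000:
        raise ValueError("the study is restricted to compounds with MW < 1000 Da")
    # Assumption: band on the MW rounded to a whole dalton.
    mw_int = round(mw)
    if mw_int <= 149:
        return "0-149"
    if mw_int <= 299:
        return "150-299"
    return "300+"


if __name__ == "__main__":
    for mw in (89, 149, 150, 422):
        print(mw, mw_band(mw))
```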

The method used for identifying control compounds was similar to that used by Jarvis et al. in the development of the first asthma QSAR prediction model [6]. This utilized Workplace Exposure Limit (WEL) tables in order to identify chemicals that are used widely in industry but have never been reported to cause asthma. However, one important difference in the control selection for this second version was that an attempt was made to match the MW distribution by category (as above) with the asthmagens.

A large pool of control compounds was therefore created by selecting all those listed in either the UK Health & Safety Executive (HSE) WEL tables [22] or the equivalent tables of the American Conference of Governmental Industrial Hygienists (ACGIH) [23] that met the following criteria:

  1. It had been assigned a long-term exposure limit (8h time-weighted average).

  2. It was a defined LMW organic compound (criteria same as for asthmagens) and was listed with a CAS RN.

  3. It had not featured in either the list of asthmagenic compounds identified in this study or in the list of asthmagenic compounds used in the first model developed by Jarvis et al. [6].

  4. It had not been assigned the risk label R42 ‘may cause sensitization by inhalation’ or the equivalent labels ‘SEN’ or ‘RSEN’ used by ACGIH.

As many controls as were available from these tables were randomly selected within each of the three MW categories, up to a maximum number that was three times the number of asthmagens in each MW band.
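The capped random selection described above could be sketched as follows. This is an illustration only: the function `sample_controls`, its arguments and the fixed seed are hypothetical, and the random procedure actually used in the study is not described in the text.

```python
import random


def sample_controls(pool_by_band, n_asthmagens_by_band, ratio=3, seed=0):
    """Randomly pick controls per MW band, capped at ratio x asthmagens."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    selected = {}
    for band, pool in pool_by_band.items():
        cap = ratio * n_asthmagens_by_band.get(band, 0)
        k = min(cap, len(pool))  # take every control if the pool is too small
        selected[band] = rng.sample(list(pool), k)
    return selected


if __name__ == "__main__":
    # Hypothetical pool sizes; integers stand in for compound identifiers.
    pools = {"0-149": list(range(200)), "300+": list(range(50))}
    counts = {"0-149": 30, "300+": 44}
    picked = sample_controls(pools, counts)
    print({band: len(items) for band, items in picked.items()})
```

When a band's pool is smaller than the cap, as the text reports for the upper MW band, the sketch simply returns every available control in that band.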

Ten per cent of chemicals within each MW band of both the asthmagen and control sets were randomly selected for exclusion from the final learning datasets for subsequent external validation of the resulting model. Following removal of 10% of compounds from each of the sets at random, the final asthmagen and control sets were merged into a single list of compounds that was sent to an independent researcher such that he was able to identify the chemicals’ structures initially blind to whether or not a given chemical was an asthmagen.
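A minimal sketch of a per-band hold-out of this kind is given below. The function `split_holdout` and its parameters are hypothetical. The rounding rule is not stated in the text; rounding down to a whole compound is an assumption, although it is consistent with the asthmagen counts in Table 1 (30, 35 and 44 compounds yielding 3, 3 and 4 validation compounds).

```python
import random


def split_holdout(compounds_by_band, one_in=10, seed=0):
    """Set aside roughly one compound in `one_in` from each MW band."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    learning, validation = [], []
    for band, compounds in compounds_by_band.items():
        shuffled = list(compounds)
        rng.shuffle(shuffled)
        n_out = len(shuffled) // one_in  # assumption: round down
        validation.extend(shuffled[:n_out])
        learning.extend(shuffled[n_out:])
    return learning, validation


if __name__ == "__main__":
    # Placeholder identifiers sized like the asthmagen bands before the split.
    bands = {
        "0-149": [f"low-{i}" for i in range(30)],
        "150-299": [f"mid-{i}" for i in range(35)],
        "300+": [f"high-{i}" for i in range(44)],
    }
    learning, validation = split_holdout(bands)
    print(len(learning), len(validation))
```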

The independent researcher who was the model developer (J.J.) scrutinized the chemical identities and each learning set compound name was checked for correct structure. Duplicates (identical structures using different names) were removed. Any named compound for which an identifiable structure could not be verified was flagged for removal.

The original program source codes were committed to a secure academic code repository then reviewed and refactored to ensure consistent use of methodology across the software suite. The code was tested and updated to handle newer software used to create the input MDL Molfiles, an alphanumeric textual representation of a chemical structure. Code was then revalidated to check that the library of chemical substructure fragments was producing results that matched manual assessment.

The full learning set compounds were then characterized. For each compound, the frequency of occurrence of a fragment library—a collection of chemical substructures whose association with the hazard is being investigated—was computed. The fragments were tabulated against each learning set compound along with whether the chemical was from the control or asthmagen list within the learning dataset.

A logistic regression methodology (using IBM SPSS Statistics Version 20.0.0 on an Intel Apple Mac) was used to model asthmagen activity against fragment occurrence frequencies. Fragment occurrence frequency variables which were deemed significant from the model creation were entered into the predictive model.
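For orientation, the standard form of a logistic regression on fragment counts can be written as below. The symbols are introduced here for illustration and do not appear in the original text: $x_i$ is the occurrence frequency of fragment $i$ in a compound, $\beta_i$ is its fitted coefficient, $\beta_0$ is the constant and $p$ is the modelled probability that the compound belongs to the asthmagen group.

```latex
\[
\ln\frac{p}{1-p} \;=\; \beta_0 + \sum_{i=1}^{k} \beta_i x_i
\qquad\Longleftrightarrow\qquad
p \;=\; \frac{1}{1 + \exp\!\left[-\left(\beta_0 + \sum_{i=1}^{k} \beta_i x_i\right)\right]}
\]
```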

The model developed by the independent researcher (J.J.) was sent to the researchers assigned to the validation (M.J.S., S.J.S.) as a precompiled Microsoft Windows executable file. The two validators set up the program independently on individual computers. The 10% of chemicals that had been removed from the original control and asthmagen sets for the purpose of external validation were merged into a single list by the first validator (M.J.S.) so that the second validator (S.J.S.) was blind to whether each chemical was a control or asthmagen. A web-based chemical database was used to provide chemical structures in molfile format that could be entered into the model [18]. Where it was not possible to obtain a molfile directly from this database for a given compound, the first validator (M.J.S.) drew the structure using Chemdraw software, which allowed conversion to a molfile format that was accepted by the QSAR program. For each compound in the external validation set, both validators recorded the hazard index (HI), a value between zero and one generated from the logistic regression equation embedded within the computer program. The greater the predicted likelihood of a given test compound being asthmagenic, the greater the HI value generated from this mathematical equation linked to its molecular descriptors.

An indication of the model’s global predictive performance across the range of hazard indices and the optimal cut-point HI value for a ‘general diagnostic’ or discriminatory test was obtained from a receiver operating characteristic (ROC) plot [24].
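One common way of reading an optimal cut-point from ROC coordinates is to maximize Youden's J (sensitivity + specificity − 1). The text does not state which criterion was applied, so the choice of Youden's J, the use of midpoints between consecutive observed scores as candidate cut-points, the function name `youden_cut_point` and the toy scores below are all assumptions made for illustration.

```python
def youden_cut_point(active_scores, control_scores):
    """Return (cut_point, sensitivity, specificity) maximizing Youden's J."""
    values = sorted(set(active_scores) | set(control_scores))
    # Candidate cut-points: midpoints between consecutive distinct scores.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not candidates:
        raise ValueError("at least two distinct scores are required")
    best = None
    for cut in candidates:
        sens = sum(s >= cut for s in active_scores) / len(active_scores)
        spec = sum(s < cut for s in control_scores) / len(control_scores)
        j = sens + spec - 1
        if best is None or j > best[0]:  # ties keep the lowest cut-point
            best = (j, cut, sens, spec)
    return best[1], best[2], best[3]


if __name__ == "__main__":
    # Toy scores, not study data.
    actives = [0.9, 0.8, 0.6, 0.5]
    controls = [0.1, 0.2, 0.3, 0.7]
    print(youden_cut_point(actives, controls))
```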

Ethical approval was not required for this study as the only data used were existing human and toxicological data that were already in the public domain.

Results

After removal of 10% of compounds in each MW category of the control and asthmagen groups for subsequent use as the validation set, the merged list of learning dataset asthmagens and controls that was sent to the model developer (J.J.) contained 319 compounds. On closer scrutiny of the structures of these 319 compounds, the model developer, who was working independently, had to exclude 16 compounds all of which were controls. Four of these were excluded because they also appeared in the asthmagen set, two were metallo-complex compounds which were difficult structures to represent covalently and one (decaborane) did not contain any carbon atoms. The remaining chemicals which had to be excluded were duplicates within the control learning dataset, most of which had been included twice because of differing nomenclatures used by the ACGIH and by the HSE in EH40.

A total of 303 compounds with their fragment occurrence frequencies and their activity (0 = control or 1 = asthmagen) were used to generate the statistical predictive model. This set of 303 compounds comprised 204 controls and 99 sensitizers. Their distribution into each of the three MW categories as well as that of the external validation set is shown in Table 1.

Table 1.

Numbers of chemicals in learning and validation datasets split into three MW bands

| MW band (Da) | Number of asthmagens in learning dataset | Number of controls in learning dataset | Number of asthmagens removed for external validation | Number of controls removed for external validation |
|---|---|---|---|---|
| 0–149 | 27 | 76 | 3 | 9 |
| 150–299 | 32 | 84 | 3 | 9 |
| 300+ | 40 | 44 | 4 | 6 |
| Total | 99 | 204 | 10 | 24 |

Backward Stepwise (Likelihood Ratio) and Backward Stepwise (Conditional) logistic regression models produced identical beta coefficients for variables. Table 2 shows SPSS output for the Backward Stepwise (Conditional) model that was used to determine the predictive model coefficients.

Table 2.

SPSS output for Backward Stepwise (Conditional) model that was used to determine the predictive model coefficients

| Fragment (description) | B | SE | Wald | df | Significance | Exp(B) |
|---|---|---|---|---|---|---|
| Ar_O (oxygen attached to aromatic ring) | 0.64 | 0.25 | 6.35 | 1 | 0.01 | 1.89 |
| P | −1.63 | 1.10 | 2.20 | 1 | 0.14 | 0.20 |
| S | −0.71 | 0.38 | 3.60 | 1 | 0.06 | 0.49 |
| Ar_S (sulphur attached to aromatic ring) | 2.31 | 0.88 | 6.98 | 1 | 0.01 | 10.1 |
| Br | −19.32 | 7319.34 | 0.00 | 1 | 1.00 | 0.00 |
| Amine | 1.51 | 0.33 | 20.89 | 1 | 0.00 | 4.54 |
| Ar_amine (amine attached to aromatic ring) | −1.39 | 0.60 | 5.42 | 1 | 0.02 | 2.49 |
| Carboxyl | 1.11 | 0.44 | 6.50 | 1 | 0.01 | 3.04 |
| Aldehyde | −22.53 | 40192.98 | 0.00 | 1 | 1.00 | 0.00 |
| Ketone | 23.75 | 40192.98 | 0.00 | 1 | 1.00 | 2.06 × 10^10 |
| X_dbl_X (any two atoms joined by a double bond) | 0.50 | 0.13 | 14.54 | 1 | 0.00 | 1.65 |
| C_dbl_N (any nitrogen double bond to a carbon) | 1.06 | 0.28 | 13.98 | 1 | 0.00 | 2.89 |
| Acrylate | 21.19 | 10140.52 | 0.00 | 1 | 1.00 | 1.60 × 10^9 |
| Anhydride | 22.94 | 16355.04 | 0.00 | 1 | 1.00 | 9.15 × 10^9 |
| EtOHAmine (ethanolamine) | 1.78 | 0.72 | 6.11 | 1 | 0.01 | 5.93 |
| Constant | −3.05 | 0.36 | 70.09 | 1 | 0.00 | 0.05 |
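To show how coefficients of this kind translate into a hazard index, the following Python sketch applies the inverse-logit transform to the Table 2 constant and B values. It assumes that the HI is simply the logistic probability of the linear predictor and that each input is a fragment occurrence count; the function `hazard_index` is hypothetical and is not the study's program. The fragment counts in the usage example are our own guesses at how the fragment library would score these simple structures, not output from the study software.

```python
import math

# B coefficients copied from Table 2 (Backward Stepwise, Conditional).
COEFFICIENTS = {
    "Ar_O": 0.64, "P": -1.63, "S": -0.71, "Ar_S": 2.31, "Br": -19.32,
    "Amine": 1.51, "Ar_amine": -1.39, "Carboxyl": 1.11, "Aldehyde": -22.53,
    "Ketone": 23.75, "X_dbl_X": 0.50, "C_dbl_N": 1.06, "Acrylate": 21.19,
    "Anhydride": 22.94, "EtOHAmine": 1.78,
}
CONSTANT = -3.05


def hazard_index(fragment_counts):
    """Inverse-logit of the Table 2 linear predictor for given fragment counts."""
    unknown = set(fragment_counts) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"fragments not in Table 2: {sorted(unknown)}")
    z = CONSTANT + sum(COEFFICIENTS[f] * n for f, n in fragment_counts.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Assumed fragment counts (illustrative guesses, see lead-in).
    examples = {
        "nonane": {},
        "ethyl acetate": {"X_dbl_X": 1},
        "diisopropylamine": {"Amine": 1},
        "dimethylethanolamine": {"Amine": 1, "EtOHAmine": 1},
    }
    for name, counts in examples.items():
        print(name, round(hazard_index(counts), 2))
```

Under these assumed counts the sketch gives 0.05, 0.07, 0.18 and 0.56, which agree with the HI values listed for the same four compounds in Tables 3 and 4.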

The external validation set comprised 10 asthmagens and 24 controls. Both validators were independently able to download molfiles for 32 of the 34 validation compounds. For the two compounds for which this was not possible, one of the validators (M.J.S.) identified the structure from another source [17,19–21] and created molfiles for them using Chemdraw following which these two molfiles were made available to the other validator (S.J.S.). The same HI results were obtained by both validators working independently for 33 of the 34 compounds. For one compound, a HI inconsistency was identified on reconciliation between the two validators and then corrected.

After this reconciliation, both validators were in complete agreement about the HI results which are listed in Table 3 (asthmagens) and Table 4 (controls). Figure 1 shows the distribution of HI results for the external validation sets of controls and asthmagens. As previously described, the closer the HI value is to one the greater the probability that its molecular features render it asthmagenic as determined by the logistic regression analysis comparing chemical structures of asthmagens and controls.

Table 3.

External validation asthmagens

| Chemical name | CAS RN | MW | HI 2014 model |
|---|---|---|---|
| Plicatic acid | 16462-65-0 | 422 | 0.85 |
| Cefadroxil | 66592-87-8 | 381 | 1 |
| Ampicillin | 69-52-3 | 349 | 0.99 |
| Captafol | 2939-80-2 | 349 | 0.1 |
| Trimethylolpropane triacrylate | 15625-89-5 | 296 | 0.49 |
| Tetrachlorophthalic anhydride | 117-08-8 | 286 | 1 |
| Trimellitic anhydride | 552-30-7 | 192 | 1 |
| Penicillamine | 52-67-5 | 149 | 0.76 |
| Methyl-2-cyanoacrylate | 137-05-3 | 111 | 1 |
| Dimethylethanolamine | 108-01-0 | 89 | 0.56 |

Table 4.

External validation controls

| Chemical name | CAS RN | MW | HI 2014 model |
|---|---|---|---|
| Zinc distearate | 557-05-1 | 632 | 0.11 |
| Dioxathion | 78-34-2 | 457 | 0.00 |
| Chlorinated camphene | 8001-35-2 | 414 | 0.07 |
| Heptachlor | 76-44-8 | 373 | 0.11 |
| Coumaphos | 56-72-4 | 363 | 0.07 |
| Methoxychlor | 72-43-5 | 346 | 0.15 |
| Benzo[a]pyrene | 50-32-8 | 252 | 0.61 |
| Picric acid | 88-89-1 | 229 | 0.29 |
| Monocrotofos | 6923-22-4 | 223 | 0.16 |
| Diethyl phthalate | 84-66-2 | 222 | 0.11 |
| Tetranitromethane | 509-14-8 | 196 | 0.26 |
| Dimethyl phthalate | 131-11-3 | 194 | 0.11 |
| Methyl ethyl ketone peroxide | 1338-23-4 | 176 | 0.05 |
| Cryofluorane | 76-14-2 | 171 | 0.05 |
| Benzyl acetate | 140-11-4 | 150 | 0.07 |
| Methyl iodide | 74-88-4 | 142 | 0.05 |
| Vinyl cyclohexene dioxide | 106-87-6 | 140 | 0.05 |
| m-Phthalodinitrile | 626-17-5 | 128 | 0.05 |
| Nonane | 111-84-2 | 128 | 0.05 |
| Diisopropylamine | 108-18-9 | 101 | 0.18 |
| Methyl isobutyl ketone | 108-10-1 | 100 | 0.07 |
| 1-Nitropropane | 108-03-2 | 89 | 0.07 |
| Ethyl acetate | 141-78-6 | 88 | 0.07 |
| Tertiary-butyl-methyl-ether | 1634-04-4 | 88 | 0.05 |

Figure 1.

HI distribution for external validation controls and asthmagens.

The ROC plot is shown in Figure 2b, adjacent to the ROC plot for the original QSAR (Figure 2a). The area under the ROC curve for the revised model was 0.95 (95% CI 0.87–1.0). Study of the ROC coordinates revealed that the optimal cut-point HI for use of the revised QSAR model as a general test for discriminating asthmagens from controls is 0.39, for which the sensitivity is 90% and specificity 96%.
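These summary statistics can be checked against the HI values listed in Tables 3 and 4. The Python sketch below is an illustrative re-computation, not the SPSS procedure used in the study: it works from the HI values as printed (rounded to two decimal places), and it estimates the area under the ROC curve as the proportion of asthmagen–control pairs in which the asthmagen has the higher HI, counting ties as one half. The function names are hypothetical.

```python
# HI values as printed in Table 3 (asthmagens) and Table 4 (controls).
ASTHMAGEN_HI = [0.85, 1, 0.99, 0.1, 0.49, 1, 1, 0.76, 1, 0.56]
CONTROL_HI = [
    0.11, 0.00, 0.07, 0.11, 0.07, 0.15, 0.61, 0.29, 0.16, 0.11, 0.26, 0.11,
    0.05, 0.05, 0.07, 0.05, 0.05, 0.05, 0.05, 0.18, 0.07, 0.07, 0.07, 0.05,
]
CUT_POINT = 0.39


def sensitivity(actives, cut):
    """Fraction of asthmagens with HI at or above the cut-point."""
    return sum(h >= cut for h in actives) / len(actives)


def specificity(controls, cut):
    """Fraction of controls with HI below the cut-point."""
    return sum(h < cut for h in controls) / len(controls)


def auc(actives, controls):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    wins = sum(
        1.0 if a > c else 0.5 if a == c else 0.0
        for a in actives
        for c in controls
    )
    return wins / (len(actives) * len(controls))


if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(ASTHMAGEN_HI, CUT_POINT):.2f}")  # 0.90
    print(f"specificity = {specificity(CONTROL_HI, CUT_POINT):.2f}")    # 0.96
    print(f"AUC         = {auc(ASTHMAGEN_HI, CONTROL_HI):.2f}")         # 0.95
```

With the tabulated values, captafol (HI 0.1) is the single asthmagen below the cut-point and benzo[a]pyrene (HI 0.61) is the single control above it, giving 9/10 and 23/24 respectively.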

Figure 2.

ROC plots for asthma hazard prediction models developed by this group: (a) original 2005 model (b) 2014 revised model.

Discussion

The first iteration of an existing QSAR model developed for predicting chemical asthma hazard using updated learning datasets and computational techniques had a better predictive performance than the original model. The area under the ROC curve for this second version was found to be 0.95 which compares favourably with the corresponding value for the first model [12] which was 0.86 (Figure 2). The sensitivity 90% and specificity 96% using a cut-point HI of 0.39 suggest the new model can achieve comparable, or better, predictive performance than the original model (sensitivity: 79–86%, specificity: 93–99%, using cut-point HI 0.5) [6,12]. This improvement by iteration potentially offers a better tool for assisting clinicians in identifying novel chemical causes of OA and a QSAR with more reliable predictive performance that could be utilized in regulatory screening of chemicals for respiratory sensitization hazard.

Computer-based techniques for hazard prediction may be advantageous over other methods being researched for chemical respiratory sensitizer prediction [25] because they could offer a quick, cheap and efficient means of screening large numbers of industrial chemicals. The QSAR approach permits the maximum exploitation of available human data about OA to derive a predictive model and the overall strategy for development of this revised model was as described for the original version [6]. It was possible to utilize a larger learning dataset of asthmagenic chemicals for the revised model by also including newly reported asthmagenic compounds identified in the literature between January 1995 and December 2012. The development of the second model also differed from the first in that controls were selected to match, as far as possible from the exposure limit tables chosen as their source, the MW distribution of the asthmagenic set. It was not possible to do so perfectly because there was a paucity of control chemicals with MW 300Da or greater, even when lists of controls identified from the WEL tables published by both ACGIH and HSE were combined. Nevertheless in the revised model, MW was not found to be a significant determinant of whether or not a compound was an asthmagen.

Chlorine and fluorine atomic groups were removed in a relatively early stage of the backward stepwise method during development of the revised model, whereas they had featured in the final logistic regression equation of the original model in which they exerted a significant negative influence on a compound’s HI value. An example that illustrates the superiority of the second version in this regard is the recently published case report of OA caused by exposure to tafenoquine in a pharmaceutical worker [26]. As was discussed in that case report, the chemical structure of tafenoquine contains three fluorine atoms which potentially explain why it is an example of a ‘false negative’ using the first Jarvis et al. QSAR. When tafenoquine was entered into the computer program for the revised version of this QSAR described here, it correctly predicted tafenoquine to be an asthmagen with HI 0.83.

The new software and its development can be seen on request (it is stored in a research data archive at The University of Edinburgh). The model has been developed for LMW (<1000Da) organic chemicals, defined as containing carbon, and any of the following hetero atoms only: hydrogen, oxygen, nitrogen, bromine, chlorine, fluorine, iodine, sulphur, phosphorus and silicon. If the chemical is an organic salt, then a molfile of the organic ion should be entered as a separate entity. The exclusion of metallo-complex compounds will need to be addressed in the next iteration of the model. An anomaly that will also need to be addressed in future work is that the HI obtained for benzo[a]pyrene (an external validation control) was dependent on how the molfile table represented the same compound topologically, in terms of aromatic double bond positions. An alternative representation of the double bonds in benzo[a]pyrene resulted in a much lower HI which, if that result had been included in the validation, would have increased the specificity of the model even further. Sources of error in interpreting the literature data may occur at several layers: shortcomings in the MDL Molfile format [27] for representing structure; algorithmic shortcomings in the software handling the structures; and the statistical process. This emphasizes the importance of a validation step.

Comparison of the predictive performance of the second version of this QSAR with that of the available SAR models for respiratory sensitization is not straightforward because there is variation in the methods used in the published validation statistics for these models [28]. The OECD has produced guidelines for the validation of QSARs in regulatory toxicology [29] which include the statement that external validation is preferable. Both versions of this QSAR have been externally validated, the first version by waiting several years for the emergence of novel chemical asthmagens not used in the learning dataset and for the second version by prospectively excluding 10% of learning dataset compounds selected at random. No other published QSAR appears to have been externally validated [4,5]. Dik et al. [28] recently attempted to compare the predictive performances of five available SAR models by creating large validation sets comprising the learning dataset chemicals for each model, supplementing the ‘respiratory non-sensitizers’ with 168 chemicals that had a negative result using the mouse local lymph node assay (LLNA). Thus, the pooled ‘respiratory non-sensitizer’ validation set contained a heterogeneous group of chemicals comprising human respiratory non-sensitizers, human skin non-sensitizers and LLNA negative compounds. Control chemicals selected from human exposure data in the form of WEL tables are more likely to be representative of human respiratory non-sensitizers than selective human or mouse evidence of skin non-sensitizing potential. It is also relevant that not one of the 301 control compounds that were selected from the HSE 1994 EH40 tables for use in the learning dataset of the first Jarvis et al. QSAR model [6] has been reported in a peer-reviewed case report to have caused a case of OA in the subsequent 20 years.

A possible criticism of the QSAR technique is that, although it can lead to the generation of mechanistic hypotheses, it does not necessarily provide a mechanistic explanation for how a chemical which is predicted to be a respiratory sensitizer causes asthma. Qualitative input from experts in mechanistic chemistry can help to describe how a predicted sensitizer could potentially react with a human protein molecule resulting in subsequent immunopathological processes. We have previously suggested how this QSAR approach could be used in conjunction with such mechanistic analysis in predictive toxicology [10]. As we have only been able to validate this revised version with a small external validation set of compounds, further evaluations are required to confirm its improved predictive performance over the original QSAR model for asthma hazard prediction. A network of clinicians who are expert in the field of OA and in identifying new causal agents have already utilized the original QSAR model for this purpose [14–16]. Once this revised model has been made available on the internet for public use, a more formal evaluation for its use in a clinical context can be designed.

Key points
  • There is currently no widely accepted method for predicting the asthma hazard of workplace chemicals.

  • Computer-based quantitative structure–activity relationship models offer a quick, inexpensive, valid and efficient method for screening chemicals for respiratory sensitization potential.

  • This second version (i.e. first iteration) of an existing quantitative structure–activity relationship model developed by the authors has a better predictive performance than the original model.

Funding

The Colt Foundation (grant number CF/04/10 to M.J.S. for his work in developing the learning dataset compounds used to generate this revised QSAR model).

Conflicts of interest

None declared.

Acknowledgement

We thank Mr Matthew Gittins of the University of Manchester who helped with the production of the ROC curves illustrated in Figure 2.

References

1. Quirce S, Bernstein JA. Old and new causes of occupational asthma. Immunol Allergy Clin North Am 2011;31:677–698, v.

2. Vandebriel R, Callant Cransveld C, Crommelin D et al. Respiratory sensitization: advances in assessing the risk of respiratory inflammation and irritation. Toxicol In Vitro 2011;25:1251–1258.

3. Lalko JF, Kimber I, Dearman RJ, Gerberick GF, Sarlo K, Api AM. Chemical reactivity measurements: potential for characterization of respiratory chemical allergens. Toxicol In Vitro 2011;25:433–445.

4. Cunningham AR, Cunningham SL, Consoer DM, Moss ST, Karol MH. Development of an information-intensive structure-activity relationship model and its application to human respiratory chemical sensitizers. SAR QSAR Environ Res 2005;16:273–285.

5. Graham C, Rosenkranz HS, Karol MH. Structure-activity model of chemicals that cause human respiratory sensitization. Regul Toxicol Pharmacol 1997;26:296–306.

6. Jarvis J, Seed MJ, Elton R, Sawyer L, Agius R. Relationship between chemical structure and the occupational asthma hazard of low molecular weight organic compounds. Occup Environ Med 2005;62:243–250.

7. Seed MJ, Cullinan P, Agius RM. Methods for the prediction of low-molecular-weight occupational respiratory sensitizers. Curr Opin Allergy Clin Immunol 2008;8:103–109.

8. Enoch SJ, Roberts DW, Cronin MT. Mechanistic category formation for the prediction of respiratory sensitization. Chem Res Toxicol 2010;23:1547–1555.

9. Enoch SJ, Roberts DW, Cronin MT. Electrophilic reaction chemistry of low molecular weight respiratory sensitizers. Chem Res Toxicol 2009;22:1447–1453.

10. Enoch SJ, Seed MJ, Roberts DW, Cronin MT, Stocks SJ, Agius RM. Development of mechanism-based structural alerts for respiratory sensitization hazard identification. Chem Res Toxicol 2012;25:2490–2498.

11. Agius RM. Why are some low-molecular-weight agents asthmagenic? Occup Med State Art Rev 2000;15:369–384.

12. Seed M, Agius R. Further validation of computer-based prediction of chemical asthma hazard. Occup Med (Lond) 2010;60:115–120.

13. Centre for Occupational and Environmental Health, University of Manchester [Internet]. Occupational Asthma Hazard Resource. http://www.coeh.man.ac.uk/asthma/login.php (2 July 2015, date last accessed).

14. Pralong JA, Seed MJ, Cartier A, Agius RM, Labrecque M. Is there a place for a computer based asthma hazard prediction model in clinical practice? Occup Environ Med 2012;69:771–772.

15. Moore VC, Manney S, Vellore AD, Burge PS. Occupational asthma to gel flux containing dodecanedioic acid. Allergy 2009;64:1099–1100.

16. Anees W, Moore VC, Croft JS, Robertson AS, Burge PS. Occupational asthma caused by heated triglycidyl isocyanurate. Occup Med (Lond) 2011;61:65–67.

17. Royal Society of Chemistry [Internet]. The Merck Index Online 2013. http://www.rsc.org/Merck-Index/ (2 July 2015, date last accessed).

18. Chemical Book [Internet]. http://www.chemicalbook.com (14 September 2013, date last accessed).

19. NCBI [Internet]. PubChem. https://pubchem.ncbi.nlm.nih.gov/ (2 July 2015, date last accessed).

20. The ChemExper Chemical Directory [Internet]. 2013. http://www.Chemexper.com (2 July 2015, date last accessed).

21. Royal Society of Chemistry [Internet]. ChemSpider 2013. www.chemspider.com/ (2 July 2015, date last accessed).

22. HSE. Health & Safety Executive (HSE) EH40/2005 Workplace Exposure Limits. 2nd edn. Sudbury, UK: HSE Books, 2011.

23. ACGIH (American Conference of Governmental Industrial Hygienists). 2012. TLVs and BEIs. Publication #0112. Cincinnati, OH: ACGIH, 2012.

24. Altman DG, Bland JM. Diagnostic tests 3: receiver operating characteristic plots. BMJ 1994;309:188.

25. Roggen EL, Blaauboer BJ. Sens-it-iv: a European Union project to develop novel tools for the identification of skin and respiratory sensitizers. Toxicol In Vitro 2013;27:1121.

26. Cannon J, Fitzgerald B, Seed M, Agius R, Jiwany A, Cullinan P. Occupational asthma from tafenoquine in the pharmaceutical industry: implications for QSAR. Occup Med (Lond) 2015;65:256–258.

27. Clark AM. Accurate specification of molecular structures: the case for zero-order bonds and explicit hydrogen counting. J Chem Inf Model 2011;51:3149–3157.

28. Dik S, Ezendam J, Cunningham AR, Carrasquer CA, van Loveren H, Rorije E. Evaluation of in silico models for the identification of respiratory sensitizers. Toxicol Sci 2014;142:385–394.

29. OECD. The Report From the Expert Group on (Quantitative) Structure-Activity Relationship ([Q]SARs) on the Principles for the Validation of (Q)SARs. OECD Series on Testing and Assessment No. 49. ENV/JM/MONO(2004)/24. Paris, France: OECD, 2004.