Advanced Molecular Modeling to Examine QSAR and QSPR Relationships
Molecular Modeling Pro Plus (MMP+) includes all of the features of Molecular Modeling Pro and adds the ability to study QSAR and QSPR relationships.
QSAR (quantitative structure-activity relationships) and QSPR (quantitative structure-property relationships) study the relationships between useful molecular properties (such as the ability to control a human disease or to lubricate a piston) and the underlying chemical and physical properties that may enhance or limit the desired property. MMP+ has three main features that aid in the discovery of these relationships:
- calculation of molecular properties from structure
- statistical and graphing tools for analyzing data
- data base storage, retrieval and manipulation
Unlike general statistical packages, MMP+ is specialized to deal with QSAR/QSPR problems, and accomplishes these tasks easily and efficiently, with tools not found elsewhere.
Examples of Program output:
(an example of a special use of Molecular Modeling Pro Plus)
In this example, the user of the program wants to replace chloroform in a formulation with safer solvents. After the user chooses the "replace component in a mixture" menu item, the program asks:
- Which compound in the data base should be replaced
- What properties should be used as the basis for selecting molecules with similar properties (in this example, the three Hansen solubility parameters for dispersion, polarity and hydrogen bonding were selected).
- Replace with 1, 2 or 3 compounds?
Then the program does the calculations and prints out:
Best fits for Hansen dispersion = 17.8 and Hansen polarity = 3.1
and Hansen hydrogen bonding = 5.7 imitating chloroform
isoamyl acetate 90%; trifluoromethylbenzene 10%; difference = 6.99038505554185E-02
methyl hexanoate 60%; piperidine 40%; difference = 7.98199653625493E-02
n-butylamine 60%; 1-methylnaphthalene 40%; difference = 8.33061218261727E-02
n-nonane 40%; 1,1,2,2-tetrachloroethane 60%; difference = .108578395843507
methyl oleate 20%; phenethyl acetate 80%; difference = .111801671981811
formic acid 10%; Phenyl(CH2)3OMe 90%; difference = .112714648246764
ethyl heptanoate 30%; phenethyl acetate 70%; difference = .117056584358216
methyl octanoate 30%; phenethyl acetate 70%; difference = .135206198692322
n-octane 40%; 1,1,2,2-tetrachloroethane 60%; difference = .144738292694091
chlorodifluoromethane 10%; Phenyl(CH2)3OC(=O)Me 90%; difference = .1449111700058
diethyl carbonate 40%; methyl phenylacetate 60%; difference = .145699405670168
ethyl hydrocinnamate 80%; methylal 20%; difference = .149800872802734
Phenyl(CH2)3OMe 20%; phenylpropyl ether 80%; difference = .150111436843873
ethylphenyl ether 70%; Phenyl(CH2)3OMe 30%; difference = .157015228271484
ethyl hydrocinnamate 80%; octanoic acid 20%; difference = .157251262664794
n-decane 40%; 1,1,2,2-tetrachloroethane 60%; difference = .166656398773194
benzyl ether 60%; diethyl ether 40%; difference = .167092275619508
diethylene glycol butyl ether 20%; Phenyl(CH2)3OMe 80%; difference = .170285606384277
bromoform 30%; heptyl acetate 70%; difference = .172407412528992
ethyl propionate 20%; Phenyl(CH2)3OC(=O)Me 80%; difference = .173979091644289
hexanal 20%; Phenyl(CH2)3OC(=O)Me 80%; difference = .176705598831178
ethyl hydrocinnamate 80%; isononanoic acid 20%; difference = .180992984771727
Phenyl(CH2)3OC(=O)Me 80%; tripropylamine 20%; difference = .185638332366944
1,2-dichloroethane 40%; 1,2-dichloroethene (trans) 60%; difference = .18863229751587
PhenylC(COOEt)2H 80%; trichlorofluoromethane 20%; difference = .200468732576242
phenylpropyl ether 90%; toluene 10%; difference = .201505118608473
4-acetylbutyric acid 20%; Phenyl(CH2)3OMe 80%; difference = .203740787506104
methyl hexanoate 30%; Phenyl(CH2)3OC(=O)Me 70%; difference = .204795336723326
isopropyl palmitate 20%; phenethyl acetate 80%; difference = .207344675064087
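The program's internal method is not described on this page. As a rough illustration of the idea, the sketch below assumes that the Hansen parameters of a two-solvent blend are the fraction-weighted averages of its components (treating the percentages as volume fractions) and ranks every pair, in 10% steps, by plain Euclidean distance to the target. Only the chloroform target values come from the output above; the candidate names and parameter values are hypothetical placeholders, and the Euclidean distance is an assumption that will not reproduce the "difference" values printed above. It also covers only the two-compound case of the "1, 2 or 3 compounds" option.

```python
from itertools import combinations

# Target Hansen parameters (dispersion, polarity, hydrogen bonding) for chloroform,
# taken from the output above. The candidates are HYPOTHETICAL placeholder solvents.
TARGET = (17.8, 3.1, 5.7)
CANDIDATES = {
    "solvent A": (17.0, 2.0, 5.0),
    "solvent B": (19.0, 4.75, 6.75),
    "solvent C": (15.5, 8.0, 10.0),
}


def blend(p1, p2, f1):
    """Fraction-weighted average of two parameter sets (assumed linear mixing rule)."""
    return tuple(f1 * a + (1.0 - f1) * b for a, b in zip(p1, p2))


def distance(p, q):
    """Euclidean distance in (dispersion, polarity, H-bonding) space; an assumption, not MMP+'s metric."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def best_pairs(target, candidates, top=5):
    """Score every pair at 10%...90% of the first solvent; return the closest blends first."""
    results = []
    for (n1, p1), (n2, p2) in combinations(candidates.items(), 2):
        for step in range(1, 10):
            f1 = step / 10
            d = distance(blend(p1, p2, f1), target)
            results.append((d, n1, round(100 * f1), n2, round(100 * (1 - f1))))
    results.sort()
    return results[:top]


for d, n1, pct1, n2, pct2 in best_pairs(TARGET, CANDIDATES):
    print(f"{n1} {pct1}%; {n2} {pct2}%; difference = {d:.4g}")
```

The placeholder values were chosen so that a 60/40 blend of solvent A and solvent B matches the target exactly, which makes the ranking easy to check by hand.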
The boiling point example, using a third-degree polynomial model with a cross-product term:
Boiling point = intercept + a*(Log Kow) + b*MR + c*(Log Kow*MR) + d*(Log Kow)^2 + e*MR^2 + f*(Log Kow)^3 + g*MR^3
The program first gives the user all of the statistics of the underlying multiple regression model. Then it prints out the actual 3-D plot.
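Although the model is nonlinear in Log Kow and MR, it is still a multiple linear regression: the nonlinearity lives in the columns of the design matrix, not in the coefficients. A minimal NumPy sketch, assuming hypothetical function names, that builds the eight columns of the model above and solves the least-squares problem; the usage data are synthetic, not taken from the MMP+ data base.

```python
import numpy as np


def design_matrix(log_kow, mr):
    """Columns: 1, LogKow, MR, LogKow*MR, LogKow^2, MR^2, LogKow^3, MR^3."""
    log_kow = np.asarray(log_kow, dtype=float)
    mr = np.asarray(mr, dtype=float)
    return np.column_stack([
        np.ones_like(log_kow),  # intercept
        log_kow, mr,            # linear terms (a, b)
        log_kow * mr,           # cross-product term (c)
        log_kow**2, mr**2,      # square terms (d, e)
        log_kow**3, mr**3,      # cube terms (f, g)
    ])


def fit_cubic_model(log_kow, mr, boiling_point):
    """Least-squares estimates of (intercept, a, b, c, d, e, f, g)."""
    X = design_matrix(log_kow, mr)
    y = np.asarray(boiling_point, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Usage with synthetic (made-up) descriptors and coefficients:
rng = np.random.default_rng(0)
log_kow = rng.uniform(-1.0, 6.0, 60)
mr = rng.uniform(5.0, 150.0, 60)
true_coef = np.array([30.0, -20.0, 4.0, 0.1, 1.0, -0.01, 0.05, 1e-5])
bp = design_matrix(log_kow, mr) @ true_coef
coef = fit_cubic_model(log_kow, mr, bp)
```

Because MR^3 can reach millions while Log Kow is a single-digit number, centring and scaling the descriptors before forming the powers generally improves the numerical conditioning of such a fit.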

This page shows and explains the output from the Molecular Modeling Pro Plus multiple linear regression routine. The output contains several parts:
- correlation matrix, eigenvalues and eigenvectors
- analysis of variance table
- the regression model (y = intercept + a*x(1) + b*x(2) + c*x(3)...)
- table of observed, predicted and residual values for all compounds in the data base
- plot of predicted versus observed values
- plot of predicted versus residual values
- PRESS analysis for all compounds in the data base and cross-validation
- table of compounds sorted by predicted value, highest first
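Most of the quantities in this output come from an ordinary least-squares fit. A minimal NumPy sketch, assuming a numeric predictor matrix X (one column per descriptor, such as Log Kow and MR) and a response vector y; the function name mlr and the returned keys are illustrative, not MMP+ names, and the usage data are synthetic.

```python
import numpy as np


def mlr(X, y):
    """Ordinary least squares with an intercept, plus the usual regression statistics."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])         # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    resid = y - fitted                           # observed minus predicted
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())  # "Total (corrected)" sum of squares
    ss_reg = ss_tot - ss_res
    df_reg, df_res = k, n - k - 1
    ms_res = ss_res / df_res
    cov = ms_res * np.linalg.inv(A.T @ A)        # covariance matrix of the coefficients
    se = np.sqrt(np.diag(cov))
    return {
        "coef": coef, "se": se, "t": coef / se,
        "r2": ss_reg / ss_tot, "s": ms_res ** 0.5,
        "F": (ss_reg / df_reg) / ms_res,
        "fitted": fitted, "resid": resid,
    }


# Usage with synthetic data: y = 2 + 3*x1 - x2 + noise
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
stats = mlr(X, y)
```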
Correlation matrix for regression variables:
Note: Log Kow and MR are highly correlated, which may cause problems with the analysis...
|  | Boiling point (C) | Log Kow | MR |
|---|---|---|---|
| Boiling point (C) | 1 | 0.29434 | 0.61573 |
| Log Kow | 0.29434 | 1 | 0.864 |
| MR | 0.61573 | 0.864 | 1 |
The Eigenvalues found after 1 rotations are:
W 1 = 1.846697 ; W 2 = .1533031 ;
The proportion of variance of each component is:
W 1 = .9233485 ; W 2 = 7.665153E-02 ;
The corresponding eigenvectors are:
W 1 W 2
Log P .7071068 .7071068
MR .7071068 -.7071068
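For two variables the correlation matrix is [[1, r], [r, 1]]. Its eigenvalues are 1 + r and 1 - r, with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2), and each eigenvalue divided by 2 (the number of variables) is its proportion of variance. The printed W 1 = 1.846697 therefore corresponds to r of about 0.8467, slightly different from the 0.864 shown in the correlation matrix above. A small closed-form sketch (the function name is hypothetical):

```python
import math


def eigen_2x2_correlation(r):
    """Closed-form eigen-decomposition of the correlation matrix [[1, r], [r, 1]].

    Returns eigenvalues (largest first), unit eigenvectors, and the proportion of
    total variance carried by each component (eigenvalue / 2).
    """
    c = 1.0 / math.sqrt(2.0)
    if r >= 0:
        values = (1.0 + r, 1.0 - r)
        vectors = ((c, c), (c, -c))
    else:
        values = (1.0 - r, 1.0 + r)
        vectors = ((c, -c), (c, c))
    proportions = tuple(v / 2.0 for v in values)
    return values, vectors, proportions


values, vectors, proportions = eigen_2x2_correlation(0.846697)
print(values)       # approximately (1.846697, 0.153303)
print(proportions)  # approximately (0.9233485, 0.0766515)
```

With positively correlated descriptors, the first component is their sum and the second their difference; a small second eigenvalue like this one signals strong collinearity.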
Analysis of variance
| Variation source | df | Sum of squares | Mean square | Statistics |
|---|---|---|---|---|
| Total (uncorrected) | 296 | 7652412 |  | F = 187.32 |
| Mean | 1 | 5624701.82 |  | r squared = 0.56114 |
| Total (corrected) | 295 | 2027709.71 |  | s = 55.11 |
| Regression | 2 | 1137819.96 | 568909.98 |  |
| Residual | 293 | 889889.8 | 3037.17 |  |
Note: probability of significant F =<0.0001
The table above shows that 56% of the variance in solvent boiling points is explained by Log Kow (calculated) and molar refractivity (MR, calculated), since r squared is 0.56114. The standard deviation of 55.11 degrees C may be larger than we can accept.
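The derived entries of the analysis-of-variance table follow from the sums of squares and degrees of freedom: mean square = SS/df, F = MS(regression)/MS(residual), r squared = SS(regression)/SS(total, corrected), and s = sqrt(MS(residual)). The arithmetic below, using the printed values, reproduces them. It also recovers the mean observed boiling point from the "Mean" row (SS = n * mean^2), which agrees with the Y column mean of 137.8491 printed later in the PLS output.

```python
# Values copied from the analysis-of-variance table above.
n = 296
ss_mean = 5624701.82
ss_tot_corrected = 2027709.71
ss_reg, df_reg = 1137819.96, 2
ss_res, df_res = 889889.8, 293

ms_reg = ss_reg / df_reg                          # 568909.98
ms_res = ss_res / df_res                          # about 3037.17
F = ms_reg / ms_res                               # about 187.32
r_squared = ss_reg / ss_tot_corrected             # about 0.56114
s = ms_res ** 0.5                                 # about 55.11 degrees C
ss_tot_uncorrected = ss_mean + ss_tot_corrected   # about 7652412
y_mean = (ss_mean / n) ** 0.5                     # about 137.85 degrees C

print(f"F = {F:.2f}, r^2 = {r_squared:.5f}, s = {s:.2f}, mean = {y_mean:.2f}")
```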
The Model
Model coefficients and standard errors
| Parameter | Coefficient | Standard error | t | Probability |
|---|---|---|---|---|
| intercept | 28.53 | 6.51 | 4.38 | 0.0000162 |
| Log Kow | -27.99 | 2.54 | 11.02 | 6.82E-24 |
| MR | 4.768 | 0.2679 | 17.80 | 1.21E-48 |
note: response variable: Literature Boiling Point (C)
The model is: boiling point = 28.53 - 27.99*Log Kow + 4.768*MR (coefficients from the second column). All variables easily meet the criterion that the probability of their effect being due to chance is less than 0.05 (far right column). The standard error of the MR coefficient is low relative to the coefficient, while that of Log Kow is somewhat higher.
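Each t value is the coefficient divided by its standard error; with 293 residual degrees of freedom, |t| greater than about 1.97 corresponds to a two-sided probability below 0.05. The sketch recomputes the t values from the printed coefficients and wraps the fitted equation in a function. predict_bp is a hypothetical helper name, and T_CRIT is an approximate critical value, not a number printed by MMP+. The computed t for Log Kow is negative, following the sign of its coefficient; the table lists its magnitude.

```python
# Coefficients and standard errors copied from the table above: (b, SE).
coefficients = {
    "intercept": (28.53, 6.51),
    "Log Kow": (-27.99, 2.54),
    "MR": (4.768, 0.2679),
}
T_CRIT = 1.97  # approximate two-sided 5% critical value of t with 293 df

t_values = {name: b / se for name, (b, se) in coefficients.items()}
for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}, |t| > {T_CRIT}: {abs(t) > T_CRIT}")


def predict_bp(log_kow, mr):
    """Boiling point (C) predicted by the fitted two-descriptor model."""
    return 28.53 - 27.99 * log_kow + 4.768 * mr
```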
Printout of response values, predicted values and residuals:
| Compound | Observed | Predicted | Residual |
|---|---|---|---|
| acetal | 102 | 160.71 | -58.7105 |
| acetaldehyde | 21 | 97.6349 | -76.6349 |
| acetic acid | 117 | 105.186 | 11.814 |
| acetic anhydride | 139 | 159.054 | -20.0538 |
... and so on until all 300+ observations are printed out...
Examine the table above to find outliers (compounds that are poorly predicted and have large residuals). You may want to remove them and redo the analysis. You can also find outliers in the plots that follow. In MAP you can click on a data point on a plot to learn its identity.
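One common rule of thumb, an assumption here rather than an MMP+ setting, is to flag compounds whose residual exceeds about two residual standard deviations. The sketch applies it to the four rows printed above, using s = 55.11 from the analysis-of-variance table; flag_outliers is a hypothetical helper name.

```python
S = 55.11  # residual standard deviation from the analysis-of-variance table

rows = [  # (compound, observed, predicted), copied from the table above
    ("acetal", 102, 160.71),
    ("acetaldehyde", 21, 97.6349),
    ("acetic acid", 117, 105.186),
    ("acetic anhydride", 139, 159.054),
]


def flag_outliers(rows, s, k=2.0):
    """Return (name, residual, residual/s) for rows whose |observed - predicted| > k*s."""
    flagged = []
    for name, observed, predicted in rows:
        residual = observed - predicted
        if abs(residual) > k * s:
            flagged.append((name, round(residual, 4), round(residual / s, 2)))
    return flagged


print(flag_outliers(rows, S))         # no row in this excerpt exceeds 2 s
print(flag_outliers(rows, S, k=1.0))  # acetal and acetaldehyde exceed 1 s
```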
Plot of predicted versus observed:

Plot of predicted versus residuals:

Ideally, the plot of residuals above would be a structureless scatter (residuals uncorrelated with the predicted values). This plot is not: the negative residuals fall into two clear groups, which suggests that a variable may be missing from the model. Other patterns to look for in a residual plot are curves and funnel shapes; these indicate that the data should be transformed (e.g., take the reciprocal or the log, or add a square or cross-product term).
PRESS analysis and cross validation:
Contributions to PRESS (Predictive Residual Sum of Squares):
| Compound | Predictive discrepancy |
|---|---|
| acetal | 3482.369 |
| acetaldehyde | 5949.886 |
| acetic acid | 141.3421 |
| acetic anhydride | 408.8725 |
| acetol | 156.2702 |
...and so on until all compounds have been listed. Compounds with larger values are predicted worst when they are left out of the fit, and they have more influence in determining the model coefficients...
Total PRESS = 929350.7
Sum of squares of response (SSY) = 2027709.70570218
Press/SSY = .458325311363807
Because PRESS/SSY = 0.458 is above 0.4, the model fails the cross-validation criterion (PRESS/SSY < 0.4), although only marginally. The mediocre r squared of 0.56, the high standard deviation, the intercorrelated independent variables and the failed cross-validation test all indicate that we can do better...
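PRESS can be computed without refitting the model once per compound: for ordinary least squares, the leave-one-out residual of compound i equals e_i / (1 - h_ii), where e_i is its ordinary residual and h_ii is the i-th diagonal element of the hat matrix. The sketch below uses that standard identity, which is not necessarily how MMP+ computes PRESS; the function names are hypothetical, and the 0.4 limit is the criterion quoted in the text.

```python
import numpy as np


def press_statistics(X, y):
    """PRESS for an OLS model with intercept, using the hat-matrix shortcut.

    Returns per-compound contributions (squared leave-one-out residuals),
    total PRESS, and PRESS/SSY.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.pinv(A)                  # hat matrix: fitted values = H @ y
    resid = y - H @ y                          # ordinary residuals
    h = np.diag(H)                             # leverages h_ii
    contributions = (resid / (1.0 - h)) ** 2
    press = float(contributions.sum())
    ssy = float(((y - y.mean()) ** 2).sum())   # sum of squares of the centred response
    return contributions, press, press / ssy


def passes_cross_validation(press_over_ssy, limit=0.4):
    """Criterion quoted in the text: PRESS/SSY < 0.4."""
    return press_over_ssy < limit


# The printed totals: 929350.7 / 2027709.70570218 is about 0.458, so this prints False.
print(passes_cross_validation(929350.7 / 2027709.70570218))
```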
Compounds with highest predicted values:
| Compound | Predicted value |
|---|---|
| triisononyl trimellitate | 461.5005 |
| triisooctyl trimellitate | 439.8288 |
| ditridecyl phthalate | 389.5374 |
| triethylene glycol oleyl ether | 376.9923 |
| dibutyl stearate | 359.3761 |
| diisodecyl phthalate | 353.4654 |
| diisodecyl phthalate | 353.4654 |
| diisononyl adipate | 352.1996 |
| diisononyl phthalate | 339.0171 |
| diisooctyl phthalate | 324.5693 |
| triacetin | 318.6974 |
| dioctyl phthalate | 317.2979 |
| diisoheptyl phthalate | 310.1215 |
| dibutyl sebacate | 308.8088 |
The partial least squares (PLS) output consists of the following parts:
- cross-validation
- PLS components: variance explained for x and y
- the actual model
- the observed, predicted and residual values of all the compounds
- more PLS statistics (column means, P vector)
- optionally the user can create x-y plots of any of the PLS vectors, observed or predicted values
Cross Validation
Working with 1 PLS Components
CROSS VALIDATION RESULTS
Principal component number 1:
Partial PRESS for group 1 = 375374.6
Partial PRESS for group 2 = 275348.9
Partial PRESS for group 3 = 282589.7
Partial PRESS for group 4 = 413878
Sum of y variance before = 2027710
PRESS/SDBEF = .8151015
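The cross-validation total is the sum of the four partial PRESS values. Dividing it by the y sum of squares ("Sum of y variance before") gives about 0.664, and the square root of that ratio agrees with the printed PRESS/SDBEF of .8151015 to seven significant figures, which suggests the program reports the square root. If so, the printed value should be compared with the square root of the multiple-regression PRESS/SSY of 0.458 (about 0.677), not with 0.458 itself. The arithmetic:

```python
# Partial PRESS values for the four cross-validation groups printed above.
partial_press = [375374.6, 275348.9, 282589.7, 413878.0]
ss_y = 2027710.0  # "Sum of y variance before"

press = sum(partial_press)   # total PRESS over all groups, about 1347191.2
ratio = press / ss_y         # PRESS / SS(y), about 0.6644
root_ratio = ratio ** 0.5    # about 0.8151, matching the printed PRESS/SDBEF

print(f"PRESS = {press:.1f}, PRESS/SSY = {ratio:.4f}, sqrt = {root_ratio:.7f}")
```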
Column means were removed.
Number of Dependent Variables : 1
Number of Independent Variables: 2
Number of observations: 296
PLS component 1 :
X variance explained, this component: .996898760297766
accumulated: .996898760297766
Y variance explained, this component: .377643510112438
accumulated: .377643510112438
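The routine's internals are not documented on this page. The sketch below is a standard NIPALS PLS1 (one response) on mean-centred but unscaled columns, as suggested by the "Column means were removed" line above. It returns the fraction of X and Y variance explained by each component, as in the listing above, and the regression coefficients in the original units. The function name pls1 is hypothetical and the usage data are synthetic.

```python
import numpy as np


def pls1(X, y, n_components):
    """NIPALS PLS1 on mean-centred data.

    Returns (intercept, coefficients, x_var_explained, y_var_explained), where the
    variance fractions are listed per component.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean              # column means removed, no scaling
    ss_x, ss_y = float((E ** 2).sum()), float((f ** 2).sum())
    W, P, q, x_expl, y_expl = [], [], [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)              # weight vector
        t = E @ w                              # scores
        tt = float(t @ t)
        p = E.T @ t / tt                       # x loadings (the "P vector")
        c = float(f @ t) / tt                  # y loading
        E = E - np.outer(t, p)                 # deflate X
        f = f - c * t                          # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
        x_expl.append(tt * float(p @ p) / ss_x)
        y_expl.append(tt * c * c / ss_y)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)     # coefficients in original x units
    intercept = y_mean - float(x_mean @ coef)
    return intercept, coef, x_expl, y_expl


# Synthetic usage: two descriptors on very different scales, one PLS component.
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 2)) * np.array([3.0, 40.0])
y = 60.0 + X @ np.array([0.1, 2.0]) + rng.normal(scale=5.0, size=50)
b0, b, x_expl, y_expl = pls1(X, y, n_components=1)
```

With as many components as descriptors, PLS1 reproduces the ordinary least-squares coefficients. Because the columns are centred but not scaled, a high-variance descriptor such as MR (X variance 1681 versus 9.4 for Log P in the listing below) would likely dominate the first component, which is consistent with component 1 explaining 99.7% of the X variance.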
The Actual Model
PLS Model Regression Coefficients:
Y = Literature Boiling Point (C)
-- Intercept: 60.54151
-- Log P: .1137258
-- MR: 2.25315
Predicted Literature Boiling Point (C) for all compounds
Name(observed y, predicted y, residual)
triisononyl trimellitate( 311 , 456.4935 , -145.4935 , (F = 7.383684E-06))
ditridecyl phthalate( 286 , 428.0999 , -142.0999 , (F = 4.379508E-04))
triisooctyl trimellitate( 300 , 425.0979 , -125.0979 , (F = 6.390917E-05))
diisodecyl phthalate( 250 , 365.2788 , -115.2788 , (F = 4.110049E-05))
diisodecyl phthalate( 256 , 365.2788 , -109.2788 , (F = 4.110049E-05))
dibutyl stearate(?, 352.4491 , ?)
diisononyl phthalate( 245 , 344.3482 , -99.34821 , (F = 8.36994E-06))
diisononyl adipate( 233 , 331.4888 , -98.48884 , (F = 1.263267E-04))
triethylene glycol oleyl ether(?, 329.5998 , ?)
dioctyl phthalate(?, 323.4476 , ?)
diisooctyl phthalate( 230 , 323.4178 , -93.41779 , (F = 3.905712E-07))
... and so on for all the compounds in the data base...
--COLUMN MEANS--
Y variables:
Literature Boiling Point (C): 137.8491
X variables:
Log P: 1.92405
MR: 34.21377
Y variance for Literature Boiling Point (C): 25940.37
X variance for Log P: 9.363692
X variance for MR: 1681.273
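Because the model is fitted on mean-centred data, the intercept follows from the column means: intercept = mean(y) - sum over j of b_j * mean(x_j). Plugging in the printed means and coefficients reproduces the printed intercept of 60.54151 to within rounding:

```python
# Column means and PLS coefficients copied from the output above.
y_mean = 137.8491
x_means = {"Log P": 1.92405, "MR": 34.21377}
coef = {"Log P": 0.1137258, "MR": 2.25315}

intercept = y_mean - sum(coef[name] * x_means[name] for name in coef)
print(f"intercept = {intercept:.5f}")  # about 60.5415
```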
The P vector is the x variable loadings
--P VECTOR COMPONENT 1 --
Log P: 8.9
For questions, please contact us at 707-864-0845 or info@chemsw.com.