Advanced Molecular Modeling to Examine QSAR and QSPR Relationships
Molecular Modeling Pro Plus (MMP+) includes all of the features of Molecular Modeling Pro and adds the ability to study QSAR and QSPR relationships.
QSAR (quantitative structure activity relationships) and QSPR (quantitative structure property relationships) study relationships between useful molecular properties (such as the ability to control a human disease or lubricate a piston) and the underlying chemical and physical properties which may enhance or limit the desired property. Molecular Modeling Pro Plus (MMP+) has three main features that aid in the discovery of these relationships:
 calculation of molecular properties from structure
 statistical and graphing tools for analyzing data
 data base storage, retrieval and manipulation
Unlike general statistical packages, MMP+ is specialized to deal with QSAR/QSPR problems, and accomplishes these tasks easily and efficiently, with tools not found elsewhere.
Examples of Program output:
(an example of a special use of Molecular Modeling Pro Plus)
In this example, the user of the program wants to replace chloroform in a formulation with safer solvents. After the user chooses the "replace component in a mixture" menu item, the program asks the following:
 Which compound in the data base should be replaced?
 What properties should be used as the basis for selecting molecules with similar properties? (In this example the three Hansen 3D solubility parameters were selected: dispersion, polarity and hydrogen bonding.)
 Replace with 1, 2 or 3 compounds?
Then the program does the calculations and prints out:
Best fits for Hansen dispersion = 17.8 and Hansen polarity = 3.1
and Hansen hydrogen bonding = 5.7 imitating chloroform
isoamyl acetate 90%; trifluoromethylbenzene 10%; difference = 6.99038505554185E-02
methyl hexanoate 60%; piperidine 40%; difference = 7.98199653625493E-02
n-butylamine 60%; 1-methylnaphthalene 40%; difference = 8.33061218261727E-02
n-nonane 40%; 1,1,2,2-tetrachloroethane 60%; difference = .108578395843507
methyl oleate 20%; phenethyl acetate 80%; difference = .111801671981811
formic acid 10%; Phenyl(CH2)3OMe 90%; difference = .112714648246764
ethyl heptanoate 30%; phenethyl acetate 70%; difference = .117056584358216
methyl octanoate 30%; phenethyl acetate 70%; difference = .135206198692322
n-octane 40%; 1,1,2,2-tetrachloroethane 60%; difference = .144738292694091
chlorodifluoromethane 10%; Phenyl(CH2)3OC(=O)Me 90%; difference = .1449111700058
diethyl carbonate 40%; methyl phenylacetate 60%; difference = .145699405670168
ethyl hydrocinnamate 80%; methylal 20%; difference = .149800872802734
Phenyl(CH2)3OMe 20%; phenylpropyl ether 80%; difference = .150111436843873
ethylphenyl ether 70%; Phenyl(CH2)3OMe 30%; difference = .157015228271484
ethyl hydrocinnamate 80%; octanoic acid 20%; difference = .157251262664794
n-decane 40%; 1,1,2,2-tetrachloroethane 60%; difference = .166656398773194
benzyl ether 60%; diethyl ether 40%; difference = .167092275619508
diethylene glycol butyl ether 20%; Phenyl(CH2)3OMe 80%; difference = .170285606384277
bromoform 30%; heptyl acetate 70%; difference = .172407412528992
ethyl propionate 20%; Phenyl(CH2)3OC(=O)Me 80%; difference = .173979091644289
hexanal 20%; Phenyl(CH2)3OC(=O)Me 80%; difference = .176705598831178
ethyl hydrocinnamate 80%; isononanoic acid 20%; difference = .180992984771727
Phenyl(CH2)3OC(=O)Me 80%; tripropylamine 20%; difference = .185638332366944
1,2-dichloroethane 40%; 1,2-dichloroethene (trans) 60%; difference = .18863229751587
PhenylC(COOEt)2H 80%; trichlorofluoromethane 20%; difference = .200468732576242
phenylpropyl ether 90%; toluene 10%; difference = .201505118608473
4-acetylbutyric acid 20%; Phenyl(CH2)3OMe 80%; difference = .203740787506104
methyl hexanoate 30%; Phenyl(CH2)3OC(=O)Me 70%; difference = .204795336723326
isopropyl palmitate 20%; phenethyl acetate 80%; difference = .207344675064087
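A search of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the routine MMP+ uses: blend parameters are assumed to be linear averages weighted by the fraction of each component, and the "difference" is assumed to be the Euclidean distance to the target in Hansen space. The chloroform target comes from the printout above; the candidate solvent values are approximate literature figures chosen for the example, and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Target: chloroform (dispersion, polarity, hydrogen bonding) from the printout.
TARGET = (17.8, 3.1, 5.7)

# Illustrative candidates only (approximate literature Hansen values),
# not taken from the MMP+ data base.
SOLVENTS = {
    "toluene": (18.0, 1.4, 2.0),
    "ethyl acetate": (15.8, 5.3, 7.2),
    "acetone": (15.5, 10.4, 7.0),
    "n-hexane": (14.9, 0.0, 0.0),
}

def blend(p1, p2, frac1):
    """Assumed mixing rule: linear average weighted by component fraction."""
    return tuple(frac1 * a + (1.0 - frac1) * b for a, b in zip(p1, p2))

def distance(p, q):
    """Assumed 'difference': Euclidean distance in Hansen space."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def best_blends(target, solvents, step=10):
    """Score every two-component blend in `step`% increments, best first."""
    results = []
    for (n1, p1), (n2, p2) in combinations(solvents.items(), 2):
        for pct in range(step, 100, step):
            mix = blend(p1, p2, pct / 100.0)
            results.append((distance(mix, target), n1, pct, n2, 100 - pct))
    return sorted(results)

if __name__ == "__main__":
    for diff, n1, pct1, n2, pct2 in best_blends(TARGET, SOLVENTS)[:5]:
        print(f"{n1} {pct1}%; {n2} {pct2}%; difference = {diff:.4f}")
```

The 10% step mirrors the composition increments seen in the printout; a finer step or a third component enlarges the search but does not change the idea.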
The boiling point example with cross-product terms and a third-degree (level 3) polynomial model:
Boiling point = intercept + a*(log Kow) + b*MR + c*(log Kow*MR)
+ d*(log Kow)^2 + e*MR^2 + f*(log Kow)^3 + g*MR^3
The program first gives the user all of the statistics of the underlying multiple regression model. Then it prints out the actual 3D plot.
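As a rough sketch of how the regressors of such a model are built, the Python below expands one compound's Log Kow and MR into the eight terms of the equation, in the same order (intercept, a to g). The function names are hypothetical and the coefficients must come from a fitted model; none are given on this page for the polynomial case.

```python
def poly_terms(log_kow, mr):
    """Terms of the level 3 model: 1, linear, cross-product, squares, cubes."""
    return [
        1.0,            # intercept
        log_kow,        # a
        mr,             # b
        log_kow * mr,   # c (cross-product term)
        log_kow ** 2,   # d
        mr ** 2,        # e
        log_kow ** 3,   # f
        mr ** 3,        # g
    ]

def predict(coeffs, log_kow, mr):
    """Evaluate the model for one compound given eight fitted coefficients."""
    return sum(c * t for c, t in zip(coeffs, poly_terms(log_kow, mr)))
```

Because the model is still linear in its coefficients, the expanded terms can be passed to an ordinary multiple linear regression.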
This page shows and explains the output from the Molecular Modeling Pro Plus multiple linear regression routine. The output contains several parts:
 correlation matrix, eigenvalues and eigenvectors
 analysis of variance table
 the regression model (y = intercept + a*x(1) + b*x(2) + c*x(3)...)
 table of observed, predicted and residual values for all compounds in the data base
 plot of predicted versus observed values
 plot of predicted versus residual values
 PRESS analysis for all compounds in the data base and cross-validation
 table of compounds sorted by highest values
Correlation matrix for regression variables:
Note: Log Kow and MR are highly correlated, which may cause problems with the analysis...

                    Boiling point (C)   Log Kow   MR
Boiling point (C)   1.                  0.29434   0.61573
Log Kow             0.29434             1.        0.864
MR                  0.61573             0.864     1.
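Each entry of such a matrix is a Pearson correlation coefficient between two columns. A minimal pure-Python sketch follows; the function names are hypothetical and this is not the program's own code.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """columns: dict mapping a variable name to its list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

A value near 1 between two independent variables, such as the 0.864 between Log Kow and MR, is the warning sign for collinearity mentioned in the note.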
The Eigenvalues found after 1 rotations are:
W 1 = 1.846697 ; W 2 = .1533031 ;
The proportion of variance of each component is:
W 1 = .9233485 ; W 2 = 7.665153E-02 ;
The corresponding eigenvectors are:
W 1 W 2
Log P .7071068 .7071068
MR .7071068 .7071068
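The proportions of variance follow directly from the eigenvalues: each one is divided by their sum, which for a correlation matrix equals the number of variables (two here). A short check in Python, using only the eigenvalues printed above:

```python
# Eigenvalues as printed in the output above.
eigenvalues = [1.846697, 0.1533031]

total = sum(eigenvalues)  # about 2, the number of x variables
proportions = [w / total for w in eigenvalues]

for i, p in enumerate(proportions, start=1):
    print(f"W {i} = {p:.7f}")
```

The first component carries about 92% of the variance in the two x variables, which is another way of seeing that Log Kow and MR are strongly intercorrelated.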
Analysis of variance
Variation source       df    Sum of squares   Mean square   Statistics
Total (uncorrected)    296   7652412                        F = 187.32
Mean                   1     5624701.82                     r squared = 0.56114
Total (corrected)      295   2027709.71                     s = 55.11
Regression             2     1137819.96       568909.98
Residual               293   889889.8         3037.17
Note: probability of significant F =<0.0001
The above table shows that 56% of the variance in solvent boiling points is explained by calculated Log Kow and calculated molar refractivity (MR); we can say this because r squared is 0.56114. The standard deviation of 55.11 degrees C may be larger than we can accept.
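The three summary statistics can be reproduced from the sums of squares and degrees of freedom in the table. The Python below is a plain arithmetic check using the standard definitions (F as the ratio of regression to residual mean squares, r squared as one minus the residual share of the corrected total, s as the square root of the residual mean square); the variable names are ours.

```python
from math import sqrt

# Values copied from the analysis of variance table.
ss_total_corrected = 2027709.71
ss_regression, df_regression = 1137819.96, 2
ss_residual, df_residual = 889889.8, 293

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual
r_squared = 1.0 - ss_residual / ss_total_corrected
s = sqrt(ms_residual)  # standard deviation of the residuals

print(f"F = {f_stat:.2f}, r squared = {r_squared:.5f}, s = {s:.2f}")
```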
The Model
Model coefficients and standard errors
parameter   coefficient   standard error   t        probability
intercept   28.53         6.51             4.38     0.0000162
Log Kow     -27.99        2.54             -11.02   6.82E-24
MR          4.768         0.2679           17.80    1.21E-48
note: response variable: Literature Boiling Point (C)
The model is: boiling point = 28.53 - 27.99*Log Kow + 4.768*MR (coefficients from column 2). All variables easily meet the criterion that the probability of their effect being due to chance is less than 0.05 (far right column). The standard error of the MR coefficient is low relative to its value (0.2679 against 4.768), while that of Log Kow is proportionally higher (2.54 against 27.99).
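The t column is each coefficient divided by its standard error, and the model itself is a one-line function. The sketch below uses the printed coefficients, with the Log Kow coefficient taken as negative; that sign is an assumption here, chosen because it reproduces the mean boiling point (about 137.8 C) at the mean Log P and MR reported in the PLS column means further down. The names are hypothetical.

```python
# (coefficient, standard error) pairs from the model table.
# The negative sign on Log Kow is an assumption (see the text above).
MODEL = {
    "intercept": (28.53, 6.51),
    "Log Kow": (-27.99, 2.54),
    "MR": (4.768, 0.2679),
}

def t_ratio(coefficient, standard_error):
    """t statistic for testing whether a coefficient differs from zero."""
    return coefficient / standard_error

def predict_bp(log_kow, mr):
    """Predicted boiling point (C) from the two-variable linear model."""
    return (MODEL["intercept"][0]
            + MODEL["Log Kow"][0] * log_kow
            + MODEL["MR"][0] * mr)

for name, (coef, se) in MODEL.items():
    print(f"{name}: t = {t_ratio(coef, se):.2f}")
```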
Printout of response values, predicted values and residuals:

compound           observed   predicted   residual
acetal             102        160.71      -58.7105
acetaldehyde       21         97.6349     -76.6349
acetic acid        117        105.186     11.814
acetic anhydride   139        159.054     -20.0538
... and so on until all 300+ observations are printed out...
Examine the above table to find outliers (compounds poorly predicted and having large residuals). You may want to take them out and redo the analysis. You can also find outliers on the next table. In MAP you can click on a data point on the plot to learn its identity.
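A simple way to screen for outliers is to compute each residual as observed minus predicted and flag the compounds whose absolute residual exceeds a chosen threshold. The sketch below uses the four rows shown above; the threshold of one standard deviation (s = 55.11 from the analysis of variance) is an arbitrary choice for illustration, not a rule stated by the program, and the function names are hypothetical.

```python
S = 55.11  # standard deviation of the residuals from the ANOVA table

# (observed, predicted) boiling points for the rows shown above.
TABLE = {
    "acetal": (102, 160.71),
    "acetaldehyde": (21, 97.6349),
    "acetic acid": (117, 105.186),
    "acetic anhydride": (139, 159.054),
}

def residuals(table):
    """Residual = observed - predicted for every compound."""
    return {name: obs - pred for name, (obs, pred) in table.items()}

def outliers(table, threshold):
    """Names of compounds whose absolute residual exceeds the threshold."""
    return [name for name, r in residuals(table).items() if abs(r) > threshold]

print(outliers(TABLE, S))
```

A stricter threshold such as two standard deviations flags none of these four compounds, so the choice of cutoff matters and should be made before looking at the results.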
Plot of predicted versus observed:
Plot of predicted versus residuals:
The above plot should be a perfect scatter plot (uncorrelated). However, this plot is not perfect. The negative residuals fall into 2 clear groups, which leads one to suspect there may be a missing variable. Other things to look for in the plot of residuals are curves and funnel shapes. These effects indicate that the data should be transformed (e.g. take the reciprocal, add a square term, add a cross-product term, take the log etc.)
PRESS analysis and cross validation:
Contributions to PRESS (Predictive Residual Sum of Squares):
Compound           Predictive discrepancy
acetal             3482.369
acetaldehyde       5949.886
acetic acid        141.3421
acetic anhydride   408.8725
acetol             156.2702
...and so on until all compounds have been listed. The compounds with larger values are more influential in determining the model coefficients...
Total PRESS = 929350.7
Sum of squares of response (SSY) = 2027709.70570218
Press/SSY = .458325311363807
The model has failed the cross-validation criterion (PRESS/SSY <0.4).
The model marginally failed the cross-validation test. The mediocre r squared of 0.56, the high standard deviation, the intercorrelated independent variables and the failed cross-validation test all are indicators that we can do better...
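PRESS is commonly computed by leaving each compound out in turn, refitting the model on the rest, and summing the squared errors of the predictions for the omitted compounds. The sketch below does this for a one-variable straight-line fit to keep it short; it illustrates the general leave-one-out idea and is not the program's own routine. The toy data and function names are invented for the example, and the last lines recompute the ratio from the totals printed above.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def press(xs, ys):
    """Leave-one-out predictive residual sum of squares."""
    total = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total

def press_over_ssy(xs, ys):
    """PRESS divided by the corrected sum of squares of the response."""
    my = sum(ys) / len(ys)
    ssy = sum((y - my) ** 2 for y in ys)
    return press(xs, ys) / ssy

# Toy data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(press_over_ssy(xs, ys) < 0.4)  # the criterion quoted above

# Ratio recomputed from the totals in the printout.
printed_ratio = 929350.7 / 2027709.70570218
```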
Compounds with highest predicted values:
triisononyl trimellitate         (461.5005)
triisooctyl trimelliate          (439.8288)
ditridecyl phthalate             (389.5374)
triethylene glycol oleyl ether   (376.9923)
dibutyl stearate                 (359.3761)
diisodecyl phthalate             (353.4654)
diisodecyl phthalate             (353.4654)
diisononyl adipate               (352.1996)
diisononyl phthalate             (339.0171)
diisooctyl phthalate             (324.5693)
triacetin                        (318.6974)
dioctyl phthalate                (317.2979)
diisoheptyl phthalate            (310.1215)
dibutyl sebacate                 (308.8088)
The output of the partial least squares (PLS) routine consists of the following parts:
 cross-validation
 PLS components: variance explained for x and y
 the actual model
 the observed, predicted and residual values of all the compounds
 more PLS statistics (column means, P vector)
 optionally the user can create xy plots of any of the PLS vectors, observed or predicted values
Cross Validation
Working with 1 PLS Components
CROSS VALIDATION RESULTS
Principle component number 1:
Partial PRESS for group 1 = 375374.6
Partial PRESS for group 2 = 275348.9
Partial PRESS for group 3 = 282589.7
Partial PRESS for group 4 = 413878
Sum of y variance before = 2027710
PRESS/SDBEF = .8151015
Column means were removed.
Number of Dependent Variables : 1
Number of Independent Variables: 2
Number of observations: 296
PLS component 1 :
X variance explained, this component: .996898760297766
accumulated: .996898760297766
Y variance explained, this component: .377643510112438
accumulated: .377643510112438
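The PRESS/SDBEF figure in the cross-validation results is consistent with the square root of the summed partial PRESS values divided by the sum of y variance before the component. This is inferred from the printed numbers rather than from any documentation of the routine, so treat the formula as an assumption. A quick Python check:

```python
from math import sqrt

# Partial PRESS for the four cross-validation groups, from the printout.
partial_press = [375374.6, 275348.9, 282589.7, 413878.0]
y_variance_before = 2027710.0  # "Sum of y variance before"

# Assumed relationship, inferred from the numbers: a root ratio.
press_over_sdbef = sqrt(sum(partial_press) / y_variance_before)
print(f"PRESS/SDBEF = {press_over_sdbef:.7f}")
```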
The Actual Model
PLS Model Regression Coefficients:
Y = Literature Boiling Point (C)
 Intercept: 60.54151
 Log P: .1137258
 MR: 2.25315
Predicted Literature Boiling Point (C) for all compounds
Name(observed y, predicted y, residual)
triisononyl trimellitate( 311 , 456.4935 , 145.4935 , (F = 7.383684E-06))
ditridecyl phthalate( 286 , 428.0999 , 142.0999 , (F = 4.379508E-04))
triisooctyl trimelliate( 300 , 425.0979 , 125.0979 , (F = 6.390917E-05))
diisodecyl phthalate( 250 , 365.2788 , 115.2788 , (F = 4.110049E-05))
diisodecyl phthalate( 256 , 365.2788 , 109.2788 , (F = 4.110049E-05))
dibutyl stearate(?, 352.4491 , ?)
diisononyl phthalate( 245 , 344.3482 , 99.34821 , (F = 8.36994E-06))
diisononyl adipate( 233 , 331.4888 , 98.48884 , (F = 1.263267E-04))
triethylene glycol oleyl ether(?, 329.5998 , ?)
dioctyl phthalate(?, 323.4476 , ?)
diisooctyl phthalate( 230 , 323.4178 , 93.41779 , (F = 3.905712E-07))
... and so on for all the compounds in the data base...
COLUMN MEANS
Y variables:
Literature Boiling Point (C): 137.8491
X variables:
Log P: 1.92405
MR: 34.21377
Y variance for Literature Boiling Point (C): 25940.37
X variance for Log P: 9.363692
X variance for MR: 1681.273
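Because the column means were removed before fitting, the PLS model should return the mean boiling point when it is evaluated at the mean Log P and mean MR. The Python below checks this with the coefficients and column means printed above; the function name is hypothetical.

```python
# PLS model regression coefficients from the printout.
INTERCEPT = 60.54151
COEF_LOG_P = 0.1137258
COEF_MR = 2.25315

def pls_predict(log_p, mr):
    """Predicted literature boiling point (C) from the one-component PLS model."""
    return INTERCEPT + COEF_LOG_P * log_p + COEF_MR * mr

# Column means from the printout.
mean_log_p, mean_mr, mean_bp = 1.92405, 34.21377, 137.8491

bp_at_means = pls_predict(mean_log_p, mean_mr)
print(f"{bp_at_means:.4f} vs mean {mean_bp}")
```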
The P vector is the x variable loadings
P VECTOR COMPONENT 1 
Log P: 8.9
To purchase, click on the Buy Now button above. For questions, please contact us at 707-864-0845 or info@chemsw.com.