
Molecular Modeling Pro™ Plus

Advanced Molecular Modeling to Examine QSAR and QSPR Relationships

Buy-Now

Molecular Modeling Pro Plus (MMP+) includes all of the features of Molecular Modeling Pro and adds the ability to study QSAR and QSPR relationships.

QSAR (quantitative structure activity relationships) and QSPR (quantitative structure property relationships) study relationships between useful molecular properties (such as the ability to control a human disease or lubricate a piston) and the underlying chemical and physical properties which may enhance or limit the desired property. Molecular Modeling Pro Plus (MMP+) has three main features that aid in the discovery of these relationships:

  • calculation of molecular properties from structure
  • statistical and graphing tools for analyzing data
  • data base storage, retrieval and manipulation

Unlike general statistical packages, MMP+ is specialized to deal with QSAR/QSPR problems, and accomplishes these tasks easily and efficiently, with tools not found elsewhere.

Examples of Program output:


Replacement of chloroform with safer solvents.

(an example of a special use of Molecular Modeling Pro Plus)

In this example, the user wants to replace chloroform in a formulation with safer solvents. After the user chooses the "replace component in a mixture" menu item, the program asks the following:

  • Which compound in the data base should be replaced
  • What properties should be used as the basis of selecting molecules with similar properties (in this example the 3-D solubility parameters, dispersion, polarity and hydrogen bonding were selected).
  • Replace with 1, 2 or 3 compounds?

Then the program does the calculations and prints out:

Best fits for Hansen dispersion = 17.8 and Hansen polarity = 3.1 and Hansen hydrogen bonding = 5.7 imitating chloroform

isoamyl acetate 90%; trifluoromethylbenzene 10%; difference = 6.99038505554185E-02
methyl hexanoate 60%; piperidine 40%; difference = 7.98199653625493E-02
n-butylamine 60%; 1-methylnaphthalene 40%; difference = 8.33061218261727E-02
n-nonane 40%; 1,1,2,2-tetrachloroethane 60%; difference = .108578395843507
methyl oleate 20%; phenethyl acetate 80%; difference = .111801671981811
formic acid 10%; Phenyl(CH2)3OMe 90%; difference = .112714648246764
ethyl heptanoate 30%; phenethyl acetate 70%; difference = .117056584358216
methyl octanoate 30%; phenethyl acetate 70%; difference = .135206198692322
n-octane 40%; 1,1,2,2-tetrachloroethane 60%; difference = .144738292694091
chlorodifluoromethane 10%; Phenyl(CH2)3OC(=O)Me 90%; difference = .1449111700058
diethyl carbonate 40%; methyl phenylacetate 60%; difference = .145699405670168
ethyl hydrocinnamate 80%; methylal 20%; difference = .149800872802734
Phenyl(CH2)3OMe 20%; phenylpropyl ether 80%; difference = .150111436843873
ethylphenyl ether 70%; Phenyl(CH2)3OMe 30%; difference = .157015228271484
ethyl hydrocinnamate 80%; octanoic acid 20%; difference = .157251262664794
n-decane 40%; 1,1,2,2-tetrachloroethane 60%; difference = .166656398773194
benzyl ether 60%; diethyl ether 40%; difference = .167092275619508
diethylene glycol butyl ether 20%; Phenyl(CH2)3OMe 80%; difference = .170285606384277
bromoform 30%; heptyl acetate 70%; difference = .172407412528992
ethyl propionate 20%; Phenyl(CH2)3OC(=O)Me 80%; difference = .173979091644289
hexanal 20%; Phenyl(CH2)3OC(=O)Me 80%; difference = .176705598831178
ethyl hydrocinnamate 80%; isononanoic acid 20%; difference = .180992984771727
Phenyl(CH2)3OC(=O)Me 80%; tripropylamine 20%; difference = .185638332366944
1,2-dichloroethane 40%; 1,2-dichloroethene (trans) 60%; difference = .18863229751587
PhenylC(COOEt)2H 80%; trichlorofluoromethane 20%; difference = .200468732576242
phenylpropyl ether 90%; toluene 10%; difference = .201505118608473
4-acetylbutyric acid 20%; Phenyl(CH2)3OMe 80%; difference = .203740787506104
methyl hexanoate 30%; Phenyl(CH2)3OC(=O)Me 70%; difference = .204795336723326
isopropyl palmitate 20%; phenethyl acetate 80%; difference = .207344675064087
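
The search behind a listing like this can be sketched as a brute-force scan over solvent pairs and mixing ratios. The sketch below is an illustration under stated assumptions, not the algorithm MMP+ actually uses: it assumes the parameters of a blend are fraction-weighted averages of its components, it uses a plain Euclidean distance as the "difference" (the program's metric is not documented on this page), and the candidate solvents and their parameter values are hypothetical placeholders. Only the chloroform target (17.8, 3.1, 5.7) is taken from the output above.

```python
from itertools import combinations

# Target Hansen parameters for chloroform, as quoted in the program output.
TARGET = (17.8, 3.1, 5.7)

# Hypothetical candidates: (dispersion, polarity, hydrogen bonding).
# These values are placeholders for illustration, not data base entries.
CANDIDATES = {
    "solvent A": (18.0, 1.4, 2.0),
    "solvent B": (15.5, 10.4, 7.0),
    "solvent C": (15.8, 8.8, 19.4),
    "solvent D": (16.0, 3.0, 6.0),
}


def blend(p1, p2, frac1):
    """Assumed mixing rule: fraction-weighted average of each parameter."""
    return tuple(frac1 * a + (1.0 - frac1) * b for a, b in zip(p1, p2))


def difference(params, target=TARGET):
    """Assumed metric: Euclidean distance in the 3-D parameter space."""
    return sum((a - b) ** 2 for a, b in zip(params, target)) ** 0.5


def best_binary_blends(candidates, target=TARGET, step=10, top=5):
    """Try every pair at 10%, 20%, ... 90% and return the closest blends."""
    results = []
    for (n1, p1), (n2, p2) in combinations(candidates.items(), 2):
        for pct in range(step, 100, step):
            mix = blend(p1, p2, pct / 100.0)
            results.append((difference(mix, target), n1, pct, n2, 100 - pct))
    results.sort()
    return results[:top]


for diff, n1, pct1, n2, pct2 in best_binary_blends(CANDIDATES):
    print(f"{n1} {pct1}%; {n2} {pct2}%; difference = {diff:.4f}")
```

The 10% step mirrors the percentages in the listing; a finer step or a three-component search would only change the loops.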



Example of an X - Y Plot



Example of a 3-D Contour Plot

The boiling point example with cross product terms and level 3 polynomial model:

Boiling point = intercept + a*Log Kow + b*MR + c*(Log Kow*MR) + d*(Log Kow)^2 + e*MR^2 + f*(Log Kow)^3 + g*MR^3

The program first gives the user all of the statistics of the underlying multiple regression model. Then it prints out the actual 3-D plot.
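
A model of this form is still linear in its coefficients, so it can be fitted by ordinary least squares once the cross-product and power terms are built as extra columns. The following is a minimal sketch using NumPy, assuming that reading of the model; the function names are hypothetical and the data are synthetic, not taken from the MMP+ data base.

```python
import numpy as np


def design_matrix(log_kow, mr):
    """Columns: 1, LogKow, MR, LogKow*MR, LogKow^2, MR^2, LogKow^3, MR^3."""
    x1 = np.asarray(log_kow, dtype=float)
    x2 = np.asarray(mr, dtype=float)
    return np.column_stack(
        [np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2, x1**3, x2**3]
    )


def fit_cubic_model(log_kow, mr, boiling_point):
    """Least-squares estimate of intercept, a, b, c, d, e, f, g."""
    X = design_matrix(log_kow, mr)
    y = np.asarray(boiling_point, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic, noise-free demonstration data with known coefficients.
rng = np.random.default_rng(0)
log_kow = rng.uniform(-1.0, 3.0, size=60)
mr = rng.uniform(0.5, 3.0, size=60)
true_coef = np.array([1.0, 2.0, 3.0, 0.5, -1.0, 0.25, 0.1, -0.05])
bp = design_matrix(log_kow, mr) @ true_coef

coef = fit_cubic_model(log_kow, mr, bp)
print(np.round(coef, 4))
```

With real data the power terms are strongly correlated with each other, so the individual coefficients can be unstable even when the fitted surface is good.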



Brute force regression using PLS Content



Multiple linear regression output from Molecular Modeling Pro Plus

This page shows and explains the output from the Molecular Modeling Pro Plus multiple linear regression routine. The output contains several parts:

  • correlation matrix, eigenvalues and eigenvectors
  • analysis of variance table
  • the regression model (y = intercept + a*x(1) + b*x(2) + c*x(3)...)
  • table of observed, predicted and residual values for all compounds in the data base
  • plot of predicted versus observed values
  • plot of predicted versus residual values
  • PRESS analysis for all compounds in the data base and cross-validation
  • table of compounds sorted by highest values

Correlation matrix for regression variables:

Note: Log Kow and MR are highly correlated, which may cause problems with the analysis...


                    Boiling point (C)   Log Kow   MR
Boiling point (C)   1.                  0.29434   0.61573
Log Kow             0.29434             1.        0.864
MR                  0.61573             0.864     1.

The Eigenvalues found after 1 rotations are:

W 1 = 1.846697 ; W 2 = .1533031 ;

The proportion of variance of each component is:

W 1 = .9233485 ; W 2 = 7.665153E-02 ;

The corresponding eigenvectors are:

        W 1        W 2
Log P   .7071068   .7071068
MR      .7071068   -.7071068
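
For two standardized variables the numbers above follow a simple pattern: a 2 x 2 correlation matrix with off-diagonal r has eigenvalues 1 + r and 1 - r, eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2), and each component's proportion of variance is its eigenvalue divided by the sum of the eigenvalues. The sketch below checks the printed proportions; the function names are hypothetical. Note that the printed eigenvalues imply r of about 0.847 rather than the 0.864 in the correlation table; the page does not say why, and the sketch does not try to explain it.

```python
import math


def eig_2x2_correlation(r):
    """Eigenvalues and unit eigenvectors of [[1, r], [r, 1]]."""
    s = 1.0 / math.sqrt(2.0)
    return [(1.0 + r, (s, s)), (1.0 - r, (s, -s))]


def variance_proportions(eigenvalues):
    """Share of total variance carried by each component."""
    total = sum(eigenvalues)
    return [w / total for w in eigenvalues]


# Eigenvalues exactly as printed in the output above.
printed_eigenvalues = [1.846697, 0.1533031]
proportions = variance_proportions(printed_eigenvalues)
print(proportions)

# The eigenvector entries .7071068 are 1/sqrt(2).
print(eig_2x2_correlation(0.5)[0][1])
```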

Analysis of variance

Variation source      df    Sum of squares   Mean square   Statistics
Total (uncorrected)   296   7652412                        F = 187.32
Mean                  1     5624701.82                     r squared = 0.56114
Total (corrected)     295   2027709.71                     s = 55.11
Regression            2     1137819.96       568909.98
Residual              293   889889.8         3037.17

Note: probability of significant F =<0.0001

The above table shows that 56% of the variance in solvent boiling points is explained by calculated Log Kow and calculated molar refractivity (MR), since r squared is 0.56114. The standard deviation of 55.11 degrees C may be larger than we can accept.
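
The three statistics in the table can be recomputed from its sums of squares and degrees of freedom. This is a short sketch using the standard definitions (F as regression mean square over residual mean square, r squared as regression sum of squares over corrected total, s as the square root of the residual mean square); the variable names are illustrative.

```python
import math

# Values copied from the analysis of variance table.
ss_regression, df_regression = 1137819.96, 2
ss_residual, df_residual = 889889.8, 293
ss_total_corrected = 2027709.71

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_statistic = ms_regression / ms_residual
r_squared = ss_regression / ss_total_corrected
s = math.sqrt(ms_residual)  # residual standard deviation, degrees C

print(f"F = {f_statistic:.2f}, r squared = {r_squared:.5f}, s = {s:.2f}")
```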

The Model

Model coefficients and standard errors

parameter   coefficient   standard error   t       probability
intercept   28.53         6.51             4.38    0.0000162
Log Kow     -27.99        2.54             11.02   6.82 E-24
MR          4.768         0.2679           17.80   1.21 E-48

note: response variable: Literature Boiling Point (C)

The model is: boiling point = 28.53 - 27.99*Log Kow + 4.768*MR (coefficients from column 2). Every term is significant: the probability that each is due to chance is far below 0.05 (far right column). The standard error of MR is fairly low relative to its coefficient, while that of Log Kow is somewhat higher.
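
As a small sketch, the fitted model can be applied to new descriptor values, and the t column can be reproduced as the absolute coefficient divided by its standard error. The coefficients and standard errors come from the table above; the function names and the input values in the usage line are hypothetical.

```python
# Coefficients and standard errors copied from the model table.
COEF = {"intercept": 28.53, "Log Kow": -27.99, "MR": 4.768}
STD_ERR = {"intercept": 6.51, "Log Kow": 2.54, "MR": 0.2679}


def predict_boiling_point(log_kow, mr):
    """Predicted boiling point (C) from the fitted linear model."""
    return COEF["intercept"] + COEF["Log Kow"] * log_kow + COEF["MR"] * mr


def t_statistics():
    """|coefficient| / standard error, as listed in the t column."""
    return {name: abs(COEF[name]) / STD_ERR[name] for name in COEF}


# Hypothetical descriptor values, for illustration only.
print(predict_boiling_point(log_kow=1.0, mr=10.0))
print({name: round(t, 2) for name, t in t_statistics().items()})
```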

Printout of response values, predicted values and residuals:


                   observed   predicted   residual
acetal             102        160.71      -58.7105
acetaldehyde       21         97.6349     -76.6349
acetic acid        117        105.186     11.814
acetic anhydride   139        159.054     -20.0538

... and so on until all 300+ observations are printed out...

Examine the above table to find outliers (compounds poorly predicted and having large residuals). You may want to take them out and redo the analysis. You can also find outliers on the next table. In MAP you can click on a data point on the plot to learn its identity.
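
Screening the table for outliers can be sketched in a few lines: each residual is observed minus predicted, and compounds are ranked or flagged by the size of the residual. The cut-off of k standard deviations is an assumption made for this sketch, not a rule stated by the program; the four rows and s = 55.11 are taken from the output above.

```python
# (name, observed, predicted) rows copied from the printout.
ROWS = [
    ("acetal", 102, 160.71),
    ("acetaldehyde", 21, 97.6349),
    ("acetic acid", 117, 105.186),
    ("acetic anhydride", 139, 159.054),
]
S = 55.11  # residual standard deviation from the analysis of variance


def residuals(rows):
    """Residual = observed - predicted for each compound."""
    return [(name, obs - pred) for name, obs, pred in rows]


def flag_outliers(rows, s, k=2.0):
    """Names whose absolute residual exceeds k * s (k is an assumed cut-off)."""
    return [name for name, r in residuals(rows) if abs(r) > k * s]


ranked = sorted(residuals(ROWS), key=lambda item: abs(item[1]), reverse=True)
for name, r in ranked:
    print(f"{name}: {r:.4f}")
print(flag_outliers(ROWS, S, k=1.0))
```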

Plot of predicted versus observed:

Plot of predicted versus residuals:

The above plot should be a perfect scatter (uncorrelated). However, this plot is not perfect: the negative residuals fall into 2 clear groups, which leads one to suspect there may be a missing variable. Other things to look for in a plot of residuals are curves and funnel shapes. These effects indicate that the data should be transformed (e.g. take the reciprocal, add a square term or a cross-product term, take the log, etc.).

PRESS analysis and cross validation:

Contributions to PRESS (Predictive Residual Sum of Squares):

Compound           Predictive discrepancy
acetal             3482.369
acetaldehyde       5949.886
acetic acid        141.3421
acetic anhydride   408.8725
acetol             156.2702

...and so on until all compounds have been listed. The compounds with larger numbers are more influential in determining the model coefficients...

Total PRESS = 929350.7

Sum of squares of response (SSY) = 2027709.70570218

Press/SSY = .458325311363807

The model has failed the cross-validation criterion (PRESS/SSY <0.4).

The model marginally failed the cross-validation test. The mediocre r squared of 0.56, the high standard deviation, the intercorrelated independent variables and the failed cross-validation test all are indicators that we can do better...
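
For an ordinary least-squares model, each compound's contribution to PRESS is the squared error made when that compound is left out of the fit, which can be obtained without refitting as (residual / (1 - leverage)) squared. The sketch below uses NumPy and that standard identity; whether MMP+ computes PRESS this way is not stated on this page, and the function names are hypothetical. The last lines reproduce the PRESS/SSY ratio and the 0.4 criterion from the printed totals.

```python
import numpy as np


def press_contributions(X, y):
    """Leave-one-out squared prediction errors for OLS with an intercept."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    hat = X @ np.linalg.pinv(X)  # hat matrix; its diagonal holds the leverages
    resid = y - hat @ y
    return (resid / (1.0 - np.diag(hat))) ** 2


def passes_cross_validation(press, ssy, limit=0.4):
    """Criterion quoted in the output: PRESS/SSY must be below 0.4."""
    return press / ssy < limit


# Totals copied from the printout.
total_press = 929350.7
ssy = 2027709.70570218
print(total_press / ssy)
print(passes_cross_validation(total_press, ssy))
```

Compounds with high leverage have their ordinary residuals inflated the most, which is why a large contribution marks a compound as influential.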

Compounds with highest predicted values:
triisononyl trimellitate ( 461.5005)
triisooctyl trimelliate ( 439.8288)
ditridecyl phthalate ( 389.5374)
triethylene glycol oleyl ether ( 376.9923)
dibutyl stearate ( 359.3761)
diisodecyl phthalate ( 353.4654)
diisodecyl phthalate ( 353.4654)
diisononyl adipate ( 352.1996)
diisononyl phthalate ( 339.0171)
diisooctyl phthalate ( 324.5693)
triacetin ( 318.6974)
dioctyl phthalate ( 317.2979)
diisoheptyl phthalate ( 310.1215)
dibutyl sebacate ( 308.8088)



Partial Least Squares Regression Output

The output consists of the following parts:

  • cross-validation
  • PLS components: variance explained for x and y
  • the actual model
  • the observed, predicted and residual values of all the compounds
  • more PLS statistics (column means, P vector)
  • optionally the user can create x-y plots of any of the PLS vectors, observed or predicted values

Cross Validation

Working with 1 PLS Components
CROSS VALIDATION RESULTS

Principle component number 1:

Partial PRESS for group 1 = 375374.6

Partial PRESS for group 2 = 275348.9

Partial PRESS for group 3 = 282589.7

Partial PRESS for group 4 = 413878

Sum of y variance before = 2027710

PRESS/SDBEF = .8151015
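
The cross-validation figures above can be combined as follows. This is a sketch based only on the printed numbers: the four partial PRESS values are summed and divided by the y sum of squares, and the printed PRESS/SDBEF value turns out to be numerically consistent with the square root of that ratio. That reading is an inference from the arithmetic, not documented program behavior.

```python
import math

# Values copied from the cross-validation printout.
partial_press = [375374.6, 275348.9, 282589.7, 413878.0]
sum_y_variance_before = 2027710.0

total_press = sum(partial_press)
ratio = total_press / sum_y_variance_before
root_ratio = math.sqrt(ratio)

print(f"total PRESS = {total_press:.1f}")
print(f"PRESS / sum of y variance = {ratio:.4f}")
print(f"square root of the ratio = {root_ratio:.7f}")
```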

Column means were removed.

Number of Dependent Variables : 1
Number of Independent Variables: 2
Number of observations: 296

PLS component 1 :

X variance explained, this component: .996898760297766
accumulated: .996898760297766

Y variance explained, this component: .377643510112438
accumulated: .377643510112438

The Actual Model

PLS Model Regression Coefficients:

Y = Literature Boiling Point (C)
-- Intercept: 60.54151
-- Log P: .1137258
-- MR: 2.25315

Predicted Literature Boiling Point (C) for all compounds

Name(observed y, predicted y, residual)

triisononyl trimellitate( 311 , 456.4935 , -145.4935 , (F = 7.383684E-06))
ditridecyl phthalate( 286 , 428.0999 , -142.0999 , (F = 4.379508E-04))
triisooctyl trimelliate( 300 , 425.0979 , -125.0979 , (F = 6.390917E-05))
diisodecyl phthalate( 250 , 365.2788 , -115.2788 , (F = 4.110049E-05))
diisodecyl phthalate( 256 , 365.2788 , -109.2788 , (F = 4.110049E-05))
dibutyl stearate(?, 352.4491 , ?)
diisononyl phthalate( 245 , 344.3482 , -99.34821 , (F = 8.36994E-06))
diisononyl adipate( 233 , 331.4888 , -98.48884 , (F = 1.263267E-04))
triethylene glycol oleyl ether(?, 329.5998 , ?)
dioctyl phthalate(?, 323.4476 , ?)
diisooctyl phthalate( 230 , 323.4178 , -93.41779 , (F = 3.905712E-07))

... and so on for all the compounds in the data base...

--COLUMN MEANS--
Y variables:
Literature Boiling Point (C): 137.8491

X variables:
Log P: 1.92405
MR: 34.21377

Y variance for Literature Boiling Point (C): 25940.37
X variance for Log P: 9.363692
X variance for MR: 1681.273
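
Because column means were removed before fitting, the PLS model should pass through the means: evaluating it at the mean Log P and MR should return the mean boiling point. The short sketch below performs that consistency check with the printed coefficients and column means; the function name is hypothetical.

```python
# Coefficients and column means copied from the PLS printout.
INTERCEPT = 60.54151
COEF = {"Log P": 0.1137258, "MR": 2.25315}
X_MEANS = {"Log P": 1.92405, "MR": 34.21377}
Y_MEAN = 137.8491


def predict(x):
    """Predicted Literature Boiling Point (C) from the one-component model."""
    return INTERCEPT + sum(COEF[name] * x[name] for name in COEF)


at_means = predict(X_MEANS)
print(f"prediction at the column means = {at_means:.4f} (y mean = {Y_MEAN})")
```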

The P vector is the x variable loadings
--P VECTOR COMPONENT 1 --
Log P: 8.9


To purchase, click on the Buy Now button above.  For questions, please contact us at 707-864-0845 or info@chemsw.com.

© 2013 ChemSW