This report analyzes the cross-sectional time series dataset AMC1 Modified.xlsx.
1 Data
Descriptive Statistics
| Statistic | EtACAC (equiv) | Lewis Acid (equiv) | Solvent (vol) | Temp. (℃) | Time (h) | BQO (% IPC) | AMC1 (% IPC) | Dimer (% IPC) | HQN Imp | Key Imp |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 17.000000 | 17.000000 | 17.000000 | 17.000000 | 17.000000 | 17.000000 | 17.000000 | 17.000000 | 17.000000 | 17.000000 |
| mean | 1.011765 | 0.700000 | 20.411765 | 100.588235 | 6.705882 | 0.661765 | 46.412353 | 16.786471 | 4.835294 | 8.758235 |
| std | 0.239485 | 0.203101 | 14.495943 | 16.093705 | 3.653161 | 1.047707 | 10.824634 | 12.469459 | 2.620609 | 6.162751 |
| min | 0.800000 | 0.400000 | 8.000000 | 85.000000 | 3.000000 | 0.000000 | 20.420000 | 2.320000 | 2.240000 | 0.000000 |
| 25% | 0.800000 | 0.500000 | 10.000000 | 85.000000 | 4.000000 | 0.100000 | 39.930000 | 4.860000 | 3.020000 | 4.960000 |
| 50% | 1.000000 | 0.800000 | 10.000000 | 95.000000 | 6.000000 | 0.540000 | 44.300000 | 16.750000 | 4.360000 | 7.630000 |
| 75% | 1.200000 | 0.800000 | 30.000000 | 115.000000 | 12.000000 | 0.730000 | 55.020000 | 24.700000 | 4.990000 | 14.380000 |
| max | 1.600000 | 1.100000 | 50.000000 | 120.000000 | 12.000000 | 4.530000 | 60.680000 | 38.950000 | 10.920000 | 19.520000 |
Input Variable Configuration
For reference, the following Python dictionaries, shown in JSON form, describe the input and output conditions specified by the client for the optimization.
```
{
"EtACAC (equiv)": {
"type": "float",
"lim": [
0.5,
3.0
],
"sigfigs": 1
},
"Lewis Acid (equiv)": {
"type": "float",
"lim": [
0.4,
2.0
],
"sigfigs": 1
},
"Solvent (vol)": {
"type": "int",
"lim": [
5,
50
]
},
"Temp. (\u2103)": {
"type": "int",
"lim": [
25,
120
]
},
"Time (h)": {
"type": "time",
"lim": [
3,
4,
6,
12
],
"int_mapping": {
"3": 0,
"4": 1,
"6": 2,
"12": 3
},
"doe_int_lim": [
0,
3
]
}
}
```
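As a rough illustration of how a configuration like this can be used, the sketch below checks whether a candidate value is allowed for an input and maps a time level to its integer index. It copies only a subset of the inputs above. The helper names `is_valid` and `encode_time` are hypothetical, and reading `lim` for the `time` type as the full list of allowed levels is an assumption, not something the report states.

```python
# Subset of the input configuration shown above (values copied from the report).
INPUT_CONFIG = {
    "EtACAC (equiv)": {"type": "float", "lim": [0.5, 3.0], "sigfigs": 1},
    "Solvent (vol)": {"type": "int", "lim": [5, 50]},
    "Time (h)": {
        "type": "time",
        "lim": [3, 4, 6, 12],
        "int_mapping": {"3": 0, "4": 1, "6": 2, "12": 3},
        "doe_int_lim": [0, 3],
    },
}


def is_valid(name, value, config=INPUT_CONFIG):
    """Return True if `value` is allowed for input `name` (hypothetical helper)."""
    spec = config[name]
    if spec["type"] == "time":
        # Assumption: for "time" inputs, "lim" lists the only allowed levels.
        return value in spec["lim"]
    low, high = spec["lim"]
    if spec["type"] == "int" and int(value) != value:
        # Integer inputs reject fractional values.
        return False
    return low <= value <= high


def encode_time(hours, config=INPUT_CONFIG):
    """Map a time level in hours to its integer index via int_mapping."""
    return config["Time (h)"]["int_mapping"][str(hours)]
```

Under these assumptions, `encode_time(12)` looks up the key `"12"` in `int_mapping`, and a time of 5 h is rejected because it is not one of the listed levels.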
Output Variable Configuration
```
{
"BQO (% IPC)": {
"optimization_target": "min",
"optimization_important": true,
"y_bounds": [
0,
null
],
"y_constraint": null,
"weight": 0.5
},
"AMC1 (% IPC)": {
"optimization_target": "max",
"optimization_important": true,
"y_bounds": [
0,
null
],
"y_constraint": null,
"weight": 0.9
},
"Dimer (% IPC)": {
"optimization_target": "min",
"optimization_important": true,
"y_bounds": [
0,
null
],
"y_constraint": null,
"weight": 0.5
},
"HQN Imp": {
"optimization_target": "min",
"optimization_important": true,
"y_bounds": [
0,
null
],
"y_constraint": null,
"weight": 0.5
},
"Key Imp": {
"optimization_target": "min",
"optimization_important": true,
"y_bounds": [
0,
null
],
"y_constraint": null,
"weight": 0.5
}
}
```
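The report does not state how the `weight` values are combined during optimization. Purely as an illustration, the sketch below uses one simple option, a signed weighted sum in which outputs to maximize add to the score and outputs to minimize subtract from it. The function name `weighted_score` is hypothetical, and this scalarization is an assumption rather than the algorithm behind the report.

```python
# Targets and weights copied from the output configuration above.
OUTPUT_CONFIG = {
    "BQO (% IPC)": {"optimization_target": "min", "weight": 0.5},
    "AMC1 (% IPC)": {"optimization_target": "max", "weight": 0.9},
    "Dimer (% IPC)": {"optimization_target": "min", "weight": 0.5},
    "HQN Imp": {"optimization_target": "min", "weight": 0.5},
    "Key Imp": {"optimization_target": "min", "weight": 0.5},
}


def weighted_score(outputs, config=OUTPUT_CONFIG):
    """Signed weighted sum of outputs; higher is better (assumed scalarization)."""
    score = 0.0
    for name, spec in config.items():
        # "max" targets reward large values; "min" targets penalize them.
        sign = 1.0 if spec["optimization_target"] == "max" else -1.0
        score += sign * spec["weight"] * outputs[name]
    return score
```

A plain weighted sum ignores the different scales of the outputs, so in practice the outputs would usually be normalized first. The sketch only shows how a target direction and a weight can enter a single score.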
Variable Distribution
Variable Association
WARNING: EtACAC (equiv) and Solvent (vol) have a correlation of 0.91
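A correlation this high means the two inputs were varied almost in step, so their individual effects are hard to separate from this dataset alone. A warning like this can be produced by computing the Pearson correlation for every pair of input columns and flagging pairs above a cutoff. The sketch below is a minimal version of that idea. The report does not state which correlation measure or cutoff it uses, so Pearson's r and the 0.9 threshold are assumptions, and `correlated_pairs` is a hypothetical helper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.9):
    """Return (name_1, name_2, r) for every column pair with |r| >= threshold.

    `columns` maps a column name to its list of values. The threshold is an
    assumed cutoff, not one taken from the report.
    """
    names = list(columns)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) >= threshold:
                flagged.append((first, second, r))
    return flagged
```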
2 Variable Effects
Variable Importance
Importance Across Predictions
This beeswarm plot visualizes the feature importance from the machine learning model. Each point on the plot corresponds to a prediction made by the model. The position on the x-axis shows the impact of the feature on the model’s prediction, while the color of the point represents the value of the feature.
Importance is in descending order.
Parallel Coordinates
A parallel coordinates plot displays multivariate data by drawing each data point as a line that connects a set of parallel axes. Each variable has its own vertical axis, and a data point’s value for that variable is marked on that axis. This allows multiple variables to be compared at once.
Interaction
The interaction heatmaps visualize potential interaction effects between inputs. Higher values indicate a possible interaction. The interaction values are relative and are not definitive indicators of interaction. Check the partial dependence plots for more information.
3 Model
| Output | Model Type |
|---|---|
| BQO (% IPC) | Gradient Boosted Trees |
| AMC1 (% IPC) | Gradient Boosted Trees |
| Dimer (% IPC) | Gradient Boosted Trees |
| HQN Imp | Gradient Boosted Trees |
| Key Imp | Gradient Boosted Trees |
Observed vs. Predicted
The observed vs. predicted plot shows the model’s predictions against the actual values. The closer the points are to the diagonal line, the better the model reproduces the training set.
Error Summary
| Output | Metric | Mean | StdDev |
|---|---|---|---|
| BQO (% IPC) | mae | 0.444510 | 0.069216 |
| | medae | 0.261022 | 0.064624 |
| | mse | 0.667227 | 0.360903 |
| | rmse | 0.783348 | 0.232081 |
| | rsquared | 0.354163 | 0.349333 |
| AMC1 (% IPC) | mae | 3.768082 | 0.659122 |
| | medae | 2.272002 | 0.908754 |
| | mse | 36.542757 | 12.463797 |
| | rmse | 5.970611 | 0.948186 |
| | rsquared | 0.668637 | 0.113019 |
| Dimer (% IPC) | mae | 4.384472 | 1.215224 |
| | medae | 2.216677 | 0.871223 |
| | mse | 55.313808 | 24.643873 |
| | rmse | 7.283778 | 1.507229 |
| | rsquared | 0.622021 | 0.168400 |
| HQN Imp | mae | 1.161839 | 0.109560 |
| | medae | 0.473629 | 0.234369 |
| | mse | 3.630523 | 0.855918 |
| | rmse | 1.893809 | 0.210314 |
| | rsquared | 0.438314 | 0.132421 |
| Key Imp | mae | 3.007029 | 0.663680 |
| | medae | 1.290539 | 0.629583 |
| | mse | 27.521770 | 10.643890 |
| | rmse | 5.158392 | 0.957785 |
| | rsquared | 0.230062 | 0.297769 |
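For readers unfamiliar with the metric names, the sketch below computes them for one set of observed and predicted values. It assumes the usual definitions: `mae` as mean absolute error, `medae` as median absolute error, `mse` as mean squared error, `rmse` as its square root, and `rsquared` as the coefficient of determination. The report does not spell these out, and `error_metrics` is a hypothetical helper.

```python
import math
import statistics


def error_metrics(observed, predicted):
    """Compute common regression error metrics for paired values."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    abs_errors = [abs(r) for r in residuals]
    squared_errors = [r * r for r in residuals]
    mse = statistics.fmean(squared_errors)

    # R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the observed mean.
    mean_observed = statistics.fmean(observed)
    ss_res = sum(squared_errors)
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)

    return {
        "mae": statistics.fmean(abs_errors),
        "medae": statistics.median(abs_errors),
        "mse": mse,
        "rmse": math.sqrt(mse),
        "rsquared": 1.0 - ss_res / ss_tot,
    }
```

Note that the Mean column in the table averages each metric over bootstrap repetitions, so the mean `rmse` there is not expected to equal the square root of the mean `mse`.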
Bootstrapped Error Distributions
The histograms in this section show the bootstrapped error metrics, giving a visual picture of how model performance is distributed. Examine the shape of each histogram to understand the central tendency and dispersion of the error metric. A symmetrical, bell-shaped histogram suggests stable and consistent model performance, while a skewed or multimodal distribution may indicate variability or outliers.
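The general idea of bootstrapping an error metric can be sketched as follows: resample the data with replacement many times, recompute the metric on each resample, and look at the spread of the results. The version below resamples observed/predicted pairs and recomputes the mean absolute error. This is a simplified illustration. The report does not describe its exact procedure (it may, for example, refit the model on each resample), and `bootstrap_mae` is a hypothetical helper.

```python
import random
import statistics


def bootstrap_mae(observed, predicted, n_boot=1000, seed=0):
    """Return a list of MAE values, one per bootstrap resample of the pairs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(observed)
    samples = []
    for _ in range(n_boot):
        # Draw n indices with replacement, then score the resampled pairs.
        indices = [rng.randrange(n) for _ in range(n)]
        errors = [abs(observed[i] - predicted[i]) for i in indices]
        samples.append(statistics.fmean(errors))
    return samples
```

The mean and standard deviation of the returned list correspond in spirit to the Mean and StdDev columns of the error summary, and a histogram of the list gives a plot of the kind described above.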
4 Optimal Points
| Point | EtACAC (equiv) | Lewis Acid (equiv) | Solvent (vol) | Temp. (℃) | Time (h) | BQO (% IPC) | AMC1 (% IPC) | Dimer (% IPC) | HQN Imp | Key Imp | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.3 | 0.5 | 49.0 | 120.0 | 4.0 | 1.353281 | 55.156666 | 7.940756 | 5.656612 | 0.000000 | Optimal |
| 1 | 2.1 | 0.8 | 34.0 | 118.0 | 6.0 | 1.913893 | 56.423599 | 2.668903 | 6.290057 | 11.587929 | Optimal |
| 2 | 2.4 | 1.0 | 27.0 | 34.0 | 3.0 | 0.714154 | 51.085278 | 21.338291 | 3.851384 | 12.596950 | Optimal |
| 3 | 2.4 | 1.2 | 17.0 | 65.0 | 3.0 | 0.714154 | 51.085278 | 21.338291 | 3.851384 | 12.596950 | Optimal |
| 4 | 1.1 | 0.7 | 17.0 | 84.0 | 12.0 | 0.043860 | 43.066807 | 25.331287 | 4.780689 | 3.957733 | Optimal |
| 5 | 1.4 | 2.0 | 32.0 | 120.0 | 4.0 | 1.583246 | 56.348412 | 8.207374 | 6.527459 | 5.418442 | Uncertain |
| 6 | 1.5 | 2.0 | 50.0 | 36.0 | 12.0 | 1.497841 | 58.229034 | 4.846352 | 6.245393 | 12.059023 | Uncertain |
These are the suggested points from the algorithm, and they include a mix of optimal and uncertain points to test.
| Point | BQO (% IPC) | BQO (% IPC) -stdev | AMC1 (% IPC) | AMC1 (% IPC) -stdev | Dimer (% IPC) | Dimer (% IPC) -stdev | HQN Imp | HQN Imp -stdev | Key Imp | Key Imp -stdev |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.353281 | 0.720668 | 55.156666 | 2.747704 | 7.940754 | 5.670068 | 5.656613 | 1.073428 | 0.000000 | 4.426401 |
| 1 | 1.913893 | 0.909842 | 56.423599 | 2.211352 | 2.668901 | 4.453705 | 6.290058 | 0.943162 | 11.587931 | 2.814724 |
| 2 | 0.714154 | 0.775063 | 51.085278 | 1.983932 | 21.338289 | 4.580088 | 3.851384 | 0.852979 | 12.596950 | 3.117950 |
| 3 | 0.714154 | 0.762962 | 51.085278 | 1.888568 | 21.338289 | 4.513475 | 3.851384 | 0.852665 | 12.596950 | 3.057258 |
| 4 | 0.043860 | 0.513532 | 43.066811 | 2.488346 | 25.331285 | 7.110478 | 4.780689 | 0.955404 | 3.957733 | 2.831276 |
| 5 | 1.583246 | 0.942538 | 56.348412 | 3.075012 | 8.207374 | 7.006069 | 6.527459 | 0.852663 | 5.418442 | 4.111855 |
| 6 | 1.497841 | 1.148828 | 58.229034 | 2.120657 | 4.846352 | 4.460425 | 6.245393 | 1.050397 | 12.059023 | 3.575100 |
The table above contains the predicted outputs along with the associated uncertainty for each point, calculated using the bootstrapped standard deviations. A smaller standard deviation indicates greater confidence in the model’s prediction, while a larger standard deviation signals more variability in the prediction.
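One rough way to read these numbers is as a band of plus or minus a few standard deviations around each prediction. The sketch below builds such a band and clips it at the lower `y_bounds` limit of 0 from the output configuration. The multiplier `k=2` and the helper name `prediction_band` are assumptions made for illustration. The report does not claim that the bootstrapped predictions are normally distributed, so the band should not be read as a formal confidence interval.

```python
def prediction_band(mean, stdev, k=2.0, lower_bound=0.0):
    """Return (low, high) = mean -/+ k * stdev, with low clipped at lower_bound."""
    low = max(lower_bound, mean - k * stdev)
    high = mean + k * stdev
    return low, high


# Point 0 from the table above: predicted AMC1 and Key Imp with their stdevs.
amc1_band = prediction_band(55.156666, 2.747704)
key_imp_band = prediction_band(0.000000, 4.426401)
```

For point 0, the Key Imp prediction of 0 carries a standard deviation of about 4.4, so its band is wide relative to the prediction itself, whereas the AMC1 band is narrow relative to a prediction of about 55.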
| Point | Output Optimized | EtACAC (equiv) | Lewis Acid (equiv) | Solvent (vol) | Temp. (℃) | Time (h) | BQO (% IPC) | AMC1 (% IPC) | Dimer (% IPC) | HQN Imp | Key Imp |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BQO (% IPC) | 0.5 | 0.6 | 6.0 | 120.0 | 3.0 | 0.000000 | 39.298847 | 21.002005 | 2.625654 | 0.485350 |
| 1 | AMC1 (% IPC) | 2.0 | 2.0 | 50.0 | 105.0 | 12.0 | 2.411254 | 58.527691 | 0.444288 | 5.476033 | 12.269161 |
| 2 | Dimer (% IPC) | 2.6 | 0.7 | 42.0 | 114.0 | 12.0 | 1.570588 | 57.193531 | 0.115234 | 6.573816 | 4.419123 |
| 3 | HQN Imp | 0.5 | 1.7 | 6.0 | 119.0 | 3.0 | 0.000000 | 34.954605 | 21.507814 | 2.168596 | 10.947542 |
| 4 | Key Imp | 1.0 | 0.6 | 39.0 | 120.0 | 3.0 | 0.748022 | 46.258125 | 23.999989 | 5.076136 | 0.000000 |
The table above contains the optimal point for each output variable, optimized separately. It is helpful if you would like to see the best achievable value of each output without considering the multi-objective trade-offs.
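The trade-offs can be made concrete with a Pareto dominance check: one point dominates another if it is at least as good on every output and strictly better on at least one. The sketch below uses the optimization directions from the output configuration. The function name `dominates` is hypothetical, and the report does not say whether its algorithm uses this test.

```python
# Optimization direction for each output, from the output configuration.
TARGETS = {
    "BQO (% IPC)": "min",
    "AMC1 (% IPC)": "max",
    "Dimer (% IPC)": "min",
    "HQN Imp": "min",
    "Key Imp": "min",
}


def dominates(a, b, targets=TARGETS):
    """Return True if point `a` Pareto-dominates point `b`."""
    strictly_better = False
    for name, target in targets.items():
        # Positive gain means `a` is better than `b` on this output.
        gain = a[name] - b[name] if target == "max" else b[name] - a[name]
        if gain < 0:
            return False  # `a` is worse on at least one output
        if gain > 0:
            strictly_better = True
    return strictly_better
```

Applied to rows 1 and 2 of the table above, neither point dominates the other: row 1 has the higher AMC1, while row 2 has lower BQO, Dimer, and Key Imp. This is the kind of trade-off the multi-objective optimization has to balance.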