This repository hosts a machine learning project focused on analyzing and predicting sonic well logs from petrophysical data. It contains two sub-projects:

- **Ensemble vs. Standalone Models:** a comparison between ensemble methods (e.g., Random Forest, Gradient Boosting) and standalone regression models.
- **Binary Prediction for Petrophysical Sonic Well Logs (DTC-DTS):** predicts compressional (DTC) and shear (DTS) sonic logs from other petrophysical measurements in the Volve field.

This README covers the Binary Prediction project in detail.
Sonic logs are essential in petroleum engineering for:
- Identifying lithology and porosity
- Mapping fluid types and natural fractures
- Ensuring wellbore stability
- Seismic calibration for hydrocarbon recovery
This project uses machine learning (primarily `ExtraTreesRegressor`) to predict DTC and DTS logs from the Volve dataset, loaded with `petrolib`.

**Goal:** Predict DTC and DTS from logs such as GR, RT, RHOB, and NPHI for enhanced subsurface characterization in the Volve field.
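The prediction task above reduces to a multi-output regression: four input logs map to two sonic targets. A minimal sketch of the feature/target split (the column names are illustrative; real LAS mnemonics can vary per well):

```python
import pandas as pd

# Illustrative log mnemonics; actual LAS curve names may differ per well.
FEATURES = ["GR", "RT", "RHOB", "NPHI"]
TARGETS = ["DTC", "DTS"]

def split_xy(df: pd.DataFrame):
    """Separate the predictor logs from the two sonic targets."""
    return df[FEATURES], df[TARGETS]
```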
- **Source:** Volve field (7 wells); Midland Basin (2 sections)
- **Format:** LAS files, read via `petrolib`
- **Features:**
  - Gamma Ray (GR)
  - Resistivity (RT)
  - Density (RHOB)
  - Neutron Porosity Index (NPHI)
  - Compressional (DTC) & Shear Sonic (DTS)
- **Size:** ~13,000 samples × 20 variables
| Step | Description |
|---|---|
| Filtering | Logs filtered by depth range |
| Missing Values | Imputation or row-wise removal |
| Outliers | Detected and managed |
| Standardization | Features scaled for uniformity |
| Correlation Analysis | Performed pre- and post-cleaning to check multicollinearity |
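The preprocessing steps in the table could be sketched as follows. This is a minimal illustration, not the project's exact pipeline: the depth window, the 3-sigma outlier rule, and the column names are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Illustrative log columns mirroring the features described above.
LOGS = ["GR", "RT", "RHOB", "NPHI", "DTC", "DTS"]

def preprocess(df, depth_col="DEPTH", top=2500.0, base=3500.0, z_thresh=3.0):
    """Depth filter -> drop NaNs -> z-score outlier removal -> standardize."""
    # 1. Keep only samples inside the depth window of interest.
    df = df[(df[depth_col] >= top) & (df[depth_col] <= base)].copy()
    # 2. Row-wise removal of missing log values.
    df = df.dropna(subset=LOGS)
    # 3. Drop rows where any log lies more than z_thresh std devs from its mean.
    z = (df[LOGS] - df[LOGS].mean()) / df[LOGS].std(ddof=0)
    df = df[(z.abs() <= z_thresh).all(axis=1)]
    # 4. Scale features to zero mean / unit variance.
    scaler = StandardScaler()
    df[LOGS] = scaler.fit_transform(df[LOGS])
    return df, scaler
```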
- **Data Handling:** Read LAS files into pandas using `petrolib`
- **Preprocessing:** Depth filtering, NaN handling, outlier removal, standardization
- **Model:** `ExtraTreesRegressor` (chosen for performance)
- **Tuning:** Hyperparameter optimization using Optuna
- **Metrics:**
  - R² Score
  - MSE, RMSE
  - MAE
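The model and metrics steps above can be sketched on synthetic data (the real project reads these features from LAS files, and Optuna would wrap the fit in an objective function to tune `n_estimators`, `max_depth`, etc.; the data generation here is purely illustrative):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real well-log features and DTC/DTS targets.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 4))          # e.g. GR, RT, RHOB, NPHI
y = np.column_stack([                   # DTC, DTS as a 2-column target
    X[:, 0] * 2 + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=1000),
    X[:, 2] - X[:, 3] ** 2 + rng.normal(scale=0.1, size=1000),
])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = ExtraTreesRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)             # handles multi-output targets natively
pred = model.predict(X_test)

r2 = r2_score(y_test, pred)
mse = mean_squared_error(y_test, pred)
rmse = mse ** 0.5
mae = mean_absolute_error(y_test, pred)
```

`ExtraTreesRegressor` accepts a 2-column target directly, so DTC and DTS can be predicted jointly by one model.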
**Standalone Models:**
- Linear Regression
- Partial Least Squares

**Ensemble Models:**
- ExtraTreesRegressor (final model)
- Random Forest
- Gradient Boosted Trees
**Before outlier removal:**

| Model | R² | MSE | RMSE | MAE |
|---|---|---|---|---|
| Linear Regression | 0.312 | 0.145 | 0.381 | 0.276 |
| ElasticNet Regression | 0.346 | 0.138 | 0.371 | 0.265 |
| PLS Regression | 0.357 | 0.135 | 0.367 | 0.269 |
| SVR | 0.284 | 0.151 | 0.389 | 0.285 |
| Random Forest | 0.892 | 0.022 | 0.148 | 0.104 |
| ExtraTrees | 0.922 | 0.016 | 0.126 | 0.089 |
| Gradient Boosted Trees | 0.874 | 0.026 | 0.161 | 0.112 |
**After outlier removal:**

| Model | R² | MSE | RMSE | MAE |
|---|---|---|---|---|
| Linear Regression | 0.326 | 0.142 | 0.377 | 0.273 |
| ElasticNet Regression | 0.359 | 0.135 | 0.367 | 0.262 |
| PLS Regression | 0.371 | 0.132 | 0.363 | 0.266 |
| SVR | 0.298 | 0.148 | 0.385 | 0.282 |
| Random Forest | 0.905 | 0.020 | 0.141 | 0.100 |
| ExtraTrees | 0.936 | 0.013 | 0.114 | 0.081 |
| Gradient Boosted Trees | 0.887 | 0.024 | 0.155 | 0.107 |
**Insight:** `ExtraTreesRegressor` achieved an R² of 0.936 after outlier removal, confirming its robustness.
- Correlation heatmaps (pre- and post-preprocessing)
- Pair plots of key features
- DTC & DTS predicted vs. actual plots (see `binary_prediction/results/figures/`)
- Ensemble models outperform standalone models for sonic log prediction
- Preprocessing (outlier removal + scaling) improves accuracy
- High predictive performance (R² > 0.92) on test data
- **Future Work:** Explore deep learning models (e.g., LSTMs, CNNs) and expand the feature set
- Python 3.8+
- Key libraries: `pandas`, `numpy`, `scikit-learn`, `petrolib`, `optuna`, `matplotlib` (see `binary_prediction/requirements.txt`)
- Distributed under the MIT License.