Data Analytics

The bridge between twenty-five years of operational wells engineering and quantitative methods that survive contact with a rig floor.

My positioning

I am not a data scientist learning wells. I am a wells authority learning data — and the distinction matters.

The most valuable analytical work in upstream is done by people who understand both what a model is doing mathematically and what the operational reality it is meant to inform looks like at 03:00 on a rig deck. That is a small intersection. It is the intersection I am positioning myself in.

The Imperial College Data Analytics certificate (May 2026) is the public proof point. The work below is what I have built inside it.

In progress — ESP Failure Predictive Analysis

ESP (Electric Submersible Pump) workovers represent approximately 73% of all workovers on the field I am currently working with. The average ESP working life sits around two years. A reliable failure-prediction model would enable proactive scheduling — pulling and replacing pumps on a planned-intervention basis rather than emergency response.

Approach: time-to-event modelling using Cox proportional hazards and random survival forests (lifelines and scikit-survival), with a binary-classification fallback at 3-, 6- and 12-month windows reusing the existing Random Forest / XGBoost framework.
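The binary-classification fallback can be sketched roughly as follows. Everything here is synthetic and illustrative: the column names, value ranges, and 6-month label are placeholders standing in for the confidential field dataset, not the actual feature table.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the well/ESP feature table -- illustrative only.
rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame({
    "run_life_avg_days": rng.normal(700, 150, n),   # average historical ESP run life
    "workover_count": rng.integers(0, 8, n),        # previous workovers on the well
    "well_age_years": rng.uniform(1.0, 25.0, n),    # well age from spud
    "production_rate": rng.normal(1500, 400, n),    # hypothetical rate scale
})
# Hypothetical window label: did the ESP fail within 6 months of the snapshot?
df["failed_6m"] = (rng.random(n) < 0.3).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="failed_6m"), df["failed_6m"], random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # P(failure within the 6-month window)
```

The same pattern repeats at the 3- and 12-month windows by relabelling; swapping in XGBoost changes only the estimator line.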

Conditioning features: reservoir formation (Mishrif, Main Pay, Nahr Umr, Dammam, Upper Shale, 4th Pay), last workover date, ESP run-life history (cumulative + average), well age from spud, previous workover count, well function (producer/injector), production rate, geographic position, seasonality.

Censoring is handled explicitly — pumps still running at the end of the dataset window contribute to the likelihood as right-censored observations.
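The censoring construction looks something like this in pandas. This is a minimal sketch with hypothetical well IDs, dates, and column names; the real records differ.

```python
import pandas as pd

# Hypothetical ESP run records: fail_date is NaT for pumps still running.
runs = pd.DataFrame({
    "well": ["W-01", "W-02", "W-03"],
    "install_date": pd.to_datetime(["2021-03-01", "2022-06-15", "2023-01-10"]),
    "fail_date": pd.to_datetime(["2023-02-20", pd.NaT, pd.NaT]),
})
cutoff = pd.Timestamp("2024-12-31")  # end of the dataset window

# Event indicator: 1 = observed failure, 0 = right-censored (still running at cutoff)
runs["event"] = runs["fail_date"].notna().astype(int)

# Duration in days: to failure if observed, otherwise to the censoring cutoff
end = runs["fail_date"].fillna(cutoff)
runs["duration_days"] = (end - runs["install_date"]).dt.days
```

The `duration_days` / `event` pair is exactly the shape lifelines and scikit-survival expect, so censored pumps enter the likelihood correctly rather than being dropped or miscounted as failures.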

The model and supporting case-study page will be published here on completion. Target: late May 2026.

My toolchain

  • Python as the primary language (pandas, NumPy, scikit-learn, XGBoost, LightGBM, PuLP, lifelines, scikit-survival)
  • Visualisation: matplotlib, seaborn, Plotly
  • Editor: Visual Studio Code with Jupyter notebook integration
  • Data storage: SQLite via SQLAlchemy / pandas where it earns its place; otherwise flat files, by deliberate choice
  • Mixed-integer optimisation: PuLP with the CBC solver
  • Survival analysis: lifelines (Kaplan–Meier, Cox proportional hazards), scikit-survival (random survival forests)
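To make the Kaplan–Meier idea behind the lifelines usage concrete, here is a hand-rolled NumPy sketch on toy data. It is an illustration of the estimator, not the library call, and the toy run-life numbers are invented.

```python
import numpy as np

def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate: at each distinct observed failure
    time t, multiply the running survival probability by
    (1 - deaths_at_t / at_risk_at_t). Censored subjects (event == 0)
    leave the risk set without ever contributing a death."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    times = np.unique(durations[events == 1])  # distinct observed failure times
    surv = []
    s = 1.0
    for t in times:
        at_risk = np.sum(durations >= t)                  # still running just before t
        deaths = np.sum((durations == t) & (events == 1))  # failures exactly at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return times, np.array(surv)

# Toy run lives in years; the 0 in events marks a right-censored pump.
times, surv = kaplan_meier([2, 3, 3, 5], [1, 1, 0, 1])
```

With these four toy runs the curve steps down to 0.75 at t=2, to 0.5 at t=3 (the censored pump shrinks the risk set but adds no death), and to 0 at t=5.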

Methodology IP statement

The methodologies, model architectures, feature-engineering choices, and analytical frameworks documented on this site are my own intellectual property and are publishable. Where these methods have been developed against confidential employer datasets, only headline outcomes already part of my publicly disclosed professional record are shared here. The underlying datasets are not.