Nifty 50 Ensemble Directional Predictor
Market Regime Detection + Calibrated ML Ensemble for 1m & 5m Directional Prediction
| Item | Details |
|---|---|
| Task | Binary classification: Nifty 50 Up/Down in next 1m or 5m |
| Architecture | HMM Regime Detector + Calibrated RF/GBM/XGB/LGB Ensemble |
| Features | 50+ technical indicators (trend, momentum, volatility, volume, price lags) |
| Regimes | 5 HMM states: Bull, Bear, Transitional, High-Vol, Low-Vol |
| Papers | HAELT (2025) arxiv:2506.13981 + HMM+NN Trading (2024) arxiv:2407.19858 |
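The Up/Down task above can be made concrete with a small labeling sketch. The source does not spell out the labeling rule, so this assumes that bar `t` is Up when the close `horizon` bars later is strictly higher than the current close, and that ties count as Down. The function name `direction_labels` is hypothetical and is not part of this repository.

```python
from typing import List, Optional


def direction_labels(closes: List[float], horizon: int) -> List[Optional[int]]:
    """Label bar t as 1 (Up) if close[t + horizon] > close[t], else 0 (Down).

    Assumption: ties count as Down. The last `horizon` bars have no
    future close to compare against, so they are labeled None.
    """
    if horizon < 1:
        raise ValueError("horizon must be >= 1")
    labels: List[Optional[int]] = []
    for t in range(len(closes)):
        if t + horizon < len(closes):
            labels.append(1 if closes[t + horizon] > closes[t] else 0)
        else:
            labels.append(None)
    return labels


# Illustrative closes, not real Nifty data
closes = [100.0, 101.0, 100.5, 100.5, 102.0]
labels_1 = direction_labels(closes, horizon=1)  # [1, 0, 0, 1, None]
```

Rows labeled `None` would be dropped before training, since their outcome is not yet known.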
Architecture
OHLCV Input
|
v
[Feature Engineering] -- 50+ technical indicators
|
+---> [HMM Regime Detector] -- 5 market states
|              |
v              v
[Feature Selection] -- MI-based    [Regime Labels]
|
v
[Calibrated Ensemble]
- Random Forest (max_depth=4, class_weight='balanced')
- Gradient Boosting (max_depth=3, subsample=0.6)
- XGBoost (max_depth=3, heavy L1/L2 regularization)
- LightGBM (max_depth=3, num_leaves=7)
|
v
[Dynamic Weighting] -- Soft vote mean of calibrated probabilities
|
v
Probability Up / Down
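The final step of the diagram, a soft vote over calibrated probabilities, can be sketched as follows. This is a minimal illustration that assumes equal weights for all four models, which is how "soft vote mean" reads; the repository's actual weighting logic is not shown in this README. The function name `soft_vote` and the probabilities are hypothetical.

```python
from statistics import fmean
from typing import Dict, List


def soft_vote(proba_up_by_model: Dict[str, List[float]]) -> List[float]:
    """Average P(Up) across models, bar by bar, with equal weights."""
    columns = list(proba_up_by_model.values())
    if not columns:
        raise ValueError("need at least one model")
    n_bars = len(columns[0])
    if any(len(col) != n_bars for col in columns):
        raise ValueError("all models must score the same bars")
    return [fmean(col[i] for col in columns) for i in range(n_bars)]


# Hypothetical calibrated P(Up) from each model for two bars
proba_up = {
    "rf": [0.60, 0.40],
    "gbm": [0.50, 0.45],
    "xgb": [0.70, 0.35],
    "lgb": [0.60, 0.40],
}
ensemble_up = soft_vote(proba_up)
predictions = [1 if p >= 0.5 else 0 for p in ensemble_up]
```

Averaging probabilities rather than hard votes only makes sense when each model's output is calibrated, which is why calibration comes before this step in the diagram.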
Performance (Current Dataset)
Data: Nifty 50 (^NSEI) from Yahoo Finance
- 1m: ~5,600 rows (Apr 15 - May 7, 2026), a span of ~22 calendar days
- 5m: ~2,400 rows (Mar 16 - May 7, 2026), a span of ~52 calendar days
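As a rough sanity check on the row counts above, the number of full sessions they cover can be estimated from the bar size. This sketch assumes a 375-minute regular session (09:15 to 15:30 IST), which is not stated in this README; `approx_sessions` is a hypothetical helper.

```python
SESSION_MINUTES = 375  # assumption: regular session runs 09:15 to 15:30 IST


def approx_sessions(rows: int, bar_minutes: int) -> float:
    """Estimate how many full trading sessions a row count covers."""
    bars_per_session = SESSION_MINUTES / bar_minutes
    return rows / bars_per_session


sessions_1m = approx_sessions(5_600, bar_minutes=1)  # roughly 15 sessions
sessions_5m = approx_sessions(2_400, bar_minutes=5)  # 32 sessions
```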
Out-of-sample (20% holdout):
| Horizon | Accuracy | F1 | AUC | Brier |
|---|---|---|---|---|
| 1m | 49.2% | 0.59 | 0.49 | 0.25 |
| 5m | 52.0% | 0.59 | 0.52 | 0.25 |
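Note: Performance is near-random (AUC close to 0.5), most likely because 22 days of 1m data is too little for intraday prediction. The model likely needs 6+ months of 1m data (50k+ rows) before any statistical edge can show up. This repository provides the framework (training, inference, backtesting) rather than a model with a demonstrated edge.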
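The Brier score of 0.25 in the table is worth reading alongside the AUC: it is exactly what a forecaster scores by always predicting 0.5, whatever the outcomes are. A minimal sketch, with a hypothetical `brier_score` helper and made-up labels:

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], proba_up: Sequence[float]) -> float:
    """Mean squared error between predicted P(Up) and the 0/1 outcome."""
    if len(y_true) != len(proba_up) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, proba_up)) / len(y_true)


# Illustrative outcomes, not taken from the holdout set
y = [1, 0, 1, 1, 0, 0]
coin_flip = brier_score(y, [0.5] * len(y))   # 0.25 for any y
perfect = brier_score(y, [float(v) for v in y])  # 0.0
```

A model only shows calibrated skill when its Brier score falls below this 0.25 baseline.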
Usage
Inference
python inference.py --data nifty_1m.csv --model models_v2_1m --horizon 1 --output predictions.csv
Backtest
python backtest.py --data nifty_1m.csv --model models_v2_1m --threshold 0.55 --output backtest.csv
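One plausible reading of `--threshold 0.55` is a confidence filter: trade only when the ensemble is at least 55% confident in either direction, and stay flat otherwise. The logic inside `backtest.py` is not shown in this README, so the sketch below is an assumption about the idea and not a description of the script; `signal_from_proba` is a hypothetical name.

```python
def signal_from_proba(proba_up: float, threshold: float = 0.55) -> int:
    """Map P(Up) to a position: +1 long, -1 short, 0 flat.

    Assumption: the threshold applies symmetrically to P(Up) and P(Down).
    """
    if not 0.5 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0.5, 1.0]")
    if proba_up >= threshold:
        return 1
    if 1.0 - proba_up >= threshold:  # P(Down) clears the threshold
        return -1
    return 0


# Hypothetical ensemble outputs for three bars
signals = [signal_from_proba(p) for p in (0.62, 0.51, 0.38)]
```

A higher threshold trades less often; with probabilities clustered near 0.5, as the current metrics suggest, most bars would produce no signal.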
Python API
import pandas as pd
from nifty_ensemble_v2 import NiftyEnsembleV2
# Load model
pipe = NiftyEnsembleV2()
pipe.load("models_v2_1m")
# Predict on new data
df = pd.read_csv("nifty_1m.csv", index_col="Datetime", parse_dates=True)
results = pipe.predict(df)
# Results
df["proba_up"] = results["proba"][:, 1]
df["prediction"] = results["pred"]
df["regime"] = results["regime"]
Gradio Demo
pip install gradio
python app.py
Upload a CSV with columns: Datetime, Open, High, Low, Close, Volume
Data Requirements
For production-grade results, use 6-12 months of 1-minute OHLCV data:
| Source | URL |
|---|---|
| Kaggle (Nifty 1m) | https://www.kaggle.com/datasets/debashis74017/nifty-50-minute-data |
| NSE Historical | https://www.niftyindices.com/reports/historical-data |
| Broker APIs | Zerodha Kite Connect, Upstox API |
Recommended CSV format:
Datetime,Open,High,Low,Close,Volume
2026-04-15 09:15:00+05:30,24166.1,24266.7,24166.1,24232.9,0
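Before training or inference, it can help to check a CSV against the format above. The sketch below uses only the standard library and is not part of this repository; `validate_ohlcv` and its checks are illustrative assumptions. It also flags files where Volume is 0 on every row, as in the sample row above, since volume-based indicators would then carry no information.

```python
import csv
import io
from datetime import datetime
from typing import List

REQUIRED = ["Datetime", "Open", "High", "Low", "Close", "Volume"]


def validate_ohlcv(csv_text: str) -> List[str]:
    """Return a list of problems found; an empty list means the rows look sane."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems: List[str] = []
    volumes: List[float] = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            datetime.fromisoformat(row["Datetime"])
            open_, high, low, close = (float(row[k]) for k in REQUIRED[1:5])
            volumes.append(float(row["Volume"]))
        except (TypeError, ValueError) as exc:
            problems.append(f"line {line_no}: {exc}")
            continue
        if not (low <= min(open_, close) and max(open_, close) <= high):
            problems.append(f"line {line_no}: High/Low do not bracket Open/Close")
    if volumes and all(v == 0 for v in volumes):
        problems.append("Volume is 0 on every row")
    return problems


sample = (
    "Datetime,Open,High,Low,Close,Volume\n"
    "2026-04-15 09:15:00+05:30,24166.1,24266.7,24166.1,24232.9,0\n"
)
problems = validate_ohlcv(sample)
```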
Retraining
import pandas as pd
from nifty_ensemble_v2 import NiftyEnsembleV2
# Load 6+ months of 1-minute OHLCV data
df = pd.read_csv("your_6months_1m_data.csv", index_col="Datetime", parse_dates=True)
# Fit with a 20% holdout, keeping at most 25 selected features
pipe = NiftyEnsembleV2(max_features=25)
train_df, test_df = pipe.fit(df, horizon=1, test_size=0.2)
metrics = pipe.evaluate() # Out-of-sample metrics
pipe.save("models_v2_1m_retrained")
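For time-series data, the `test_size=0.2` holdout is only meaningful if the test rows come strictly after the training rows; a shuffled split would leak future information into training. Whether `fit` splits this way is not stated here, so the following is a sketch of the idea under that assumption, with a hypothetical `chronological_split` helper.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    rows: Sequence[T], test_size: float = 0.2
) -> Tuple[List[T], List[T]]:
    """Split time-ordered rows so the last `test_size` fraction is the holdout."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    n_test = max(1, round(len(rows) * test_size))
    cut = len(rows) - n_test
    return list(rows[:cut]), list(rows[cut:])


# Ten bars in time order, stand-ins for real OHLCV rows
bars = list(range(10))
train, test = chronological_split(bars, test_size=0.2)
```

With this split, every test bar is later than every training bar, so the reported metrics reflect genuinely unseen data.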
References
- HAELT (2025) - arxiv:2506.13981 - Hybrid Attention Ensemble for stock direction
- HMM+NN Trading (2024) - arxiv:2407.19858 - Gaussian HMM regime detection + PyTorch NN
- TA-Lib Python wrapper (technical indicators): https://github.com/ta-lib/ta-lib-python
Disclaimer: This is for research and educational purposes only. Not financial advice. Past performance does not guarantee future results.
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern