Nifty 50 Ensemble Directional Predictor

Market Regime Detection + Calibrated ML Ensemble for 1m & 5m Directional Prediction


Task	Binary classification: Nifty 50 Up/Down in next 1m or 5m
Architecture	HMM Regime Detector + Calibrated RF/GBM/XGB/LGB Ensemble
Features	50+ technical indicators (trend, momentum, volatility, volume, price lags)
Regimes	5 HMM states: Bull, Bear, Transitional, High-Vol, Low-Vol
Papers	HAELT (2025) arxiv:2506.13981 + HMM+NN Trading (2024) arxiv:2407.19858
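As a rough illustration of the Task and Features rows, the sketch below uses pandas to build a handful of indicators and the next-`horizon`-bar Up/Down target. The indicators are lagged log returns, a 14-bar RSI, rolling volatility, and distance from a 20-bar SMA. This is not the repository's feature code. `make_features` is a hypothetical helper. The RSI uses a simple moving average rather than Wilder's smoothing. The actual pipeline computes 50+ indicators.

```python
import numpy as np
import pandas as pd


def make_features(df: pd.DataFrame, horizon: int = 1) -> pd.DataFrame:
    """Hypothetical helper: a few illustrative features plus the Up/Down label.

    Assumes `df` has Open/High/Low/Close/Volume columns on a time-sorted index.
    """
    if horizon < 1:
        raise ValueError("horizon must be >= 1")
    close = df["Close"]
    log_ret = np.log(close).diff()
    out = pd.DataFrame(index=df.index)

    # Price lags: current and two previous bar log returns (no look-ahead)
    for lag in range(3):
        out[f"ret_lag{lag}"] = log_ret.shift(lag)

    # Momentum: 14-bar RSI, simple-moving-average variant (assumption)
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    out["rsi_14"] = 100 - 100 / (1 + gain / loss)

    # Volatility and trend
    out["vol_20"] = log_ret.rolling(20).std()
    out["close_sma20"] = close / close.rolling(20).mean() - 1

    # Label: 1 if the close `horizon` bars ahead is strictly higher, else 0
    out["target"] = (close.shift(-horizon) > close).astype(int)

    # The last `horizon` rows have no future close, so their label is meaningless
    out = out.iloc[:-horizon]
    # Drop indicator warm-up rows
    return out.dropna()
```

The label is built with `shift(-horizon)` while every feature only uses current or past bars, which is what keeps the features free of look-ahead.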

Architecture

OHLCV Input
    |
    v
[Feature Engineering] -- 50+ technical indicators
    |
    +---> [HMM Regime Detector] -- 5 market states
    |                                  |
    v                                  v
[Feature Selection] -- MI-based      [Regime Labels]
    |
    v
[Calibrated Ensemble]
    - Random Forest (max_depth=4, class_weight='balanced')
    - Gradient Boosting (max_depth=3, subsample=0.6)
    - XGBoost (max_depth=3, heavy L1/L2 regularization)
    - LightGBM (max_depth=3, num_leaves=7)
    |
    v
[Dynamic Weighting] -- Soft vote mean of calibrated probabilities
    |
    v
Probability Up / Down
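Below is a minimal scikit-learn sketch of the selection, calibration and soft-vote stages. It makes three assumptions: MI-based `SelectKBest`, sigmoid calibration on forward-in-time `TimeSeriesSplit` folds, and an unweighted mean of member probabilities. Only the Random Forest and Gradient Boosting members are shown, so XGBoost and LightGBM are omitted to keep the only dependency scikit-learn. `build_members` and `soft_vote` are hypothetical names, and `k=10` is arbitrary. The repository's own calibration method and fold scheme may differ.

```python
from functools import partial

import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline


def build_members(k: int = 10, seed: int = 0) -> list:
    """Hypothetical helper: calibrated RF and GBM members (XGB/LGB omitted)."""
    bases = [
        RandomForestClassifier(max_depth=4, class_weight="balanced", random_state=seed),
        GradientBoostingClassifier(max_depth=3, subsample=0.6, random_state=seed),
    ]
    members = []
    for base in bases:
        # MI selection is refit on each training fold, so it never sees calibration rows
        pipe = make_pipeline(
            SelectKBest(partial(mutual_info_classif, random_state=seed), k=k), base
        )
        # Assumption: sigmoid (Platt) calibration on forward-in-time folds
        members.append(
            CalibratedClassifierCV(pipe, method="sigmoid", cv=TimeSeriesSplit(n_splits=3))
        )
    return members


def soft_vote(members, X) -> np.ndarray:
    """Unweighted mean of the members' calibrated class probabilities."""
    probs = np.stack([m.predict_proba(X) for m in members])  # (n_members, n_rows, 2)
    return probs.mean(axis=0)


# Synthetic example: 15 features, label driven mostly by feature 0
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 15))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)
members = [m.fit(X[:240], y[:240]) for m in build_members()]
proba = soft_vote(members, X[240:])  # column 1 is P(Up)
```

Averaging is only meaningful because every member outputs probabilities on a common, calibrated scale. Uncalibrated tree ensembles tend to produce differently shaped score distributions.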

Performance (Current Dataset)

Data: Nifty 50 (^NSEI) from Yahoo Finance

1m: ~5,600 rows (Apr 15 - May 7, 2026) = ~22 trading days
5m: ~2,400 rows (Mar 16 - May 7, 2026) = ~52 trading days

Out-of-sample (20% holdout):

Horizon	Accuracy	F1	AUC	Brier
1m	49.2%	0.59	0.49	0.25
5m	52.0%	0.59	0.52	0.25
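The four holdout columns can be computed with `sklearn.metrics` from true labels and predicted Up probabilities. The sketch assumes a 0.5 decision threshold and that F1 is reported for the Up class (label 1). The table states neither. `holdout_metrics` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import accuracy_score, brier_score_loss, f1_score, roc_auc_score


def holdout_metrics(y_true, proba_up, threshold: float = 0.5) -> dict:
    """Hypothetical helper: the table's metrics from labels and P(Up)."""
    y_true = np.asarray(y_true)
    proba_up = np.asarray(proba_up, dtype=float)
    y_pred = (proba_up >= threshold).astype(int)  # assumed 0.5 decision threshold
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),  # F1 of the Up class (label 1)
        "auc": roc_auc_score(y_true, proba_up),
        "brier": brier_score_loss(y_true, proba_up),
    }


# No-skill baseline: balanced labels, constant 0.5 forecast
y = np.array([1, 0, 1, 0, 1, 1, 0, 0])
baseline = holdout_metrics(y, np.full(len(y), 0.5))
```

With a constant 0.5 forecast on balanced labels, this returns accuracy 0.50, AUC 0.50, Brier 0.25 and F1 of about 0.67. The F1 is high because every bar is called Up, so recall is 1. The table's AUC and Brier values sit at that no-skill level, and an F1 near 0.6 does not by itself indicate an edge.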

Note: Performance is near-random, most likely because ~22 days of 1m data is too little for intraday prediction. The model likely needs 6+ months of 1m data (~50k+ rows) to show a statistical edge. This repository provides the production-ready framework; retrain it on a longer history before relying on its predictions.
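The near-random reading can be checked with back-of-envelope arithmetic. The sketch makes four assumptions:

- The holdout is the chronological last 20% of rows.
- Rows lost to indicator warm-up are ignored.
- Bars are independent. This is optimistic, since autocorrelation widens the error.
- NSE's cash session runs 375 minutes (09:15 to 15:30 IST), and six months has roughly 125 trading days.

```python
import math

SESSION_MINUTES = 375  # NSE cash session, 09:15-15:30 IST
DAYS_6M = 125          # assumption: ~250 trading days per year


def holdout_std_error(n_rows: int, test_size: float = 0.2, p: float = 0.5):
    """Holdout size and standard error of accuracy for a coin-flip classifier."""
    n_test = round(n_rows * test_size)
    return n_test, math.sqrt(p * (1 - p) / n_test)


def rows_for_edge(edge: float, z: float = 2.0, test_size: float = 0.2, p: float = 0.5):
    """Test rows (and total rows) so that accuracy p + edge is z standard errors above p."""
    n_test = math.ceil((z * math.sqrt(p * (1 - p)) / edge) ** 2)
    return n_test, round(n_test / test_size)


n_1m, se_1m = holdout_std_error(5_600)        # 1m dataset above
n_5m, se_5m = holdout_std_error(2_400)        # 5m dataset above
rows_6m = SESSION_MINUTES * DAYS_6M           # 1m bars in ~6 months
n_test_51, n_total_51 = rows_for_edge(0.01)   # tell 51% apart from 50%
```

The 1m holdout has 1,120 test rows and a standard error of about 1.5 points. The 5m holdout has 480 test rows and a standard error of about 2.3 points. So 49.2% and 52.0% are each within one standard error of 50%. Six months of 1m bars is about 46,900 rows. Separating a 51% model from a coin flip at two standard errors takes about 10,000 test rows, which is 50,000 rows at a 20% holdout. Both numbers are in line with the ~50k figure.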

Usage

Inference

python inference.py --data nifty_1m.csv --model models_v2_1m --horizon 1 --output predictions.csv

Backtest

python backtest.py --data nifty_1m.csv --model models_v2_1m --threshold 0.55 --output backtest.csv
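This README does not document how `backtest.py` applies `--threshold`. One common reading is sketched below as an assumption, not the script's code:

- go long when `proba_up >= threshold`;
- go short when `proba_up <= 1 - threshold`;
- stay flat otherwise.

Each signal books the next bar's return, with no costs or slippage. `threshold_backtest` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def threshold_backtest(close: pd.Series, proba_up: pd.Series, threshold: float = 0.55) -> pd.DataFrame:
    """Hypothetical helper: next-bar P&L of a long/short/flat rule (assumed rule, no costs)."""
    fwd_ret = close.shift(-1) / close - 1  # return from this bar's close to the next
    position = np.where(proba_up >= threshold, 1, np.where(proba_up <= 1 - threshold, -1, 0))
    out = pd.DataFrame({"position": position, "fwd_ret": fwd_ret}, index=close.index)
    out["pnl"] = out["position"] * out["fwd_ret"]
    return out.dropna()  # the last bar has no next close


close = pd.Series([100.0, 101.0, 100.0, 102.0])
bt = threshold_backtest(close, pd.Series([0.6, 0.4, 0.5, 0.7]))
```

Transaction costs matter a great deal at 1m frequency. Subtract a per-trade cost from `pnl` wherever `position` changes before drawing conclusions from the totals.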

Python API

import pandas as pd
from nifty_ensemble_v2 import NiftyEnsembleV2

# Load model
pipe = NiftyEnsembleV2()
pipe.load("models_v2_1m")

# Predict on new data
df = pd.read_csv("nifty_1m.csv", index_col="Datetime", parse_dates=True)
results = pipe.predict(df)

# Results
df["proba_up"] = results["proba"][:, 1]
df["prediction"] = results["pred"]
df["regime"] = results["regime"]

Gradio Demo

pip install gradio
python app.py

Upload a CSV with columns: Datetime, Open, High, Low, Close, Volume
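A quick pre-upload check can catch malformed files before they reach the model. This sketch assumes the CSV layout described above and applies a few sanity rules:

- required columns are present;
- duplicate timestamps are dropped;
- rows are sorted by time;
- High and Low bracket Open and Close.

`load_ohlcv` is a hypothetical helper, not the app's loader.

```python
import io

import pandas as pd

REQUIRED = ["Open", "High", "Low", "Close", "Volume"]


def load_ohlcv(source) -> pd.DataFrame:
    """Hypothetical helper: read a Datetime-indexed OHLCV CSV with basic sanity checks."""
    df = pd.read_csv(source, index_col="Datetime", parse_dates=True)
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    df = df[REQUIRED]
    # Keep the first copy of any repeated timestamp, then order by time
    df = df[~df.index.duplicated(keep="first")].sort_index()
    # High must be the bar's maximum and Low its minimum
    body_hi = df[["Open", "Close"]].max(axis=1)
    body_lo = df[["Open", "Close"]].min(axis=1)
    bad = (df["High"] < body_hi) | (df["Low"] > body_lo)
    if bad.any():
        raise ValueError(f"{int(bad.sum())} rows have High/Low inside the Open/Close range")
    return df


sample = io.StringIO(
    "Datetime,Open,High,Low,Close,Volume\n"
    "2026-04-15 09:16:00+05:30,24232.9,24240.0,24220.0,24230.0,0\n"  # made-up row
    "2026-04-15 09:15:00+05:30,24166.1,24266.7,24166.1,24232.9,0\n"
)
bars = load_ohlcv(sample)
```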

Data Requirements

For production-grade results, use 6-12 months of 1-minute OHLCV data:

Source	URL
Kaggle (Nifty 1m)	https://www.kaggle.com/datasets/debashis74017/nifty-50-minute-data
NSE Historical	https://www.niftyindices.com/reports/historical-data
Broker APIs	Zerodha Kite Connect, Upstox API

Recommended CSV format:

Datetime,Open,High,Low,Close,Volume
2026-04-15 09:15:00+05:30,24166.1,24266.7,24166.1,24232.9,0
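The 5m dataset above was downloaded separately, but 5m bars can also be derived from a 1m file in this format. Here is a minimal pandas sketch, assuming the tz-aware `Datetime` index shown in the sample row. 09:15 falls on a 5-minute boundary, so left-closed, left-labelled 5-minute bins line up with the session open. `to_5m` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

OHLCV_AGG = {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}


def to_5m(df_1m: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: aggregate 1m OHLCV bars into 5m bars labelled by start time."""
    bars = df_1m.resample("5min", label="left", closed="left").agg(OHLCV_AGG)
    # Overnight and holiday bins contain no bars: Open is NaN there (Volume sums to 0)
    return bars.dropna(subset=["Open"])


idx = pd.date_range("2026-04-15 09:15:00+05:30", periods=10, freq="min")
close = pd.Series(np.arange(1.0, 11.0), index=idx)
df_1m = pd.DataFrame({"Open": close - 0.5, "High": close + 1, "Low": close - 1,
                      "Close": close, "Volume": 1})
bars_5m = to_5m(df_1m)  # two bars: 09:15 and 09:20
```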

Retraining

import pandas as pd
from nifty_ensemble_v2 import NiftyEnsembleV2

df = pd.read_csv("your_6months_1m_data.csv", index_col="Datetime", parse_dates=True)

pipe = NiftyEnsembleV2(max_features=25)
train_df, test_df = pipe.fit(df, horizon=1, test_size=0.2)
metrics = pipe.evaluate()  # Out-of-sample metrics
pipe.save("models_v2_1m_retrained")

References

HAELT (2025) - arxiv:2506.13981 - Hybrid Attentive Ensemble Learning Transformer for high-frequency stock price forecasting
HMM+NN Trading (2024) - arxiv:2407.19858 - Gaussian HMM regime detection + PyTorch NN
ta library for technical indicators: https://github.com/ta-lib/ta-lib-python

Disclaimer: This is for research and educational purposes only. Not financial advice. Past performance does not guarantee future results.

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Try ML Intern: https://smolagents-ml-intern.hf.space
Source code: https://github.com/huggingface/ml-intern


Papers for Prateek2001/nifty50-ensemble-predictor

HAELT: A Hybrid Attentive Ensemble Learning Transformer Framework for High-Frequency Stock Price Forecasting

Paper • 2506.13981 • Published Jun 9, 2025

AI-Powered Energy Algorithmic Trading: Integrating Hidden Markov Models with Neural Networks

Paper • 2407.19858 • Published Jul 29, 2024