Nifty 50 Ensemble Directional Predictor

Market Regime Detection + Calibrated ML Ensemble for 1m & 5m Directional Prediction

Task          Binary classification: Nifty 50 Up/Down in next 1m or 5m
Architecture  HMM Regime Detector + Calibrated RF/GBM/XGB/LGB Ensemble
Features      50+ technical indicators (trend, momentum, volatility, volume, price lags)
Regimes       5 HMM states: Bull, Bear, Transitional, High-Vol, Low-Vol
Papers        HAELT (2025) arxiv:2506.13981 + HMM+NN Trading (2024) arxiv:2407.19858
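
The task definition above can be made concrete with a small labeling sketch. It assumes the target compares the close `horizon` bars ahead with the current close and counts ties as Down; the repository's own labeling code is not shown here, so `make_direction_labels` is a hypothetical helper, not the project's API.

```python
from typing import List, Optional


def make_direction_labels(closes: List[float], horizon: int = 1) -> List[Optional[int]]:
    """Label each bar 1 (Up) if the close `horizon` bars ahead is higher, else 0 (Down).

    The last `horizon` bars have no future close, so they get None and
    should be dropped before training. Counting ties as Down is an assumption.
    """
    if horizon < 1:
        raise ValueError("horizon must be >= 1")
    labels: List[Optional[int]] = []
    for i, close in enumerate(closes):
        j = i + horizon
        if j >= len(closes):
            labels.append(None)  # no future bar to compare against
        else:
            labels.append(1 if closes[j] > close else 0)
    return labels


closes = [100.0, 101.0, 100.5, 100.5, 102.0]
print(make_direction_labels(closes, horizon=1))  # [1, 0, 0, 1, None]
```

Dropping the trailing unlabeled bars matters: filling them with a default class would train the model on targets that were never observed.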

Architecture

OHLCV Input
    |
    v
[Feature Engineering] -- 50+ technical indicators
    |
    +---> [HMM Regime Detector] -- 5 market states
    |                                  |
    v                                  v
[Feature Selection] -- MI-based      [Regime Labels]
    |
    v
[Calibrated Ensemble]
    - Random Forest (max_depth=4, class_weight='balanced')
    - Gradient Boosting (max_depth=3, subsample=0.6)
    - XGBoost (max_depth=3, heavy L1/L2 regularization)
    - LightGBM (max_depth=3, num_leaves=7)
    |
    v
[Dynamic Weighting] -- Soft vote mean of calibrated probabilities
    |
    v
Probability Up / Down
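
The final stage of the diagram, a soft vote over calibrated probabilities, reduces to an equal-weight average per bar. The sketch below is dependency-free and uses made-up probabilities; `soft_vote` and the model keys are illustrative names, not the repository's internals.

```python
from typing import Dict, List


def soft_vote(probas: Dict[str, List[float]]) -> List[float]:
    """Average per-model P(Up) values bar by bar (equal-weight soft vote)."""
    if not probas:
        raise ValueError("need at least one model")
    lengths = {len(p) for p in probas.values()}
    if len(lengths) != 1:
        raise ValueError("all models must score the same number of bars")
    n_models = len(probas)
    # zip(*...) walks the bars; each `col` holds one probability per model.
    return [sum(col) / n_models for col in zip(*probas.values())]


# Hypothetical calibrated P(Up) from the four base models for three bars.
model_probas = {
    "rf": [0.52, 0.48, 0.60],
    "gbm": [0.50, 0.46, 0.58],
    "xgb": [0.54, 0.50, 0.62],
    "lgb": [0.52, 0.48, 0.60],
}
p_up = soft_vote(model_probas)
pred = [1 if p >= 0.5 else 0 for p in p_up]  # 1 = Up, 0 = Down
print(pred)  # [1, 0, 1]
```

Averaging only makes sense when each model's output is calibrated, which is presumably why the ensemble calibrates the base models before voting.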

Performance (Current Dataset)

Data: Nifty 50 (^NSEI) from Yahoo Finance

  • 1m: ~5,600 rows (Apr 15 - May 7, 2026) = ~22 calendar days
  • 5m: ~2,400 rows (Mar 16 - May 7, 2026) = ~52 calendar days

Out-of-sample (20% holdout):

Horizon  Accuracy  F1    AUC   Brier
1m       49.2%     0.59  0.49  0.25
5m       52.0%     0.59  0.52  0.25

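Note: Performance is near random: accuracy and AUC sit close to 0.5, and a Brier score of 0.25 is what a constant 0.5 forecast would score. The most likely cause is that about three weeks of 1m data is too little for intraday prediction. The model probably needs 6+ months of 1m data (50k+ rows) before any statistical edge can be measured. This repository provides the framework to retrain once that data is available.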
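
To read the Brier column in the table, it helps to see the no-skill baseline. The Brier score is the mean squared error between the predicted P(Up) and the 0/1 outcome, so a model that always outputs 0.5 scores exactly 0.25 whatever the labels are. A minimal sketch (`brier_score` is written out here for illustration; `sklearn.metrics.brier_score_loss` computes the same quantity):

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], p_up: Sequence[float]) -> float:
    """Mean squared difference between predicted P(Up) and the 0/1 outcome."""
    if len(y_true) != len(p_up) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_up)) / len(y_true)


labels = [1, 0, 1, 1, 0]
print(brier_score(labels, [0.5] * 5))                  # 0.25, the no-skill baseline
print(brier_score(labels, [1.0, 0.0, 1.0, 1.0, 0.0]))  # 0.0, a perfect forecast
```

Lower is better, so a retrained model only shows usable skill if its out-of-sample Brier score falls clearly below 0.25.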


Usage

Inference

python inference.py --data nifty_1m.csv --model models_v2_1m --horizon 1 --output predictions.csv

Backtest

python backtest.py --data nifty_1m.csv --model models_v2_1m --threshold 0.55 --output backtest.csv
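
How `backtest.py` applies `--threshold` is not documented here. One common reading, shown below purely as an assumption, is a confidence filter: trade only when the predicted probability for one side reaches the threshold, and stay flat otherwise. `signal_from_proba` is a hypothetical helper.

```python
from typing import List


def signal_from_proba(p_up: float, threshold: float = 0.55) -> int:
    """Return +1 (long), -1 (short) or 0 (flat) from a predicted P(Up).

    Assumed rule: go long when P(Up) >= threshold, short when
    P(Down) = 1 - P(Up) >= threshold, otherwise stay out of the market.
    """
    if not 0.5 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0.5, 1.0]")
    if p_up >= threshold:
        return 1
    if 1.0 - p_up >= threshold:
        return -1
    return 0


probas: List[float] = [0.70, 0.52, 0.30, 0.50]
signals = [signal_from_proba(p) for p in probas]
coverage = sum(s != 0 for s in signals) / len(signals)  # share of bars traded
print(signals)   # [1, 0, -1, 0]
print(coverage)  # 0.5
```

Raising the threshold trades fewer bars with higher stated confidence, which is only meaningful if the probabilities are well calibrated.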

Python API

import pandas as pd
from nifty_ensemble_v2 import NiftyEnsembleV2

# Load model
pipe = NiftyEnsembleV2()
pipe.load("models_v2_1m")

# Predict on new data
df = pd.read_csv("nifty_1m.csv", index_col="Datetime", parse_dates=True)
results = pipe.predict(df)

# Results
df["proba_up"] = results["proba"][:, 1]
df["prediction"] = results["pred"]
df["regime"] = results["regime"]

Gradio Demo

pip install gradio
python app.py

Upload a CSV with columns: Datetime, Open, High, Low, Close, Volume


Data Requirements

For production-grade results, use 6-12 months of 1-minute OHLCV data:

Source URL
Kaggle (Nifty 1m) https://www.kaggle.com/datasets/debashis74017/nifty-50-minute-data
NSE Historical https://www.niftyindices.com/reports/historical-data
Broker APIs Zerodha Kite Connect, Upstox API
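
As a rough check on how much history a download contains, row counts can be converted to trading sessions. The sketch assumes full NSE regular sessions of 375 minutes (09:15 to 15:30 IST), no missing bars, and about 21 trading days per month; the helper names are illustrative.

```python
SESSION_MINUTES = 375  # NSE regular session, 09:15 to 15:30 IST


def expected_rows(trading_days: int, bar_minutes: int = 1) -> int:
    """Rows produced by `trading_days` full sessions at the given bar size."""
    return trading_days * (SESSION_MINUTES // bar_minutes)


def implied_sessions(rows: int, bar_minutes: int = 1) -> float:
    """Number of full sessions that a given row count corresponds to."""
    return rows / (SESSION_MINUTES // bar_minutes)


# About 6 and 12 months of 1m data, assuming ~21 trading days per month.
print(expected_rows(126))   # 47250
print(expected_rows(252))   # 94500

# Row counts reported in the Performance section.
print(round(implied_sessions(5600, bar_minutes=1), 1))  # 14.9
print(implied_sessions(2400, bar_minutes=5))            # 32.0
```

Under these assumptions six months of 1m bars lands near the 50k-row mark, while the current 1m sample covers roughly 15 full sessions.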

Recommended CSV format:

Datetime,Open,High,Low,Close,Volume
2026-04-15 09:15:00+05:30,24166.1,24266.7,24166.1,24232.9,0
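
Before inference or retraining, it can help to check that a file matches this format. The following is a minimal standard-library sketch, not part of the repository: `validate_ohlcv` is a hypothetical helper that checks the header, parses each row, and flags bars whose High/Low do not bracket Open/Close or whose timestamps go backwards. It assumes every timestamp carries the same UTC offset style.

```python
import csv
import io
from datetime import datetime
from typing import List

REQUIRED = ["Datetime", "Open", "High", "Low", "Close", "Volume"]


def validate_ohlcv(text: str) -> List[str]:
    """Return a list of problems found; an empty list means the CSV looks usable."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {missing}"]
    problems: List[str] = []
    prev_ts = None
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            ts = datetime.fromisoformat(row["Datetime"])
            o, h, l, c = (float(row[k]) for k in ("Open", "High", "Low", "Close"))
        except (ValueError, TypeError) as exc:
            problems.append(f"line {line_no}: unparseable value ({exc})")
            continue
        if h < max(o, c) or l > min(o, c):
            problems.append(f"line {line_no}: High/Low do not bracket Open/Close")
        if prev_ts is not None and ts <= prev_ts:
            problems.append(f"line {line_no}: timestamps not strictly increasing")
        prev_ts = ts
    return problems


SAMPLE = """Datetime,Open,High,Low,Close,Volume
2026-04-15 09:15:00+05:30,24166.1,24266.7,24166.1,24232.9,0
"""
print(validate_ohlcv(SAMPLE))  # []
```

Volume is deliberately not validated beyond its presence, since the sample row above carries a volume of 0.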

Retraining

import pandas as pd
from nifty_ensemble_v2 import NiftyEnsembleV2

df = pd.read_csv("your_6months_1m_data.csv", index_col="Datetime", parse_dates=True)

pipe = NiftyEnsembleV2(max_features=25)
train_df, test_df = pipe.fit(df, horizon=1, test_size=0.2)
metrics = pipe.evaluate()  # Out-of-sample metrics
pipe.save("models_v2_1m_retrained")
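
Whether `pipe.fit` splits the data chronologically is not stated in this README. For time series, a holdout is usually taken from the end of the sample so that no future bars leak into training. The sketch below shows that convention as an assumption; `chronological_split` is a hypothetical helper, not the library's method.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(rows: Sequence[T], test_size: float = 0.2) -> Tuple[List[T], List[T]]:
    """Split time-ordered rows so the last `test_size` fraction is the holdout."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    n_test = round(len(rows) * test_size)
    cut = len(rows) - n_test
    # No shuffling: everything before `cut` trains, everything after tests.
    return list(rows[:cut]), list(rows[cut:])


bars = list(range(10))  # stand-in for ten time-ordered bars
train, test = chronological_split(bars, test_size=0.2)
print(train)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(test)   # [8, 9]
```

A shuffled split would mix bars from the same session into both sets and tends to overstate out-of-sample accuracy on intraday data.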

References

  1. HAELT (2025) - arxiv:2506.13981 - Hybrid Attention Ensemble for stock direction
  2. HMM+NN Trading (2024) - arxiv:2407.19858 - Gaussian HMM regime detection + PyTorch NN
  3. ta library for technical indicators: https://github.com/ta-lib/ta-lib-python

Disclaimer: This is for research and educational purposes only. Not financial advice. Past performance does not guarantee future results.

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
