---
title: Oracleagent
emoji: 🏆
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 6.14.0
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
---
```
  ____  _____            _____ _      ______
 / __ \|  __ \     /\   / ____| |    |  ____|
| |  | | |__) |   /  \ | |    | |    | |__
| |  | |  _  /   / /\ \| |    | |    |  __|
| |__| | | \ \  / ____ \ |____| |____| |____
 \____/|_|  \_\/_/    \_\_____|______|______|
```
A cinematic command center for adaptive autonomous planning, probabilistic reasoning, and hazard survival.
"This is not just pathfinding. This is an autonomous tactical intelligence system."
- Overview
- Platform Strategy
- Architecture
- Agent Types
- Mathematical Foundations
- Experience Roadmap
- Quick Start
- Experiments
- Benchmarks
- Visualization
- Project Structure
- References
## Overview

Oracle is an advanced autonomous agent built to feel like a futuristic AI command center. It operates in partially observable, hazardous environments featuring volcanoes, water hazards, brick walls, and limited lives. The agent must reach the goal while balancing survival, uncertainty, and tactical decision-making.
This is not just a research demo. It is a foundation for an interactive experience with:
- cinematic grid-world simulation
- belief and uncertainty visualization
- interactive search and planning dashboards
- live reinforcement learning theater
- benchmark command centers
## Architecture

Oracle blends a unified decision architecture with a compelling interface-first vision:
| Component | Technique | Purpose |
|---|---|---|
| Search | Life-Aware A* | Optimal pathfinding with survival optimization |
| Perception | Bayesian Sensor Fusion | Probabilistic state estimation from noisy sensors |
| Information | Entropy-Based Scanning | Intelligent information gathering |
| Planning | Monte Carlo Tree Search | Simulation-based action evaluation |
| Learning | Tabular Q-Learning | Policy improvement through experience |
| Memory | Cross-Episode Priors | Transfer learning across environments |
```
┌─────────────────────────────────────────────────────────────────┐
│                   ORACLE UNIFIED ARCHITECTURE                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │   Sensors   │───▶│   Belief    │───▶│  Decision   │          │
│  │  (Thermal + │    │   Engine    │    │   Engine    │          │
│  │   Seismic)  │    │   (Bayes)   │    │             │          │
│  └─────────────┘    └─────────────┘    └──────┬──────┘          │
│                                               │                 │
│                    ┌──────────────────────────┼────────────┐    │
│                    ▼                          ▼            ▼    │
│             ┌──────────────┐           ┌──────────────┐ ┌──────┐│
│             │  A* Planner  │           │ MCTS Planner │ │  Q   ││
│             │ (Determin.)  │           │ (Simulation) │ │Table ││
│             └──────────────┘           └──────────────┘ └──────┘│
│                    │                          │            │    │
│                    └────────────┬─────────────┘            │    │
│                                 ▼                          │    │
│                          ┌──────────────┐                  │    │
│                          │    Action    │◀─────────────────┘    │
│                          │  Selection   │  (Epsilon-Greedy)     │
│                          └──────────────┘                       │
│                                 │                               │
│                                 ▼                               │
│                          ┌──────────────┐                       │
│                          │  GridWorld   │                       │
│                          │  (Physics)   │                       │
│                          └──────────────┘                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
## Platform Strategy

Oracle is architected for a hybrid public experience:
- Primary Showcase: Vercel + Next.js + Three.js for a cinematic, interactive simulation.
- Research Playground: Hugging Face Spaces for benchmark reproducibility, configurable experiments, and academic showcase.
- Recruiter Magnet: Present Oracle as an Autonomous Survival Intelligence System with a polished command-center interface.
The ideal frontend pairs the existing engine with live features such as:
- animated grid navigation and fog-of-war
- Bayesian belief and uncertainty heatmaps
- MCTS rollout and search tree animation
- RL reward evolution and policy heatmaps
- benchmark command dashboards and challenge modes
See `docs/platform_strategy.md` for the full hybrid deployment and showcase plan.
## Experience Roadmap

This repository should be presented as a cinematic AI experience, not a static algorithm demo. Key experience pillars include:
- Live Grid World: animated movement, hazard effects, zoom/pan controls, replay mode.
- AI Brain Visualization: belief updates, entropy maps, sensor confidence, decision heatmaps.
- MCTS Theater: branch expansion, UCB selection, rollout futures, explore/exploit visuals.
- RL Training Cinema: reward curves, Q-table heatmaps, emergent policy stories.
- Benchmark Command Center: success rates, reward histograms, efficiency metrics, animated comparisons.
- Cinematic Interface: glassmorphism, neon gradients, pulse scan effects, tactical overlays.
## Agent Types

### Deterministic Agent (Life-Aware A*)

State Space: S = (row, col, lives)
Action Space: A = {walk_n, walk_s, walk_e, walk_w, jump_n, jump_s, jump_e, jump_w}
Objective: Minimize survival score
Survival Score = (Turns + Time Units) / Lives Remaining
The A* planner uses an admissible heuristic combining Manhattan distance, life penalties, and jump-optimized time estimates.
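To make this concrete, here is a minimal sketch of a heuristic of that shape. It is illustrative only: the function name, jump distance, and starting-lives constant are assumptions, not the actual API in `src/planning/astar.py`.

```python
# Illustrative sketch only -- names and constants are assumptions,
# not the actual implementation in src/planning/astar.py.

def life_aware_heuristic(state, goal, jump_distance=2, starting_lives=3,
                         life_penalty=1.0):
    """Optimistic estimate of remaining cost for a (row, col, lives) state.

    Dividing the Manhattan distance by the jump length assumes every
    remaining move is a max-length jump, so the estimate never
    overestimates the true move count (keeping A* optimal). The life
    term must likewise lower-bound any life-related cost to stay
    admissible.
    """
    row, col, lives = state
    goal_row, goal_col = goal

    manhattan = abs(row - goal_row) + abs(col - goal_col)
    min_moves = manhattan / jump_distance   # jump-optimized time bound

    lives_lost = starting_lives - lives     # survival pressure
    return min_moves + life_penalty * lives_lost
```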
### Bayesian Agent

Belief State: b(s) = P(cell_type | sensor_history)
Sensors:
- Thermal: P(T+|Volcano) = 0.85, P(T+|Land) = 0.10
- Seismic: P(S+|Water) = 0.85, P(S+|Land) = 0.05
Information-Theoretic Scanning:
EIG(cell) = H(before) - E[H(after scan)]
The agent scans cells with highest Expected Information Gain before committing to risky moves.
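The scan-selection rule can be sketched directly from the definition above. The thermal likelihoods are the ones quoted for the sensor model; everything else (function names, the two-type belief) is an illustrative assumption.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution {outcome: prob}."""
    return -sum(pi * math.log2(pi) for pi in p.values() if pi > 0)

def expected_information_gain(belief, likelihood):
    """EIG(cell) = H(belief) - E_z[H(posterior | z)].

    belief:     {cell_type: prior probability}
    likelihood: {cell_type: P(positive reading | cell_type)}
    """
    h_before = entropy(belief)
    h_after = 0.0
    for reading in (True, False):  # positive / negative sensor returns
        # P(z) = sum_c P(z | c) * P(c)
        p_z = sum((likelihood[c] if reading else 1 - likelihood[c]) * p
                  for c, p in belief.items())
        if p_z == 0:
            continue
        # Posterior by Bayes' rule, its entropy weighted by P(z).
        posterior = {c: (likelihood[c] if reading else 1 - likelihood[c]) * p / p_z
                     for c, p in belief.items()}
        h_after += p_z * entropy(posterior)
    return h_before - h_after

# Example: a maximally uncertain cell probed by the thermal sensor,
# with P(T+|Volcano) = 0.85 and P(T+|Land) = 0.10 as quoted above.
belief = {"volcano": 0.5, "land": 0.5}
thermal = {"volcano": 0.85, "land": 0.10}
print(expected_information_gain(belief, thermal))  # ~0.46 bits
```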
### Reinforcement Learning Agent

State Encoding: (r, c, lives, risk_n, risk_s, risk_e, risk_w)
Q-Learning Update:
Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') - Q(s,a)]
Exploration: Epsilon-greedy with exponential decay and Boltzmann softmax.
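A minimal tabular sketch of this update and the ε-greedy rule, assuming a hashable state encoding; the actual engine lives in `src/learning/q_learning.py` and may differ in its details.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch matching the update rule above;
# the state/action encodings here are illustrative assumptions.

Q = defaultdict(float)  # maps (state, action) -> value, default 0.0
ALPHA, GAMMA = 0.1, 0.99
ACTIONS = ["walk_n", "walk_s", "walk_e", "walk_w",
           "jump_n", "jump_s", "jump_e", "jump_w"]

def q_update(state, action, reward, next_state, done):
    """Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    target = reward
    if not done:
        target += GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

def epsilon_greedy(state, epsilon):
    """Explore uniformly with probability epsilon, else act greedily."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])
```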
### MCTS Agent

The MCTS agent uses Monte Carlo Tree Search with UCB1 selection:
UCB1 = Q̄(child) + c √(ln(parent_visits) / child_visits)
Each decision runs 150 rollouts with a goal-biased rollout policy.
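The selection step can be sketched as follows; the `Node` fields and exploration constant are assumptions, not the repo's `monte_carlo.py` API.

```python
import math

# Sketch of UCB1 child selection inside MCTS; Node layout is assumed.

class Node:
    def __init__(self, action=None):
        self.action = action
        self.visits = 0
        self.total_value = 0.0
        self.children = []

def ucb1_select(parent, c=1.414):
    """Pick the child maximizing mean value + c * sqrt(ln N_parent / N_child).

    Unvisited children score infinity, so each action is tried once
    before any exploitation begins.
    """
    def score(child):
        if child.visits == 0:
            return float("inf")
        mean = child.total_value / child.visits
        return mean + c * math.sqrt(math.log(parent.visits) / child.visits)
    return max(parent.children, key=score)
```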
## Mathematical Foundations

### Bayesian Belief Update

Given sensor reading z, update the belief for cell type c:
P(c | z) = P(z | c) · P(c) / Σ_c' P(z | c') · P(c')
For thermal sensor T and seismic sensor S (conditionally independent):
P(c | T, S) ∝ P(T | c) · P(S | c) · P(c)
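A runnable sketch of the two-sensor fusion. The likelihoods quoted earlier (thermal on volcano/land, seismic on water/land) are used where given; the two cross terms (thermal on water, seismic on volcano) are assumed values for illustration.

```python
# Sketch of the two-sensor Bayes update. Cross-term likelihoods marked
# "assumed" are illustrative; the rest match the README's sensor model.

P_THERMAL = {"volcano": 0.85, "water": 0.10, "land": 0.10}  # P(T+|c); water assumed
P_SEISMIC = {"volcano": 0.05, "water": 0.85, "land": 0.05}  # P(S+|c); volcano assumed

def posterior(prior, thermal_pos, seismic_pos):
    """P(c | T, S) ∝ P(T | c) · P(S | c) · P(c), sensors conditionally independent."""
    unnorm = {}
    for c, p in prior.items():
        lt = P_THERMAL[c] if thermal_pos else 1 - P_THERMAL[c]
        ls = P_SEISMIC[c] if seismic_pos else 1 - P_SEISMIC[c]
        unnorm[c] = lt * ls * p
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

# Example: uniform prior, positive thermal, negative seismic
# -> posterior mass shifts strongly toward "volcano" (~0.88).
print(posterior({"volcano": 1/3, "water": 1/3, "land": 1/3}, True, False))
```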
### Expected Utility

U(action) = Σ_s P(s) · [R(s) + γ · V*(s')]

where the utility components include (a worked sketch follows the list):
- Goal reaching: +100
- Life preservation: +50 per life
- Time penalty: -1 per time unit
- Hazard penalty: -25 per hit
- Scan cost: -2 per scan
- Information gain: +5 per bit
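A worked sketch of the expected-utility computation over a small outcome set. Only the reward constants come from the list above; the probabilities and next-state values are hypothetical.

```python
# Worked expected-utility sketch; outcome probabilities and V* values
# are hypothetical, reward constants mirror the list above.

GOAL, LIFE, STEP, HAZARD = 100, 50, -1, -25

def expected_utility(outcomes, gamma=0.99):
    """U(action) = Σ_s P(s) · [R(s) + γ · V*(s')].

    outcomes: list of (probability, immediate_reward, next_state_value).
    """
    return sum(p * (r + gamma * v) for p, r, v in outcomes)

# Example: a risky move that succeeds with prob 0.8 (one step closer to
# the goal, V* = 60) and hits a hazard with prob 0.2 (V* = 30).
u = expected_utility([(0.8, STEP, 60.0), (0.2, STEP + HAZARD, 30.0)])
print(round(u, 2))  # 47.46
```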
### Q-Learning Convergence

Under the conditions:
- Σ α_t = ∞ (infinite exploration)
- Σ α_t² < ∞ (diminishing step sizes)
- All state-action pairs visited infinitely often
The Q-function converges to Q* with probability 1 (Watkins & Dayan, 1992).
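For example, the harmonic schedule α_t = 1/t satisfies the first two conditions: Σ 1/t diverges while Σ 1/t² converges, so per-visit learning rates of the form α = 1/n(s, a) meet both requirements.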
## Quick Start

```bash
conda env create -f config.yml
conda activate oracle-agent
cd src

# Demo all agents
python main.py --mode all --seed 42

# Train RL agent
python main.py --mode train_rl --rl_episodes 3000

# Benchmark all agents
python main.py --mode benchmark --n_episodes 500 --rl_episodes 1000

# Demo with MCTS
python main.py --mode demo_bayesian --mcts
```

### Deploying to Hugging Face Spaces

- Use a single HF Space with the Gradio SDK.
- Select the Blank template.
- The repo root should contain
app.py,requirements.txt, andruntime.txt. - Push your GitHub repo, then import it into Hugging Face Spaces.
- HF will install dependencies from
requirements.txtand runpython app.py. - Recommended space name:
oracleagentororacle-agent. - Owner:
Sammy1808.
This repo also includes a Dockerfile for a custom Docker Space if you want full container control.
### Deploying to Vercel

- Import the repo into Vercel.
- Set the project root to `frontend/`.
- Vercel will detect Next.js and build the site using `npm run build`.
### License

- This project is licensed under the MIT License.
- The license file is included in `LICENSE`.
## Experiments

### Experiment 1: Full vs. Partial Observability

Hypothesis: Perfect information guarantees success; partial observability reduces the success rate due to sensor noise.

Expected: Deterministic > Bayesian+MCTS > Bayesian > RL (untrained)

### Experiment 2: Scanning Strategy

Hypothesis: Entropy-based scanning outperforms random scanning in partially observable settings.

Metric: Average steps to goal with a fixed scan budget.

### Experiment 3: RL Training Dynamics

Training over 3000 episodes with ε decayed from 1.0 → 0.05.

Expected: Success rate increases from ~20% to ~80% over training.

### Experiment 4: Cross-Episode Memory

Hypothesis: Agents with memory of hazard distributions learn faster in new environments.

Metric: First-episode success rate with and without memory initialization.
## Benchmarks

Example benchmark results (500 episodes, 5 seeds):
| Agent | Success Rate | Avg Reward | Avg Steps | Avg Lives | Avg Scans |
|---|---|---|---|---|---|
| Deterministic | 95.2% | 142.3 | 14.2 | 2.8 | 0 |
| Bayesian | 72.4% | 89.1 | 22.6 | 1.9 | 8.3 |
| Bayesian+MCTS | 78.1% | 98.7 | 20.1 | 2.1 | 7.1 |
| RL (trained) | 81.5% | 105.2 | 18.4 | 2.3 | 4.2 |
## Visualization

The current implementation produces publication-quality figures for training and benchmarking. The long-term goal is to extend this into a cinematic visualization layer with:
- live grid-world rendering and hazard animation
- Bayesian belief heatmaps and entropy overlays
- MCTS tree expansion and rollout futures
- RL reward cinema and emergent policy visuals
- benchmark dashboards with animated comparison charts
The system currently generates figures such as:
| Figure | Description |
|---|---|
| `rl_reward_curve.png` | Training reward with moving average |
| `rl_success_rate.png` | Success rate convergence |
| `belief_evolution.png` | Entropy reduction over episode |
| `benchmark_comparison.png` | Agent performance bar charts |
## Project Structure

```
src/
├── config.py                 # Centralized hyperparameters
├── main.py                   # Unified CLI entry point
│
├── env/
│   └── grid_world.py         # Environment dynamics & physics
│
├── belief/
│   └── bayesian_update.py    # Probabilistic state estimation
│
├── planning/
│   ├── astar.py              # Life-aware A* search
│   └── monte_carlo.py        # MCTS with UCB1
│
├── agents/
│   ├── deterministic_agent.py
│   ├── bayesian_agent.py
│   └── rl_agent.py
│
├── learning/
│   └── q_learning.py         # Tabular Q-learning engine
│
├── utils/
│   └── metrics.py            # Benchmarking & logging
│
├── experiments/
│   └── benchmark.py          # Full evaluation suite
│
└── visualize/
    └── plots.py              # Publication-quality plots
```
## References

- Russell, S. & Norvig, P. *Artificial Intelligence: A Modern Approach* (4th ed.). Pearson, 2020.
- Watkins, C.J.C.H. & Dayan, P. "Q-Learning." *Machine Learning*, 8(3), 1992.
- Kocsis, L. & Szepesvári, C. "Bandit Based Monte-Carlo Planning." *ECML*, 2006.
- Thrun, S. "Probabilistic Robotics." *Communications of the ACM*, 2002.
- Howard, R.A. "Information Value Theory." *IEEE Transactions on Systems Science and Cybernetics*, 1966.