---
title: Oracleagent
emoji: 🏆
colorFrom: purple
colorTo: indigo
sdk: gradio
sdk_version: 6.14.0
python_version: "3.10"
app_file: app.py
pinned: false
license: mit
---
```
  ____  _____            _____ _      ______
 / __ \|  __ \     /\   / ____| |    |  ____|
| |  | | |__) |   /  \ | |    | |    | |__
| |  | |  _  /   / /\ \| |    | |    |  __|
| |__| | | \ \  / ____ \ |____| |____| |____
 \____/|_|  \_\/_/    \_\_____|______|______|
```
A cinematic command center for adaptive autonomous planning, probabilistic reasoning, and hazard survival.
"This is not just pathfinding. This is an autonomous tactical intelligence system."
- Overview
- Platform Strategy
- Architecture
- Agent Types
- Mathematical Foundations
- Experience Roadmap
- Quick Start
- Experiments
- Benchmarks
- Visualization
- Project Structure
- References
## Overview

Oracle is an advanced autonomous agent built to feel like a futuristic AI command center. It operates in partially observable, hazardous environments featuring volcanoes, water hazards, brick walls, and limited lives. The agent must reach the goal while balancing survival, uncertainty, and tactical decision-making.
This is not just a research demo. It is a foundation for an interactive experience with:
- cinematic grid-world simulation
- belief and uncertainty visualization
- interactive search and planning dashboards
- live reinforcement learning theater
- benchmark command centers
## Architecture

Oracle blends a unified decision architecture with a compelling interface-first vision:
| Component | Technique | Purpose |
|---|---|---|
| Search | Life-Aware A* | Optimal pathfinding with survival optimization |
| Perception | Bayesian Sensor Fusion | Probabilistic state estimation from noisy sensors |
| Information | Entropy-Based Scanning | Intelligent information gathering |
| Planning | Monte Carlo Tree Search | Simulation-based action evaluation |
| Learning | Tabular Q-Learning | Policy improvement through experience |
| Memory | Cross-Episode Priors | Transfer learning across environments |
```
┌─────────────────────────────────────────────────────────────────┐
│                   ORACLE UNIFIED ARCHITECTURE                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │   Sensors   │───▶│   Belief    │───▶│  Decision   │          │
│  │  (Thermal + │    │   Engine    │    │   Engine    │          │
│  │   Seismic)  │    │   (Bayes)   │    │             │          │
│  └─────────────┘    └─────────────┘    └──────┬──────┘          │
│                                               │                 │
│                    ┌──────────────────────────┼────────────┐    │
│                    ▼                          ▼            ▼    │
│             ┌──────────────┐           ┌──────────────┐ ┌──────┐│
│             │  A* Planner  │           │ MCTS Planner │ │  Q   ││
│             │ (Determin.)  │           │ (Simulation) │ │Table ││
│             └──────────────┘           └──────────────┘ └──────┘│
│                    │                          │            │    │
│                    └────────────┬─────────────┘            │    │
│                                 ▼                          │    │
│                          ┌──────────────┐                  │    │
│                          │    Action    │◀─────────────────┘    │
│                          │  Selection   │  (Epsilon-Greedy)     │
│                          └──────────────┘                       │
│                                 │                               │
│                                 ▼                               │
│                          ┌──────────────┐                       │
│                          │  GridWorld   │                       │
│                          │  (Physics)   │                       │
│                          └──────────────┘                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
## Platform Strategy

Oracle is architected for a hybrid public experience:
- Primary Showcase: Vercel + Next.js + Three.js for a cinematic, interactive simulation.
- Research Playground: Hugging Face Spaces for benchmark reproducibility, configurable experiments, and academic showcase.
- Recruiter Magnet: Present Oracle as an Autonomous Survival Intelligence System with a polished command-center interface.
The ideal frontend pairs the existing engine with live features such as:
- animated grid navigation and fog-of-war
- Bayesian belief and uncertainty heatmaps
- MCTS rollout and search tree animation
- RL reward evolution and policy heatmaps
- benchmark command dashboards and challenge modes
See `docs/platform_strategy.md` for the full hybrid deployment and showcase plan.
## Experience Roadmap

This repository should be presented as a cinematic AI experience, not a static algorithm demo. Key experience pillars include:
- Live Grid World: animated movement, hazard effects, zoom/pan controls, replay mode.
- AI Brain Visualization: belief updates, entropy maps, sensor confidence, decision heatmaps.
- MCTS Theater: branch expansion, UCB selection, rollout futures, explore/exploit visuals.
- RL Training Cinema: reward curves, Q-table heatmaps, emergent policy stories.
- Benchmark Command Center: success rates, reward histograms, efficiency metrics, animated comparisons.
- Cinematic Interface: glassmorphism, neon gradients, pulse scan effects, tactical overlays.
## Agent Types

### Deterministic Agent (Life-Aware A*)

State Space: S = (row, col, lives)
Action Space: A = {walk_n, walk_s, walk_e, walk_w, jump_n, jump_s, jump_e, jump_w}
Objective: Minimize survival score
Survival Score = (Turns + Time Units) / Lives Remaining
The A* planner uses an admissible heuristic combining Manhattan distance, life penalties, and jump-optimized time estimates.
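To make this concrete, here is a minimal sketch of a heuristic of that shape. It is illustrative only: the function name, jump distance, and starting-lives constant are assumptions, not the actual API in `src/planning/astar.py`.

```python
# Illustrative sketch only -- names and constants are assumptions,
# not the actual implementation in src/planning/astar.py.

def life_aware_heuristic(state, goal, jump_distance=2, starting_lives=3,
                         life_penalty=1.0):
    """Optimistic estimate of remaining cost for a (row, col, lives) state.

    Dividing the Manhattan distance by the jump length assumes every
    remaining move is a max-length jump, so the estimate never
    overestimates the true move count (keeping A* optimal). The life
    term must likewise lower-bound any life-related cost to stay
    admissible.
    """
    row, col, lives = state
    goal_row, goal_col = goal

    manhattan = abs(row - goal_row) + abs(col - goal_col)
    min_moves = manhattan / jump_distance   # jump-optimized time bound

    lives_lost = starting_lives - lives     # survival pressure
    return min_moves + life_penalty * lives_lost
```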
### Bayesian Agent

Belief State: b(s) = P(cell_type | sensor_history)
Sensors:
- Thermal: P(T+|Volcano) = 0.85, P(T+|Land) = 0.10
- Seismic: P(S+|Water) = 0.85, P(S+|Land) = 0.05
Information-Theoretic Scanning:
EIG(cell) = H(before) - E[H(after scan)]
The agent scans cells with highest Expected Information Gain before committing to risky moves.
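The scan-selection rule can be sketched directly from the definition above. The thermal likelihoods are the ones quoted for the sensor model; everything else (function names, the two-type belief) is an illustrative assumption.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution {outcome: prob}."""
    return -sum(pi * math.log2(pi) for pi in p.values() if pi > 0)

def expected_information_gain(belief, likelihood):
    """EIG(cell) = H(belief) - E_z[H(posterior | z)].

    belief:     {cell_type: prior probability}
    likelihood: {cell_type: P(positive reading | cell_type)}
    """
    h_before = entropy(belief)
    h_after = 0.0
    for reading in (True, False):  # positive / negative sensor returns
        # P(z) = sum_c P(z | c) * P(c)
        p_z = sum((likelihood[c] if reading else 1 - likelihood[c]) * p
                  for c, p in belief.items())
        if p_z == 0:
            continue
        # Posterior by Bayes' rule, its entropy weighted by P(z).
        posterior = {c: (likelihood[c] if reading else 1 - likelihood[c]) * p / p_z
                     for c, p in belief.items()}
        h_after += p_z * entropy(posterior)
    return h_before - h_after

# Example: a maximally uncertain cell probed by the thermal sensor,
# with P(T+|Volcano) = 0.85 and P(T+|Land) = 0.10 as quoted above.
belief = {"volcano": 0.5, "land": 0.5}
thermal = {"volcano": 0.85, "land": 0.10}
print(expected_information_gain(belief, thermal))  # ~0.46 bits
```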
### Reinforcement Learning Agent

State Encoding: (r, c, lives, risk_n, risk_s, risk_e, risk_w)
Q-Learning Update:
Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') - Q(s,a)]
Exploration: Epsilon-greedy with exponential decay and Boltzmann softmax.
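A minimal tabular sketch of this update and the ε-greedy rule, assuming a hashable state encoding; the actual engine lives in `src/learning/q_learning.py` and may differ in its details.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch matching the update rule above;
# the state/action encodings here are illustrative assumptions.

Q = defaultdict(float)  # maps (state, action) -> value, default 0.0
ALPHA, GAMMA = 0.1, 0.99
ACTIONS = ["walk_n", "walk_s", "walk_e", "walk_w",
           "jump_n", "jump_s", "jump_e", "jump_w"]

def q_update(state, action, reward, next_state, done):
    """Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    target = reward
    if not done:
        target += GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

def epsilon_greedy(state, epsilon):
    """Explore uniformly with probability epsilon, else act greedily."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])
```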
### MCTS Agent

The MCTS agent uses Monte Carlo Tree Search with UCB1 selection:
UCB1 = Q̄(child) + c √(ln(parent_visits) / child_visits)
Each decision runs 150 rollouts with a goal-biased rollout policy.
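The selection step can be sketched as follows; the `Node` fields and exploration constant are assumptions, not the repo's `monte_carlo.py` API.

```python
import math

# Sketch of UCB1 child selection inside MCTS; Node layout is assumed.

class Node:
    def __init__(self, action=None):
        self.action = action
        self.visits = 0
        self.total_value = 0.0
        self.children = []

def ucb1_select(parent, c=1.414):
    """Pick the child maximizing mean value + c * sqrt(ln N_parent / N_child).

    Unvisited children score infinity, so each action is tried once
    before any exploitation begins.
    """
    def score(child):
        if child.visits == 0:
            return float("inf")
        mean = child.total_value / child.visits
        return mean + c * math.sqrt(math.log(parent.visits) / child.visits)
    return max(parent.children, key=score)
```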
## Mathematical Foundations

### Bayesian Belief Update

Given sensor reading z, update the belief for cell type c:
P(c | z) = P(z | c) · P(c) / Σ_c' P(z | c') · P(c')
For thermal sensor T and seismic sensor S (conditionally independent):
P(c | T, S) ∝ P(T | c) · P(S | c) · P(c)
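A runnable sketch of the two-sensor fusion. The likelihoods quoted earlier (thermal on volcano/land, seismic on water/land) are used where given; the two cross terms (thermal on water, seismic on volcano) are assumed values for illustration.

```python
# Sketch of the two-sensor Bayes update. Cross-term likelihoods marked
# "assumed" are illustrative; the rest match the README's sensor model.

P_THERMAL = {"volcano": 0.85, "water": 0.10, "land": 0.10}  # P(T+|c); water assumed
P_SEISMIC = {"volcano": 0.05, "water": 0.85, "land": 0.05}  # P(S+|c); volcano assumed

def posterior(prior, thermal_pos, seismic_pos):
    """P(c | T, S) ∝ P(T | c) · P(S | c) · P(c), sensors conditionally independent."""
    unnorm = {}
    for c, p in prior.items():
        lt = P_THERMAL[c] if thermal_pos else 1 - P_THERMAL[c]
        ls = P_SEISMIC[c] if seismic_pos else 1 - P_SEISMIC[c]
        unnorm[c] = lt * ls * p
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

# Example: uniform prior, positive thermal, negative seismic
# -> posterior mass shifts strongly toward "volcano" (~0.88).
print(posterior({"volcano": 1/3, "water": 1/3, "land": 1/3}, True, False))
```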
### Expected Utility

U(action) = Σ_s P(s) · [R(s) + γ · V*(s')]

where the utility components include (a worked sketch follows the list):
- Goal reaching: +100
- Life preservation: +50 per life
- Time penalty: -1 per time unit
- Hazard penalty: -25 per hit
- Scan cost: -2 per scan
- Information gain: +5 per bit
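A worked sketch of the expected-utility computation over a small outcome set. Only the reward constants come from the list above; the probabilities and next-state values are hypothetical.

```python
# Worked expected-utility sketch; outcome probabilities and V* values
# are hypothetical, reward constants mirror the list above.

GOAL, LIFE, STEP, HAZARD = 100, 50, -1, -25

def expected_utility(outcomes, gamma=0.99):
    """U(action) = Σ_s P(s) · [R(s) + γ · V*(s')].

    outcomes: list of (probability, immediate_reward, next_state_value).
    """
    return sum(p * (r + gamma * v) for p, r, v in outcomes)

# Example: a risky move that succeeds with prob 0.8 (one step closer to
# the goal, V* = 60) and hits a hazard with prob 0.2 (V* = 30).
u = expected_utility([(0.8, STEP, 60.0), (0.2, STEP + HAZARD, 30.0)])
print(round(u, 2))  # 47.46
```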
### Q-Learning Convergence

Under the conditions:
- Σ α_t = ∞ (infinite exploration)
- Σ α_t² < ∞ (diminishing step sizes)
- All state-action pairs visited infinitely often
The Q-function converges to Q* with probability 1 (Watkins & Dayan, 1992).
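For example, the harmonic schedule α_t = 1/t satisfies the first two conditions: Σ 1/t diverges while Σ 1/t² converges, so per-visit learning rates of the form α = 1/n(s, a) meet both requirements.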
## Quick Start

```bash
conda env create -f config.yml
conda activate oracle-agent
cd src

# Demo all agents
python main.py --mode all --seed 42

# Train RL agent
python main.py --mode train_rl --rl_episodes 3000

# Benchmark all agents
python main.py --mode benchmark --n_episodes 500 --rl_episodes 1000

# Demo with MCTS
python main.py --mode demo_bayesian --mcts
```

### Deploying to Hugging Face Spaces

- Use a single HF Space with the Gradio SDK.
- Select the Blank template.
- The repo root should contain
app.py,requirements.txt, andruntime.txt. - Push your GitHub repo, then import it into Hugging Face Spaces.
- HF will install dependencies from
requirements.txtand runpython app.py. - Recommended space name:
oracleagentororacle-agent. - Owner:
Sammy1808.
This repo also includes a Dockerfile for a custom Docker Space if you want full container control.
### Deploying to Vercel

- Import the repo into Vercel.
- Set the project root to `frontend/`.
- Vercel will detect Next.js and build the site using `npm run build`.
### License

- This project is licensed under the MIT License.
- The license file is included in `LICENSE`.
## Experiments

### Experiment 1: Full vs. Partial Observability

Hypothesis: Perfect information guarantees success; partial observability reduces the success rate due to sensor noise.

Expected: Deterministic > Bayesian+MCTS > Bayesian > RL (untrained)

### Experiment 2: Scanning Strategy

Hypothesis: Entropy-based scanning outperforms random scanning in partially observable settings.

Metric: Average steps to goal with a fixed scan budget.

### Experiment 3: RL Training Dynamics

Training over 3000 episodes with ε decayed from 1.0 → 0.05.

Expected: Success rate increases from ~20% to ~80% over training.

### Experiment 4: Cross-Episode Memory

Hypothesis: Agents with memory of hazard distributions learn faster in new environments.

Metric: First-episode success rate with and without memory initialization.
## Benchmarks

Example benchmark results (500 episodes, 5 seeds):
| Agent | Success Rate | Avg Reward | Avg Steps | Avg Lives | Avg Scans |
|---|---|---|---|---|---|
| Deterministic | 95.2% | 142.3 | 14.2 | 2.8 | 0 |
| Bayesian | 72.4% | 89.1 | 22.6 | 1.9 | 8.3 |
| Bayesian+MCTS | 78.1% | 98.7 | 20.1 | 2.1 | 7.1 |
| RL (trained) | 81.5% | 105.2 | 18.4 | 2.3 | 4.2 |
## Visualization

The current implementation produces publication-quality figures for training and benchmarking. The long-term goal is to extend this into a cinematic visualization layer with:
- live grid-world rendering and hazard animation
- Bayesian belief heatmaps and entropy overlays
- MCTS tree expansion and rollout futures
- RL reward cinema and emergent policy visuals
- benchmark dashboards with animated comparison charts
The system currently generates figures such as:
| Figure | Description |
|---|---|
| `rl_reward_curve.png` | Training reward with moving average |
| `rl_success_rate.png` | Success rate convergence |
| `belief_evolution.png` | Entropy reduction over episode |
| `benchmark_comparison.png` | Agent performance bar charts |
## Project Structure

```
src/
├── config.py                 # Centralized hyperparameters
├── main.py                   # Unified CLI entry point
│
├── env/
│   └── grid_world.py         # Environment dynamics & physics
│
├── belief/
│   └── bayesian_update.py    # Probabilistic state estimation
│
├── planning/
│   ├── astar.py              # Life-aware A* search
│   └── monte_carlo.py        # MCTS with UCB1
│
├── agents/
│   ├── deterministic_agent.py
│   ├── bayesian_agent.py
│   └── rl_agent.py
│
├── learning/
│   └── q_learning.py         # Tabular Q-learning engine
│
├── utils/
│   └── metrics.py            # Benchmarking & logging
│
├── experiments/
│   └── benchmark.py          # Full evaluation suite
│
└── visualize/
    └── plots.py              # Publication-quality plots
```
## References

- Russell, S. & Norvig, P. *Artificial Intelligence: A Modern Approach* (4th ed.). Pearson, 2020.
- Watkins, C.J.C.H. & Dayan, P. "Q-Learning." *Machine Learning*, 8(3), 1992.
- Kocsis, L. & Szepesvári, C. "Bandit Based Monte-Carlo Planning." *ECML*, 2006.
- Thrun, S. "Probabilistic Robotics." *Communications of the ACM*, 2002.
- Howard, R.A. "Information Value Theory." *IEEE Transactions on Systems Science and Cybernetics*, 1966.