How the AI works
The maze is solved by imitation learning: an expert A* search computes optimal moves, and neural networks are trained to copy them. The result is an ensemble of 5 networks that takes a majority vote on each step. In verification it solved 500/500 fresh 21×21 mazes (100%).
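As a rough illustration of the per-step vote, the sketch below picks the move that most ensemble members chose. The function name `majority_vote` and the tie-break rule (first move in N/S/E/W order) are assumptions for this example; the text above does not say how ties are handled.

```python
from collections import Counter

# Move order doubles as the tie-break order (an assumption, not from the source).
MOVES = ("N", "S", "E", "W")


def majority_vote(votes):
    """Return the move chosen by the most ensemble members.

    `votes` holds one move per network, e.g. five entries for a 5-network ensemble.
    """
    counts = Counter(votes)
    # max() keeps the first maximal item, so ties fall back to MOVES order.
    return max(MOVES, key=lambda move: counts[move])


if __name__ == "__main__":
    print(majority_vote(["N", "E", "N", "S", "N"]))
```

With an odd number of networks and four possible moves, ties can still occur (for example 2/2/1), which is why some tie-break rule is needed.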
Inputs → outputs
- Inputs (26): six features for each of the four directions (wall, goal proximity, visited, visit count, A* cost-to-go, and an is-optimal flag; 6 × 4 = 24), plus the two-component relative offset to the exit.
- Outputs: the next move (N/S/E/W).
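One way the 26 inputs listed above could be assembled is sketched here. The feature order, the `1 / (1 + distance)` form of goal proximity, the `-1.0` placeholder cost for walls, and every name (`encode`, `open_cells`, `visits`, `cost_to_go`) are assumptions for illustration. Only the feature list and the total of 26 come from the description.

```python
# Directions as (name, (dx, dy)); y grows downward in this sketch.
DIRS = (("N", (0, -1)), ("S", (0, 1)), ("E", (1, 0)), ("W", (-1, 0)))


def encode(pos, goal, open_cells, visits, cost_to_go):
    """Build a 26-value input vector: 6 features x 4 directions + 2 offsets.

    open_cells: set of walkable (x, y) cells
    visits:     dict mapping a cell to how often the agent has entered it
    cost_to_go: dict mapping each open cell to its optimal step count to the goal
    """
    x, y = pos
    gx, gy = goal
    neighbours = [(x + dx, y + dy) for _, (dx, dy) in DIRS]
    reachable = [cost_to_go[n] for n in neighbours if n in open_cells]
    best = min(reachable) if reachable else None

    vec = []
    for n in neighbours:
        is_wall = n not in open_cells
        count = visits.get(n, 0)
        cost = -1.0 if is_wall else float(cost_to_go[n])  # -1.0 marks "no path"
        vec += [
            1.0 if is_wall else 0.0,                        # wall
            1.0 / (1 + abs(gx - n[0]) + abs(gy - n[1])),    # goal proximity
            1.0 if count > 0 else 0.0,                      # visited
            float(count),                                   # visit count
            cost,                                           # cost-to-go
            1.0 if (not is_wall and cost == best) else 0.0, # is-optimal flag
        ]
    vec += [float(gx - x), float(gy - y)]                   # offset to the exit
    return vec


if __name__ == "__main__":
    cells = {(0, 0), (1, 0), (1, 1)}
    costs = {(0, 0): 2, (1, 0): 1, (1, 1): 0}
    print(len(encode((0, 0), (1, 1), cells, {(0, 0): 1}, costs)))
```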
How it was trained
- A* imitation: learn to mimic the optimal path on many random mazes.
- DAgger: let the network drive, then have the expert relabel the states it actually visits, so its mistakes are corrected where they occur.
- Curriculum: train from small 5×5 mazes up to 21×21.
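The DAgger and curriculum steps above can be sketched as a loop: roll out the current policy, have the expert label the states it reached, add them to one growing dataset, and refit. This is a toy under heavy assumptions, not the project's training code. A wall-free grid stands in for the mazes, a greedy "step toward the goal" rule stands in for A*, a lookup table stands in for the neural networks, and the size schedule, `rounds`, and all function names are hypothetical. It also starts from an empty policy rather than one pre-trained by imitation.

```python
import random
from collections import Counter, defaultdict

MOVES = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}


def _sign(v):
    return (v > 0) - (v < 0)


def expert_move(pos, goal):
    """Stand-in expert: on a wall-free grid, stepping toward the goal is optimal."""
    (x, y), (gx, gy) = pos, goal
    if x != gx:
        return "E" if gx > x else "W"
    return "S" if gy > y else "N"


def features(pos, goal):
    """Toy state description: the sign of the offset to the goal."""
    return (_sign(goal[0] - pos[0]), _sign(goal[1] - pos[1]))


def fit(dataset):
    """Toy learner: remember the most common expert label per feature."""
    table = defaultdict(Counter)
    for feat, label in dataset:
        table[feat][label] += 1
    return {feat: c.most_common(1)[0][0] for feat, c in table.items()}


def rollout(policy, size, rng, max_steps):
    """Let the learner drive and return the (pos, goal) states it visits."""
    pos, goal = (0, 0), (size - 1, size - 1)
    visited = []
    for _ in range(max_steps):
        if pos == goal:
            break
        visited.append((pos, goal))
        # Unknown states fall back to a random move.
        move = policy.get(features(pos, goal)) or rng.choice(list(MOVES))
        dx, dy = MOVES[move]
        # Clamp to the grid, so a move off the edge leaves the agent in place.
        pos = (min(max(pos[0] + dx, 0), size - 1),
               min(max(pos[1] + dy, 0), size - 1))
    return visited


def train(sizes=(5, 9, 13, 17, 21), rounds=3, seed=0):
    rng = random.Random(seed)
    dataset, policy = [], {}
    for size in sizes:              # curriculum: small grids first
        for _ in range(rounds):     # DAgger iterations at this size
            states = rollout(policy, size, rng, max_steps=4 * size * size)
            # The expert labels the learner's own states; data is only ever added.
            dataset += [(features(p, g), expert_move(p, g)) for p, g in states]
            policy = fit(dataset)
    return policy


if __name__ == "__main__":
    print(train())
```

The key difference from plain imitation is where the training states come from: they are sampled by running the learner, not the expert, so the dataset covers situations the learner gets itself into.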
What you see on screen
The live network panel shows the ensemble's real activations, and pheromone-style trails glow on the cells of the chosen path.