VoyagerSight is a research replication and extension of NVIDIA’s Voyager paper that integrates multimodal perception into autonomous Minecraft agents. Built on modern LLMs with faster inference and stronger multimodal capabilities, this implementation tests the hypothesis that agents with multimodal perception perform more effectively and display diverse, creative behavior.

Technologies Used:


Full project documentation, code, and paper coming soon!