Overview
In recent years, multimodal foundation models (MFMs) such as CLIP, ImageBind, DALL·E 3, GPT-4V, Gemini, and Sora have become one of the most prominent and rapidly developing areas of artificial intelligence. At the same time, the open-source MFM community has produced representative projects such as LLaVA, LAMM, MiniGPT-4, Stable Diffusion, and OpenSora.
Unlike traditional computer vision and natural language processing models, MFMs actively pursue general-purpose problem solving. By incorporating MFMs, embodied AI (EAI) systems can better handle complex tasks in both simulators and real-world environments. However, many issues at the intersection of MFMs and EAI remain unexplored and unsolved, including agents' long-horizon decision-making, motion planning, and generalization to new environments.
This workshop is dedicated to exploring several key issues, including but not limited to:
Workshop Call for Papers
This workshop focuses on multimodal foundation models (MFMs), embodied AI (EAI), and their intersection. Topics of this call for papers include but are not limited to:
Submission Rules
Submissions will undergo double-blind review on the OpenReview platform. The main text is limited to 4 pages; there is no page limit on references or supplementary material.
Important Dates
All deadlines are Anywhere on Earth (AoE).
MFM-EAI Challenge: Three Tracks (teams may enter multiple tracks)
The EgoPlan Challenge evaluates multimodal large models' ability to plan real-world tasks drawn from daily human activities. Given a task-goal description, egocentric (first-person) video, and the current environment observation, the model must select reasonable actions to complete the task.
Prizes:
The Composable Generalization Challenge evaluates the task-completion and generalization capabilities of combined planning-execution systems in open scenarios. The planner decomposes the task based on a language task description and multimodal visual input, and the controller executes the resulting subtasks.
The World Model Challenge evaluates how well world simulators perform in embodied-AI scenarios. Given an embodied task description and real-time scene observations, the model generates videos that follow the task instructions; submissions are judged on the quality of the generated videos and on their ability to guide an agent to complete the task.
Committee Members
Workshop Organizer
Steering Committee
For workshop-related questions, contact icmlmfmeai@gmail.com.