Following the launch of Rabbit, an AI device that can use mobile apps, the term large action models (LAMs) has been gaining popularity. These models move beyond conversation by turning LLMs into “agents” that can connect the siloed, app-driven world without requiring users to click through apps or integrate APIs.
The line between hype and reality around LAMs is blurry, but in short: a LAM is a large language model (LLM) specifically trained to take actions (e.g., send API requests).1
What is a large action model (LAM)?
A Large Action Model (LAM) is an advanced type of AI that builds on Large Language Models (LLMs): it not only understands and generates text but also plans and executes actions in real-world (digital or physical) environments. This lets it automate tasks and interact directly with systems based on user intent.
Key characteristics of Large Action Models (LAMs) include:
- Interpreting user intent: They can understand user requests from text, voice, images, or videos, even when the instruction is unclear or implicit.
- Generating actions: They turn user goals into concrete actions in digital or physical environments, such as using a GUI, calling APIs, controlling robots, or generating code.
- Dynamic planning and adaptation: They can break complex tasks into smaller steps, follow a plan, and adjust it when the situation changes or errors occur.
- Specialization and efficiency: They are often built for specific tasks or environments, which makes them more accurate and efficient than general-purpose models in that domain.
In short, LAMs do more than understand language. They connect understanding with action and can carry out multi-step tasks in real-world settings.
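These characteristics can be sketched as a simple plan-execute-adapt loop. Everything below is illustrative: the goal decomposition and the simulated failure are hard-coded, whereas a real LAM would generate and revise the plan with a trained model.

```python
# Minimal sketch of a LAM-style plan-execute-adapt loop.
# plan_task and execute_step are hypothetical stand-ins for a trained model.

def plan_task(goal):
    # A real LAM would decompose the goal dynamically; this is hard-coded.
    return ["open_site", "search_flights", "select_cheapest", "confirm"]

failed_once = set()

def execute_step(step):
    # Simulate a transient failure on the first attempt of one step.
    if step == "search_flights" and step not in failed_once:
        failed_once.add(step)
        return None  # signal failure to the loop
    return f"done:{step}"

def run(goal):
    results = []
    for step in plan_task(goal):
        result = execute_step(step)
        if result is None:  # adapt: retry the failed step once
            result = execute_step(step)
        results.append(result)
    return results
```

The retry branch is the "dynamic planning and adaptation" characteristic in miniature: the loop notices an error and adjusts instead of failing the whole task.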
How do large action models (LAMs) work?
LAMs interact with applications through their user interfaces or, more commonly, through APIs. For example, they can process the images and code of a website or application to decide on and perform their next steps.
This allows LAMs to navigate user and application interfaces. For example, if the needed information already exists or is accessible through another app, a LAM will retrieve it from that app rather than asking the user.
With this degree of autonomy and comprehension, LAMs transform generative AI into an active assistant that can perform tasks such as:
- administering social media platforms
- getting weather information
- making reservations
- processing financial transactions
- connecting to IoT devices and sending them commands (e.g. hailing an Uber)
Source: Salesforce2
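Tasks like these typically reduce to routing a recognized user intent to a tool or API call. A minimal sketch, assuming a hypothetical tool registry (the tool names and endpoints below are made up):

```python
# Hedged sketch: routing a recognized user intent to an API call.
# The tool registry and endpoints are illustrative, not a real API.

TOOLS = {
    "get_weather": {"method": "GET", "url": "https://api.example.com/weather"},
    "make_reservation": {"method": "POST", "url": "https://api.example.com/reservations"},
}

def to_action(intent, params):
    """Turn a recognized intent into an executable API request spec."""
    tool = TOOLS[intent]
    return {"method": tool["method"], "url": tool["url"], "params": params}
```

A real system would pass the resulting spec to an HTTP client; the point here is the intent-to-action mapping, not the transport.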
LAMs and LLMs: Understanding the difference
Source: Large Action Models: From Inception to Implementation3
Large Action Models (LAMs) extend Large Language Models (LLMs) by not only understanding user requests but also planning and executing real-world actions, such as completing tasks on websites. This makes them more efficient, task-focused, and practical for real-world applications, often with smaller and more specialized designs.
Though LAMs and large language models share some similarities, like their ability to grasp human intentions, their core purposes differ greatly.
LAMs are designed to take action, whereas LLMs excel in processing and generating language. While an LLM might suggest ideas or generate text based on your input, a LAM takes it a step further by autonomously performing tasks like making appointments, ordering products, or filling out forms.
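The difference can be made concrete: for the same request, an LLM produces text, while a LAM produces a structured, executable action. The two functions below are illustrative stand-ins, not real model output:

```python
# Illustrative contrast (not real model output): an LLM returns text,
# while a LAM returns a structured, executable action for the same request.

def llm_respond(request):
    # An LLM suggests; the user still has to act on the suggestion.
    return "You could book a table by visiting the restaurant's website."

def lam_respond(request):
    # A LAM emits an action an executor can run directly.
    return {"action": "book_table", "restaurant": "Example Bistro",
            "party_size": 2, "time": "19:00"}
```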
Large action models (LAMs): hype or real?
While some companies portray LAMs as a new architecture, the functionalities assigned to them have been implemented for some time using LLM agents.4
Additionally, LLM agents have long been performing the tasks that LAMs are described as doing. The two concepts share common functionalities (see figure):
- Context-based analysis
- Prompt engineering
- Leveraging tools
- Reasoning5
Figure: Language-based AI agent workflow
Source: ICLR6
Furthermore, one taxonomy places LAMs within language-based agent designs: (1) prompt template-based AI agents; (2) learnable prompt AI agents; and (3) large action models (LAMs). In this view, a LAM is an LLM specifically trained to execute human actions from data.7
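The simplest of these designs, a prompt template-based agent, can be sketched in a few lines: a fixed template wraps the user request and the available tools before the text reaches the LLM. The template wording below is illustrative:

```python
# Sketch of a prompt template-based agent: a fixed template wraps the
# user request and the available tools. The wording is illustrative.

TEMPLATE = (
    "You can use these tools: {tools}.\n"
    "User request: {request}\n"
    "Respond with the single tool to call."
)

def build_prompt(request, tools):
    # The filled template is what would be sent to the LLM.
    return TEMPLATE.format(tools=", ".join(tools), request=request)
```

Learnable-prompt agents tune this template from data, and LAMs go further by training the model itself on action traces.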
Real-life LAM examples
1. Automatically completing forms or spreadsheets on websites
A LAM can recognize the needed fields on a form, gather the required data (e.g. addresses, names, passwords, and credit card numbers) from a database or user profile, and enter it into the proper fields.
Video: Automatically completing forms or spreadsheets with LAM
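The field-matching step can be sketched as follows, with a hypothetical user profile standing in for data a real LAM would read from a database and the page's DOM:

```python
# Sketch of the field-matching step for form filling. The profile and
# field names are hypothetical; a real LAM would read them from the DOM.

PROFILE = {"name": "Jane Doe", "email": "jane@example.com", "address": "1 Main St"}

def fill_form(form_fields):
    """Map each form field to the matching profile value, if any."""
    filled = {}
    for field in form_fields:
        # Unknown fields stay None so the agent can ask the user instead.
        filled[field] = PROFILE.get(field)
    return filled
```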
2. Completing online transactions
A LAM can work with buttons, links, and dropdown menus. It may also insert specific text into text fields and search bars. This is precisely what ordering pizza online entails: filling out text forms, clicking buttons, and choosing menu options.
Video: HyperWriteAI Assistant Studio using the browser to place an online order
Source: HyperWriteAI9
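An ordering flow like the one in the video boils down to a sequence of UI actions. The sketch below logs those actions against a fake browser driver rather than a real one, and the selectors are invented:

```python
# Illustrative UI action trace for an online order. FakeBrowser simulates
# a browser driver; the selectors are made up for the example.

class FakeBrowser:
    def __init__(self):
        self.log = []

    def click(self, selector):
        self.log.append(("click", selector))

    def fill(self, selector, text):
        self.log.append(("fill", selector, text))

    def select(self, selector, option):
        self.log.append(("select", selector, option))

def order_pizza(ui):
    # The action sequence a LAM would emit for a simple order.
    ui.select("#size", "large")
    ui.fill("#address", "1 Main St")
    ui.click("#place-order")
    return ui.log
```

Swapping FakeBrowser for a real automation driver would turn the same action sequence into actual clicks and keystrokes.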
3. Resolving customer service requests end-to-end
A Large Action Model (LAM) can handle a full customer request from start to finish by understanding the user’s goal, deciding the necessary steps, and executing them across multiple systems (such as CRM, billing, and support platforms).
The Genesys Cloud Agentic Virtual Agent is an example of this use case: it can understand a customer’s issue (e.g., a billing problem), determine what needs to be done, and complete the required actions, such as checking account data, updating records, or triggering service processes, without human intervention.10
Instead of only providing answers, the system completes the task itself by interacting with different tools and workflows, reducing the need for repeated explanations or manual follow-ups.
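A minimal sketch of such end-to-end resolution, with stub functions standing in for real CRM and billing APIs:

```python
# Sketch of end-to-end resolution across systems. The CRM and billing
# functions are stubs standing in for real platform APIs.

def crm_lookup(customer_id):
    # Stub: fetch the customer's account record.
    return {"id": customer_id, "plan": "pro", "billing_ok": False}

def billing_fix(account):
    # Stub: correct the billing record.
    return {**account, "billing_ok": True}

def resolve_billing_issue(customer_id):
    """Understand the goal, decide the steps, execute across systems."""
    account = crm_lookup(customer_id)   # step 1: check account data
    account = billing_fix(account)      # step 2: update records
    return {"status": "resolved", "account": account}
```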
4. Autonomous driving and decision-making
A Large Action Model (LAM) can power autonomous systems by interpreting real-world inputs, reasoning about situations, and executing actions in real time.
NVIDIA’s Alpamayo uses Vision-Language-Action models to process camera video, understand the driving environment, reason about what is happening, and generate driving actions such as steering, braking, or accelerating.11
Instead of following fixed rules, the system decides what to do based on context (e.g., traffic, obstacles, road conditions) and explains its reasoning, enabling safer and more transparent autonomous driving.
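Stripped to its essentials, this is a perceive-reason-act loop: the model maps context to an action plus an explanation. The heavily simplified sketch below uses a hand-written perception dict rather than real camera input:

```python
# Highly simplified perceive-reason-act step in the spirit of a
# vision-language-action model. The perception input is a hand-written
# dict, not real camera frames.

def decide(perception):
    """Choose a driving action from context, with a stated reason."""
    if perception.get("obstacle_ahead"):
        return {"action": "brake", "reason": "obstacle detected ahead"}
    if perception.get("light") == "red":
        return {"action": "brake", "reason": "red traffic light"}
    return {"action": "accelerate", "reason": "clear road"}
```

The attached reason mirrors the explainability aspect described above; a real system would produce both from learned models, not hand-written rules.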
5. Personal task execution across everyday apps
A Large Action Model (LAM) can turn a user’s goal into concrete actions across multiple tools, completing tasks without step-by-step instructions. For example, agentic AI systems like OpenClaw use similar principles: they can manage emails, calendars, and travel bookings by planning steps and executing them autonomously. While OpenClaw represents a full agentic AI system, LAMs provide the action-taking core that enables such systems to carry out multi-step workflows reliably.
Technologies in LAMs
A LAM may utilize the following techniques:
- Connections: Connect to several apps and APIs.
- Neuro-symbolic approach: Neuro-symbolic programming combines neural networks trained on large datasets with built-in symbolic logical reasoning capabilities. This enables LAMs to notice patterns while also comprehending the underlying reasoning, making them more adaptive and able to respond meaningfully to the “why” behind user requests.
- Instruction abstraction: Create instructions that provide modular, hierarchical abstractions for modeling actions via an interface.
- Direct human modeling: Identify user intent, habits, and routines across applications to develop a template for acting.
- Task reasoning: Analyze the relationships between tasks, identifying dependencies and determining the optimal order of execution. It ensures that prerequisite tasks are completed before dependent ones begin. This enables the LAM to improve workflows based on past interactions.
- Continuous learning: LAMs not only execute tasks but also improve their performance over time through continuous learning. For example, a LAM could manage customer inquiries about orders, returns, and product information. Over time, it would become more adept at resolving issues quickly, even predicting and addressing potential problems before customers reach out.
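The task-reasoning step above, ordering subtasks so prerequisites run first, can be sketched with a topological sort. The task graph below is illustrative; a real LAM would infer the dependencies itself:

```python
# Sketch of task reasoning: ordering subtasks so that prerequisite tasks
# run before dependent ones. The dependency graph is illustrative.
from graphlib import TopologicalSorter

def execution_order(dependencies):
    """dependencies maps each task to the set of tasks it depends on."""
    return list(TopologicalSorter(dependencies).static_order())
```

For a checkout workflow where paying depends on a filled cart, which depends on logging in, the sorter guarantees login runs first and payment last.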
Large action model examples
The term LAM covers a mix of consumer products, action-focused models, and research systems that try to turn user intent into software actions.
- Rabbit R1: Rabbit markets the R1 around its LAM idea, and its official materials now point users to features such as LAM Playground and teach mode for website tasks. At the same time, early reviews were highly critical; The Verge called the device “unfinished” and “unhelpful,” and said there was little evidence of a LAM working reliably in the product at launch.
- Adept ACT-1: Adept described ACT-1 as a “foundation model for actions” trained to use software tools, APIs, and web apps. It is best understood as an advanced action-oriented agent system, rather than a fully separate AI category on its own.
- Salesforce xLAM: Salesforce released xLAM as a family of models optimized for function calling and AI agents, and later expanded it with stronger multi-turn support. This makes xLAM one of the clearest official examples of a LAM-style model family.
- Microsoft TaskMatrix.AI: TaskMatrix.AI is a Microsoft Research vision paper that proposes connecting foundation models with millions of APIs to complete tasks. Because it is framed as a research vision and position paper, it is better described as an academic LAM-like framework than a deployable product.