Generative AI (GenAI) presents novel opportunities for enterprises compared to middle-market companies or startups, including:
- Enterprise generative AI use cases.
- The opportunity to build your company’s models without exposing private data to 3rd parties.
However, generative AI brings challenges unique to large organizations. For example:
- 36% of enterprises cite concerns about proprietary data exposure when using commercial LLMs.1
- GenAI will also accelerate new services and solutions, allowing competitors to enter markets faster and capture share.
- Automation powered by generative models can improve customer experience or reduce costs, but it may also introduce operational and reputational risks through AI bias or hallucinations.
Explore our practical enterprise AI use cases to learn how large companies can build, deploy, and govern their own generative AI models effectively.
Enterprise generative artificial intelligence use cases
The web is full of B2C use cases such as writing emails with generative AI support that don’t require deep integration or specialized models. However, enterprise value of generative AI comes from enterprise AI applications listed below:
Common use cases
Enterprise Knowledge Management(EKM): While SMEs and mid-market firms do not have challenges in organizing their limited data, Fortune 500 or Global Forbes 2000 need enterprise knowledge management tools for numerous use cases. Generative AI can serve them. Applications include:
- Insight extraction by tagging unstructured data like documents.
- Summarization of unstructured data.
- Enterprise search which goes further than keyword search takinginto account relationships between words.
Part of enterprise search includes answering employee questions about:
- Company’s practices (e.g. HR policies)
- Internal company data like sales forecasts
- A combination of internal and external data. For instance: How would potential future sanctions targeting MLOps systems sales to our 3rd largest geographic market affect our corporate performance?
Larger organizations serve global customers and machine translation ability of LLMs are valuable in use cases like:
- Website localization
- Creating documentation like technical manuals at scale for all geographies
- Multilingual customer service
- Social media listening targeting a global audience
- Multilingual sentiment analysis
Industry specific applications
Most enterprise value is likely to come from using generative AI technologies for innovation in companies’ specific industries: This could be in the form of new products and services or new ways of working (e.g. process improvement with GenAI). Below lists of generative AI applications can serve as starting points:
How should enterprises leverage generative AI?
We charted a detailed path for businesses to leverage generative AI. While most firms may not need to build their models, most large enterprises (i.e. Forbes Global 2000) are expected to build or optimize one or more generative AI models specific to their business requirements within the next few years. Finetuning can enable businesses to achieve these goals:
- Achieve higher accuracy by customizing model output in detail for their own domain
- Save costs. Customizable models with licenses permitting commercial use have been measured to be almost as accurate as proprietary models at significantly lower cost.2
- Reduce attack surface for their confidential data
Firms like Bloomberg are generating world-class performance by building their own generative AI tools leveraging internal data. 3
What are the guidelines for enterprise AI models?
At a minimum an enterprise generative AI model should be:
Trusted
Consistent
Most current LLMs can provide different outputs for the same input. This limits the reproducibility of testing which can lead to releasing models that are not sufficiently tested.
Controlled
Enterprises should host or integrate generative AI in environments where they can manage security and compliance at a granular level (e.g., on-premises or dedicated cloud instances). The alternative is using online chat interfaces or APIs like OpenAI’s LLM APIs.
The disadvantage of relying on APIs is that the user may need to expose confidential proprietary data to the API owner. This increases the attack surface for proprietary data. Global leaders like Amazon and Samsung experienced data leaks of internal documents and valuable source code when their employees used ChatGPT.4 5
Since then, enterprise offerings have matured significantly:
- OpenAI Enterprise (2023) and, later, the ChatGPT Team (2024) introduced zero data retention, SOC 2 compliance, SSO/SAML integration, and admin controls.6
- Major providers (e.g., Anthropic, Microsoft, Google, Cohere) now advertise customer data opt-outs, meaning user prompts and outputs are not used for model training.
- Providers have also begun aligning with EU AI Act (2024) requirements, which emphasize responsible AI principles like transparency, auditability, and risk management in high-risk AI systems.
Despite these advances, residual risks remain when relying on third-party cloud systems:
- Malicious insiders or compromised providers could still access enterprise data.
- API misconfigurations can expose sensitive data flows.
- Lack of explainability in LLMs continues to challenge compliance teams.
For highly regulated industries, self-hosting or private deployment of foundation models (via open-weight models like LLaMA-4, Mistral, or Granite) remains the most secure approach, though at higher operational cost.
Explainable
Unfortunately, most generative AI models are not capable of explaining why they provide certain outputs. This limits their use as enterprise users that would like to base important decision making on AI powered assistants would like to know the data that drove such decisions. XAI for LLMs is still an area of research.
Reliable
Hallucination (i.e. making up falsehoods) is a feature of LLMs and it is unlikely to be completely resolved. Enterprise genAI systems require the necessary processes and guardrails to ensure that harmful hallucinations are minimized or detected or identified by humans before they can harm enterprise operations.
Enterprises increasingly rely on retrieval-augmented generation (RAG) pipelines to reduce hallucinations by grounding models in trusted data. Yet challenges remain in infrastructure, storage, and security, making RAG not just a fix but a long-term enterprise requirement.7
Secure
Enterprise-wide models may have interfaces for external users. Bad actors can use techniques like prompt injection to have the model perform unintended actions or share confidential data.
Ethical
Ethically trained
Model should be trained on ethically sourced data where Intellectual Property (IP) belongs to the enterprise or its supplier and personal data is used with consent.
- Generative AI IP issues, such as training data that includes copyrighted content where the copyright doesn’t belong to the model owner, can lead to unusable models and legal processes.
- Use of personal information in training models can lead to compliance issues. For example, OpenAI’s ChatGPT needed to be disclose its data collection policies and allow users to remove their data after the Italian Data Protection Authority (Garante)’s concerns.8
Read generative AI copyright issues and solutions to learn more.
Fair
For enterprises, unfair models can cause several risks:
- Regulatory risk: AI systems used in hiring, lending, insurance, or healthcare may violate anti-discrimination laws if they produce biased outcomes.
- Operational risk: Biased outputs can degrade decision quality, such as recommending unsuitable candidates or misclassifying customer segments.
- Reputational risk: Public exposure of biased AI behavior can damage brand trust and customer relationships.
- Market limitations: Models trained primarily on one geography, language, or demographic group may perform poorly in global markets.
How enterprises address fairness
Enterprises address fairness in AI through a combination of governance practices and technical safeguards:
- They curate diverse and representative training datasets and remove sensitive attributes or proxy variables that could introduce bias.
- Models are evaluated using fairness metrics (e.g., demographic parity or equal opportunity) and tested on edge cases to identify potential disparities.
- Organizations also incorporate human oversight, such as human-in-the-loop validation for high-impact decisions and AI ethics review boards.
- Enterprises continuously monitor model outputs in production to detect biased patterns and retrain models as new or more balanced data becomes available.
Licensed
The enterprise need to have a commercial license to use the model. For example using models like Meta’s LLaMa have noncommercial licenses preventing their legal use in most use cases in a for-profit enterprise. Models with permissive licenses like Vicuna built on top of LLaMa also end up having noncommercial licenses since they leverage the LLaMa model.9 10
Sustainable
Training generative AI models from scratch is expensive and energy-intensive, contributing to carbon emissions. Business leaders should be aware of the full cost of generative AI technology and identify ways to minimize its ecological and financial costs.
Enterprises can strive towards most of these guidelines and they exist on a continuum except the issues of licensing, ethical concerns and control.
- It is clear how to achieve correct licensing and to avoid ethical concerns but these are hard goals to achieve
- Achieving control requires firms to build their own foundation models however most businesses are not clear about how to achieve this
How can enterprises build foundation models?
There are 2 approaches to build your firms’ LLM infrastructure on a controlled environment.
1- Build Your Own Model (BYOM)
This approach allows world-class performance costing a few million $ including computing (1.3M GPU hours on 40GB A100 GPUs in case of BloombergGPT) and data science team costs.11
BYOM is primarily pursued by enterprises in highly regulated sectors (e.g., finance, healthcare, defense) where data sensitivity and compliance requirements outweigh the costs. Some firms follow a hybrid approach by training smaller domain-specific models while leveraging external foundation models for general-purpose reasoning.
2- Improve an existing model
Most enterprises adopt this approach due to its cost efficiency and flexibility. Several methods are available:
2.1- Fine-tuning
It is a cheaper machine learning technique for improving the performance of pre-trained large language models (LLMs) using selected datasets.
Instruction fine-tuning was previously done with large datasets but now it can be achieved with a small dataset (e.g. 1,000 curated prompts and responses in case of LIMA).12 The importance of a robust data collection approach optimizing data quality and quantity is highlighted in early commercial LLM fine-tuning experiments.13
Compute costs in research papers have been as low $100 while achieving close to world-class performance.14
Model fine-tuning is an emerging with domain with new approaches like Inference-Time Intervention (ITI), an approach to reduce model hallucinations, being published every week.15
2.2- Reinforcement Learning from Human Feedback (RLHF)
A fine-tuned model can be further improved by human in the loop assessment. 16 17
2.3- Retrieval augmented generation (RAG)
RAG allows businesses to pass crucial information to models during generation time. Models can use this information to produce more accurate responses.
Contemporary frameworks such as LangChain and LlamaIndex facilitate secure integration of structured and unstructured enterprise data. Advanced RAG methods now include multi-hop retrieval and real-time search integration, further enhancing reliability and factual accuracy.
Enterprises are moving toward auto-grounding, where models connect to live data sources automatically to keep outputs current. Cloud providers like Azure now frame RAG as the core architecture for copilots, knowledge systems, and customer apps, prioritizing scalability and security.18
Given the high costs involved in BYOM, we recommend businesses to initially use optimized versions of existing models. Language model optimization an emerging domain with new approaches being developed on a weekly basis. Therefore businesses should be open to experimentation and be ready to change their approach.
Top cost-effective foundation models for enterprises
Machine learning platforms released foundation models with commercial licenses relying mostly on text on the internet as the primary data source. These models can be used as base models to build enterprise large language models:
OpenAI GPT-5
GPT-5.4 is OpenAI’s latest frontier model designed for professional and complex knowledge work. Capabilities include:
- Advanced reasoning & knowledge work: Produces high-quality outputs for tasks like reports, spreadsheets, presentations, and analysis across many professional domains.
- Coding ability: Integrates the coding strengths of GPT-5.3-Codex, enabling production-quality code generation and multi-file software changes.
- Agentic workflows & tool use: Can search and select tools, automate multi-step workflows, and execute long tasks more reliably.
- Native computer-use capability: Agents can interact with software using screenshots, mouse/keyboard actions, or automation code to complete tasks across apps and websites.
- Large context window: Supports up to 1 million tokens, allowing analysis of large codebases, long documents, or extended workflows in a single prompt.
GPT-5.4 shows strong improvements across several benchmarks. It achieves 83% wins/ties on GDPval for knowledge-work tasks (up from 70.9% in GPT-5.2). In software engineering, it scores 57.7% on SWE-Bench Pro, indicating solid coding performance. For computer-use tasks, it reaches 75% on OSWorld-Verified, surpassing the human baseline of 72.4%.
It also performs well on web research with 82.7% on BrowseComp.
Additionally, GPT-5.4’s responses are 33% less likely to be false and 18% less likely to contain errors compared with GPT-5.2.19
GPT-5.3-Codex is OpenAI’s agentic coding model, combining the advanced software engineering capabilities of GPT-5.2-Codex with the broader reasoning and professional knowledge of GPT-5.2.
The model manages complex development workflows, such as research, multi-step tool use, and long-running coding tasks, across large codebases.
Figure 1: An example of a prompt to slide generation using GPT-5.3-Codex.20
DeepSeek
DeepSeek-V3 by DeepSeek is an MoE model (~671B, MIT-licensed) with strong reasoning and coding performance and has been open-source since March 2025. 21
DeepSeek-V3.1 by DeepSeek (Aug 2025) extends long-context capabilities with an updated tokenizer and open weights. 22
Google DeepMind
Gemini 3.1 Pro (Google DeepMind) is a frontier large language model designed for complex reasoning, coding, and multimodal tasks, capable of processing information across text, images, audio, video, code, and documents.
On several benchmarks, Gemini Pro demonstrates strong performance across reasoning, coding, and multimodal tasks. It achieves 77.1% on ARC-AGI-2 for abstract reasoning and 94.3% on GPQA Diamond for graduate-level science questions. On Humanity’s Last Exam, which measures academic reasoning, it scores 44.4% without tools.
For coding and software engineering, the model reaches 68.5% on Terminal-Bench 2.0 and 80.6% on SWE-Bench Verified. It also performs well on knowledge and multimodal benchmarks, scoring 92.6% on MMMLU (multilingual knowledge) and about 80.5% on MMMU-Pro (multimodal reasoning).23
Meta LLaMA
LaMA 4 by Meta is released as LLaMA 4 Maverick, Scout and a Behemoth preview. These models are natively multimodal (text and vision), support context windows up to 10 million tokens, and remain optimized for efficiency. 24
Llama 3 by Meta was the former model with a commercial use license with some limitations for very large businesses. 25
Mistral AI
Mistral 8x22B is the latest open-weights model developed by the European generative AI startup Mistral. With its permissive license (i.e. Apache 2.0) that allows commercial use without specific restrictions for large businesses, it can be attractive for all businesses.26 Mistral also provides models like Mistral Large but that model has more restrictive licensing.27
Recently, Mistral has expanded its lineup to include models such as Mistral Large 3; smaller models like Mistral Small and Medium; specialized coding models such as Codestral and Devstral; and audio models like Voxtral Transcribe 2, which provides batch and real-time speech transcription capabilities.28
IBM
IBM’s Granite models are high performing according to code generation benchmarks and are available with the permissive Apache 2.0 license.29
The Granite ecosystem has also expanded to include speech models, such as Granite-4.0-1B-Speech, that support multilingual speech recognition and translation.30
Databricks
DBRX is an open-weights model developed by the data platform Databricks. It comes with a commercial license with similar limitations to Meta’s models. Limitations apply to businesses serving more than 700M active users. 31
Grok
Grok-4 by xAI was released in July 2025 with native tool use, real-time search integration, and a “Heavy” variant for advanced reasoning. Grok 4.1 was rolled out in November 2025, improving reasoning, coherence, personality/emotional nuance, and reducing hallucinations compared to Grok 4.32
xAI recently introduced Grok 4.20 Beta, which adds multi-agent capabilities, enabling coordinated task execution across multiple specialized agents. Meanwhile, Grok 5 has been reported to be in training, suggesting further advances in reasoning and agentic capabilities are under development.33
Explore the up-to-date benchmark and pricing details of the foundation models for enterprise genAI applications:
What is the right tech stack for building large language models?
Generative AI is an artificial intelligence technology and large businesses have been building AI solutions for the past decade. Experience has shown that leveraging Machine Learning Operations (MLOps) platforms significantly accelerate model development efforts.
In addition to their MLOps platforms, enterprise organizations can rely on a growing list of Large Language Model Operations (LLMOps) tools and frameworks like Langchain, Semantic Kernel or watsonx.ai to customize and build their models, AI risk management tools like Nemo Guardrails.
In early days of new technologies, we recommend executives to prioritize open platforms to build future-proof systems. In emerging technologies, vendor lock-in is an important risk. Businesses can get stuck with outdated systems as rapid and seismic technology changes take place.
Finally, data infrastructure of a firm is among the most important underlying technologies for generative AI:
Vast amounts of internal data need to be organized, formatted.
Data quality and observability efforts should ensure that firms have access to high quality, unique, easily-usable datasets with clear metadata.
Synthetic data capabilities may be necessary for model training
How to evaluate large models’ performance?
Without measurement of effectiveness, the value of generative AI efforts can not be quantified. However, LLM evaluation is a difficult problem due to issues in benchmark datasets, benchmarks seeping into training data, inconsistency of human reviews and other factors.34 35 .
We recommend an iterative approach that increases investment in evaluation as models get closer to be used in production:
- Use benchmark test scores to prepare shortlists. This is available publicly for a large number of open source models.36 37
- Rely on Elo scores used in ranking players in zero-sum games like chess, compare the models to be selected. If there are higher performing models which are not available to be used (e.g. due to licensing or data security issues), they can be used to compare the responses of different models. 38
Figure 2: Few-shot learning improvement from OpenAI.
This can also include chain-of-thought prompting. Chain-of-thought prompting is a prompt engineering technique that guides a language model to reason through a problem step by step before producing a final answer. By generating intermediate reasoning steps, the model can better handle complex tasks such as math, logic, or multi-step decision-making.
This approach often improves accuracy and transparency because the model breaks the problem into smaller logical parts rather than responding immediately with a single answer.
Figure 3: Example showing how chain-of-thought prompting works.39
Retrieval augmented generation (RAG) can also be used with commercials models if the enterprise is content with the data security policies of the foundation model provider.
Fine-tuning is also available to further improve model performance of commercial models offered via APIs.40
Pre-foundation model steps for enterprises
Building your enterprise model can take months since the steps below need to be completed. Each of these steps can take weeks to months, and they can not be fully parallelized:
- Data collection can take weeks to months. AI data collection services can accelerate this process by helping companies generate balanced, high-quality instruction datasets and other data for building or fine-tuning models. You can also work with data crowdsourcing platforms for more diverse datasets.
- Hiring data scientists with LLM expertise or hiring consultants can take weeks to months.
- Training and deployment
- Integrating models to business processes and systems
We recommend business leaders encourage experimentation with GenAI. It requires a paradigm shift: We must view machines not as senseless robots but as co-creators. Organizations should start using GenAI to foster this mindset shift, educating employees about its potential and empowering them to change how they work. As consultants often say, the key to any transformation, including AI transformation, is people.
Figure 4: BCG’s framework for human side of enterprise GenAI adoption41
Teams can leverage existing APIs to automate processes in domains where value of confidential data is lower and system integration is easier. Example domains where teams can leverage GenAI to improve productivity and increase teams’ familiarity with generative AI without building own models:
- New content creation and optimizing generated content for marketing campaigns
- Code generation for front-end software
- Conversational AI for customer engagement and support
Sustainability & costs
Generative AI requires significant computing resources, and therefore has both financial and environmental costs. Enterprises should evaluate these trade-offs carefully when deciding whether to build or optimize models.
Key considerations include:
- Lifecycle modeling: Research shows that the carbon footprint of LLMs spans training, inference, and even the hardware itself. Tools such as LLMCarbon provide frameworks to estimate these costs end-to-end.42
- Cloud sustainability controls: Cloud providers (e.g., Google, Microsoft, AWS) now publish data on the carbon intensity of their data centers.43
- Choosing greener regions or low-PUE (power usage effectiveness) facilities can significantly lower emissions.44
- Industry reporting: Independent reports (e.g., Stanford AI Index, MIT Tech Review) highlight that data center emissions are rising, even as efficiency improves.45 This underscores the need to right-size models and optimize inference rather than always chasing the largest model available.46
Practical cost-reduction tactics
Enterprises are adopting methods such as:
- Using smaller, specialized models (fine-tuned on internal data) rather than training from scratch.
- Applying efficiency techniques like quantization (compressing models) or request caching.
- Leveraging RAG so models only generate when needed, instead of retraining with every new dataset.
- Tracking not only financial cost but also CO₂ and water usage at the use-case level for transparency.
Recommendation: Business leaders should treat sustainability as both a cost control strategy and a compliance priority. By aligning AI deployment with corporate ESG goals, enterprises can reduce expenses and limit reputational risk.
What is the level of interest in enterprise generative AI?
Though there are many signs that show that enterprise generative AI is booming (e.g. generative AI related revenues of consultants), this has not been reflected in search engine queries yet. However, there is increasing interest in enterprise AI which was likely triggered by the launch of ChatGPT:
Adoption level
Since last year, major advisory houses have updated enterprise GenAI adoption roadmaps to emphasize operating-model change, governance, and value capture over tooling alone:
- 78% of organizations report using AI in at least one function; firms are rewiring workflows, appointing AI governance leads, and formalizing model-risk processes.47
- GenAI moving past the “peak hype,” with roadmap guidance shifting toward governed, productized use cases and platform thinking.48
AI’s productization gap
While model performance improves every few weeks, enterprise products often lag. Many solutions simply add AI into existing workflows (e.g., chat widgets, form fillers) instead of creating AI-first experiences designed from the ground up.
The real opportunity lies in rethinking products so AI becomes the core interaction model, not an add-on.49
FAQ
Generative AI includes text, image and audio output of artificial intelligence models which are also called large language models LLMs, language models, foundation models or generative AI models.
McKinsey’s Lilli AI leverages McKinsey’s proprietary data to answer consultant’s questions and cites its sources. McKinsey followed an LLM-agnostic approach and leverages multiple LLMs from Cohere and OpenAI in Lilli.
Walmart developed My Assistant generative AI assistant for its 50,000 non-store employees.
If you have other questions or need help in finding vendors, we can help:
Find the Right VendorsReference Links
Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE and NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and resources that referenced AIMultiple.
Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.
He led technology strategy and procurement of a telco while reporting to the CEO. He has also led commercial growth of deep tech company Hypatos that reached a 7 digit annual recurring revenue and a 9 digit valuation from 0 within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.
Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.
Be the first to comment
Your email address will not be published. All fields are required.