10 best practices for optimizing generative and agentic AI costs

June 14, 2026

As enterprises scale their AI initiatives, the cost of developing, deploying and operating generative artificial intelligence models rises significantly. The shift toward AI agents can further increase costs because of poor architecture, limited operational maturity and weak governance.

Information technology leaders can adopt these 10 best practices for optimizing costs, helping them achieve business value more quickly and improve operational efficiency:

1. Be objective about model accuracy, performance and cost tradeoffs

For IT leaders, selecting the right model requires balancing accuracy, performance and cost, and being objective about the tradeoffs among them. An approach tailored to the use case can deliver better performance and lower inference costs.

Additionally, most application programming interface providers charge for input and output tokens separately, while some charge based on the number of characters. Normalizing these pricing models for a given use case enables an apples-to-apples comparison.
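The normalization described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a vendor calculator: the `Pricing` class, the `cost_per_request` function and the four-characters-per-token ratio are all hypothetical, and any prices passed in are placeholders rather than real rates.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical price sheet; figures passed in are placeholders."""
    unit: str                  # "token" or "character"
    input_per_million: float   # USD per million input units
    output_per_million: float  # USD per million output units


def cost_per_request(pricing, input_tokens, output_tokens, chars_per_token=4.0):
    """Estimate the USD cost of one request for a workload measured in tokens.

    chars_per_token is an assumption used to convert the workload for
    character-priced providers; it varies by language and tokenizer, so
    measure it on representative data.
    """
    if pricing.unit == "token":
        scale = 1.0
    elif pricing.unit == "character":
        scale = chars_per_token
    else:
        raise ValueError(f"unknown pricing unit: {pricing.unit}")

    input_units = input_tokens * scale
    output_units = output_tokens * scale
    total = (input_units * pricing.input_per_million
             + output_units * pricing.output_per_million)
    return total / 1_000_000
```

Running the same representative workload (for example, the average input and output size of one use case) through each provider's price sheet yields a single cost-per-request figure that can be compared directly.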

Lastly, IT leaders should run extended pilots to vet their total cost of ownership assumptions and uncover any surprises or hidden costs.

2. Create an AI model sandbox to promote safety, model choice and price transparency

An excellent way for IT leaders to enable safe experimentation is to create an AI sandbox, which features available models in a self-service manner as part of a model catalog, underpinned by basic security and privacy principles.

Besides creating an AI sandbox, IT leaders should create model cards for the models available in the sandbox, so that users have better visibility into how and where to use them. They should also ensure that model costs are transparent to users via reporting tools, which enables users to make more economical choices without jeopardizing accuracy or performance.

3. Balance upfront and operational costs in model augmentation and customization

When customizing gen AI models, IT leaders must balance upfront investments, such as prompt engineering, retrieval-augmented generation and fine-tuning, with ongoing inference costs. Running costs can be optimized by effective context engineering or even by efficiently fine-tuning a model on a specific dataset through instruction tuning or continued pretraining.

To balance costs, IT leaders should consider augmentation and customizations sequentially, only moving to a more advanced approach if a simpler one doesn’t meet the required output quality. To control gen AI costs, IT leaders can curate context inputs, ensuring each inference uses only the necessary information.
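One way to picture context curation is a fixed token budget per inference, filled with the most relevant snippets first. The sketch below assumes relevance scores are already available (for example, from a retriever) and uses a whitespace word count as a crude stand-in for a real tokenizer; `curate_context` and its parameters are hypothetical names chosen for illustration.

```python
def curate_context(snippets, max_tokens, count_tokens=None):
    """Pick the highest-scoring snippets that fit within a token budget.

    snippets: list of (relevance_score, text) pairs.
    count_tokens: optional callable returning a token count for a string;
    defaults to a whitespace word count, which only approximates real
    tokenizers.
    Returns (selected_texts, tokens_used).
    """
    if count_tokens is None:
        count_tokens = lambda text: len(text.split())

    selected, used = [], 0
    # Greedy selection: most relevant first, skipping anything that would
    # push the total over the budget.
    for _score, text in sorted(snippets, key=lambda s: s[0], reverse=True):
        cost = count_tokens(text)
        if used + cost <= max_tokens:
            selected.append(text)
            used += cost
    return selected, used
```

Because input tokens are billed on every call, a cap like this bounds the per-inference input cost, at the risk of dropping useful context if the budget is set too low.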

4. Understand the tradeoffs of self-hosting

Self-hosting gen AI models (often on-premises) can seem attractive for businesses seeking increased control and data privacy. IT leaders must be aware of the potential tradeoffs, as the list of cost drivers for self-hosting is extensive.

The most underestimated cost is the specialized talent required to operate gen AI at scale. IT leaders will have to consider the complexity and cost implications before opting to self-host gen AI models. They’ll also need to evaluate their organization’s capacity for upfront investment, ongoing maintenance and expertise needed.

5. Proactively manage software-as-a-service applications

SaaS vendors are packaging AI agents in inconsistent ways via bundled offerings, forced upgrades, optional tiers and add-ons. Each carries different cost, adoption and lock-in risks for organizations.

IT leaders will need to evaluate the real productivity impact of AI features, demand transparent AI cost breakdowns from vendors and avoid enterprise-wide upgrades without proven return on investment. They should adopt a use-case-driven upgrade strategy, enabling AI only for roles or workflows where measurable gains justify the spend. At the same time, they should establish strict usage and access governance to prevent consumption sprawl and surprise costs.

6. Negotiate new pricing models for agentic AI

As AI agent pricing models continue to evolve to align more closely with IT leaders’ expectations on value delivery, IT leaders who anchor their investments in clear business value will be best-positioned to ensure long-term impact and sustainable returns.

IT leaders can support this by pushing SaaS vendors for flexible and predictable pricing models. They can also run controlled AI agent pilots and track the cost per task, time saved and outcomes. From there, they can build internal benchmarks and agree on value-based pricing metrics before scaling.
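The pilot metrics mentioned above (cost per task, time saved and outcomes) can be derived from simple per-run records. The following is a minimal sketch; the `TaskRun` fields and the `summarize_pilot` function are hypothetical, and the metric definitions are one reasonable choice among several.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    """One agent task from a pilot; the field names are illustrative."""
    cost_usd: float       # total spend attributed to this task
    minutes_saved: float  # estimated human time saved if it succeeded
    succeeded: bool       # whether the outcome was acceptable


def summarize_pilot(runs):
    """Aggregate pilot runs into benchmark metrics."""
    if not runs:
        raise ValueError("no runs to summarize")

    total_cost = sum(r.cost_usd for r in runs)
    successes = [r for r in runs if r.succeeded]
    # Only successful tasks are credited with time saved.
    hours_saved = sum(r.minutes_saved for r in successes) / 60

    return {
        "cost_per_task": total_cost / len(runs),
        "cost_per_successful_task": (
            total_cost / len(successes) if successes else None
        ),
        "success_rate": len(successes) / len(runs),
        "cost_per_hour_saved": total_cost / hours_saved if hours_saved else None,
    }
```

Charging failed runs against the successful ones, as `cost_per_successful_task` does, keeps the benchmark honest when an agent retries or abandons tasks.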

7. Automate model selection, caching and routing

Cost differences between models make manual selection challenging for IT leaders, so automated model selection is an ideal solution.

A new category of tools called AI gateways can help control costs by enforcing policies to track and manage access to AI services and by providing features such as caching and model routing to reduce costs.

IT leaders should create a systematic decision process for selecting different large language models for different tasks to reduce costs while achieving the required performance. This first step toward automation in itself can lead to large cost savings. Additionally, they should use AI gateways as a cost optimization and governance control plane that shapes how all AI usage occurs across the enterprise.
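To make routing and caching concrete, here is a toy gateway in Python. It is a sketch under stated assumptions, not how any particular AI gateway product works: the `Gateway` class, the model names passed to it and the `call_model` callable are hypothetical, the length-based routing rule is a placeholder for a real decision process, and the cache only matches identical prompts.

```python
import hashlib


class Gateway:
    """Toy gateway: exact-match response cache plus a simple routing rule."""

    def __init__(self, cheap_model, strong_model, call_model, max_cheap_words=200):
        self.cheap_model = cheap_model
        self.strong_model = strong_model
        self.call_model = call_model  # callable(model_name, prompt) -> str
        self.max_cheap_words = max_cheap_words
        self._cache = {}
        self.cache_hits = 0

    def route(self, prompt):
        # Placeholder policy: short prompts go to the cheaper model.
        # A real policy would consider task type and quality requirements.
        if len(prompt.split()) <= self.max_cheap_words:
            return self.cheap_model
        return self.strong_model

    def complete(self, prompt):
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._cache:
            # Repeated prompt: serve from cache, no model call is made.
            self.cache_hits += 1
            return self._cache[key]
        response = self.call_model(self.route(prompt), prompt)
        self._cache[key] = response
        return response
```

Putting both decisions in one place is what lets the gateway act as a control plane: teams call `complete`, and the routing policy and cache can change without touching application code.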

8. Build a shared RAG platform to prevent duplication

A shared RAG platform is essential since it can prevent every team from building its own ingestion, chunking and embedding pipelines, which can lead to massive data and infrastructure duplication, and other issues.

IT leaders should stand up a unified ingestion and embedding service, deploy a governed shared vector store and expose standardized APIs that teams can use for all gen AI applications and agents. They should also enforce policies that prevent team-level RAG sprawl and continuously monitor retrieval quality and cost metrics to optimize over time.
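The duplication a shared platform avoids can be illustrated with a content-addressed embedding store: identical chunks are embedded once, no matter how many teams request them. This is a minimal in-memory sketch; `SharedEmbeddingService` and `embed_fn` are hypothetical names, and a production service would sit in front of a governed vector store rather than a Python dictionary.

```python
import hashlib


class SharedEmbeddingService:
    """Embeds each distinct chunk once and reuses the result for all callers."""

    def __init__(self, embed_fn):
        self._embed_fn = embed_fn  # callable(text) -> list of floats
        self._store = {}           # content hash -> embedding
        self.embed_calls = 0       # how many times the model was invoked

    def get_embedding(self, chunk):
        # Collapse whitespace so trivially different copies share one entry.
        normalized = " ".join(chunk.split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self._embed_fn(normalized)
            self.embed_calls += 1
        return self._store[key]
```

Comparing `embed_calls` with the total number of requests gives a rough measure of how much embedding spend the shared service is saving.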

9. Educate users on cost-effective use of gen AI

Users must understand how to use gen AI efficiently to avoid waste and unnecessary costs. With so many applications, models and platforms to choose from, users should be educated on cost management best practices.

IT leaders should organize workshops where employees can experiment with LLMs and AI agents and analyze successful and unsuccessful prompts to illustrate best practices and common pitfalls.

10. Analyze visible and hidden costs on an ongoing basis

Gen AI platform investments have a number of visible and hidden costs that need to be considered upfront to make an informed decision, including data costs, talent costs and application setup and integration costs.

IT leaders will need to evaluate these cost factors and include them in their total-cost-of-ownership (TCO) assessment from the start. Additionally, IT leaders must focus on mitigating the key cost drivers. These are the variable costs that make a huge difference to the TCO.

As organizations move from pilots to production, costs can escalate quickly. By implementing these 10 best practices, IT leaders can maximize the return on their gen AI investment and unlock its full potential.

Arun Chandrasekaran is a distinguished VP analyst at Gartner, within the global CIO practice, where his research focus is on artificial intelligence. He wrote this article for SiliconANGLE. Chandrasekaran and other Gartner analysts will present how CIOs and IT executives can become agents of change in their organizations and harness AI for digital transformation at the Gartner IT Symposium/Xpo, in Orlando, Florida, Oct. 19-22.

Image: SiliconANGLE/Gemini

