AT&T recently optimized its Ask AT&T personal assistant by revising its orchestration layer and transitioning from large language models (LLMs) to small language models (SLMs). The move has substantially cut operational costs and improved processing efficiency. It also aligns AT&T with a broader industry shift toward efficiency and cost-effectiveness, reflecting a recognition that SLMs can handle specific domains where the full power of an LLM is not necessary.
Enterprise AI deployments show a noticeable shift toward smaller models for operational tasks. Larger models have historically required substantial resources, making broad deployment costly. Companies now recognize that smaller models can fill targeted roles efficiently, offering significant cost savings and allowing wider deployment without heavy infrastructure investment. As a result, smaller language models are increasingly chosen for their specialized capabilities, marking a significant change in enterprise AI strategy.
Why Switch to Smaller Models?
AT&T's transition from LLMs to SLMs has cut costs by 90%, enabled the system to handle three times as many tokens, and improved the AI system's overall operational efficiency. AT&T's Chief Data Officer, Andy Markus, summed up where he sees the technology heading:
“I believe the future of agentic AI is many, many, many small language models.”
This suggests that smaller models can serve as a viable, cost-effective alternative without compromising output quality. By using SLMs, companies can more easily scale AI within their operations.
Do SLMs Meet Performance Expectations?
SLMs offer substantial computational efficiency and allow for more control over AI applications, which proves advantageous for companies aiming to integrate AI without escalating costs. Markus also reflected on their performance, stating:
“We find small language models to be just about as accurate, if not as accurate, as a large language model on a given domain area.”
While LLMs excel at general knowledge, SLMs can outperform them on industry-specific tasks thanks to their targeted training.
The practical case for SLMs is clearest in systems where AI must complete complex, multi-step tasks. In those workflows, SLMs can carry most of the operational load, with LLMs reserved for the critical decision points that demand heavier computational power.
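As a rough illustration of that routing pattern, the sketch below shows an SLM-first orchestrator that sends routine steps to a small model and escalates only flagged decision points to a larger one. The model names, the `Step` structure, and the criticality flag are illustrative assumptions, not AT&T's actual orchestration code.

```python
# Hypothetical sketch of an SLM-first orchestration layer.
# Model names and the criticality flag are illustrative, not AT&T's.

from dataclasses import dataclass


@dataclass
class Step:
    prompt: str
    critical: bool = False  # set by the planner for high-stakes decision points


def call_model(model_name: str, prompt: str) -> str:
    """Stand-in for a real inference call (e.g., an internal API or hosted endpoint)."""
    return f"[{model_name}] response to: {prompt}"


def run_workflow(steps: list[Step]) -> list[str]:
    results = []
    for step in steps:
        # Routine steps go to the cheaper small model; only critical
        # decision points are escalated to the large model.
        model = "large-model" if step.critical else "small-model"
        results.append(call_model(model, step.prompt))
    return results


if __name__ == "__main__":
    workflow = [
        Step("Classify the customer's intent"),
        Step("Extract the account number from the transcript"),
        Step("Decide whether to offer a retention discount", critical=True),
    ]
    for line in run_workflow(workflow):
        print(line)
```

In a setup like this, most token volume flows through the small model, which is where the cost savings come from; the large model is invoked only when a step is explicitly marked as needing it.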
Research from companies like Nvidia (NASDAQ:NVDA) suggests that SLMs are more practical and profitable for enterprise use, underscoring their ability to support essential tasks efficiently while avoiding the cost and infrastructure demands associated with LLMs. This aligns with the growing trend of building models that emphasize efficiency and cost-reduction strategies.
For businesses, the focus is increasingly on models that are not only smaller and faster but also able to maintain performance. This approach lowers total cost of ownership at a time when cost remains a prevalent barrier to deploying generative AI: almost half of enterprises cite it as a significant obstacle to adoption, making the accessibility offered by SLMs particularly relevant.
