The sustainability debate around AI still gravitates toward the same image: giant training runs, giant model sizes, and giant data centers drawing ever more power from already strained grids. That picture is real enough. The International Energy Agency projects in its base case that global electricity consumption from data centers will roughly double to around 945 TWh by 2030, with that growth running far faster than the growth in electricity demand from other sectors. AI has moved well beyond the lab. It now sits inside the infrastructure question itself.
Yet the harder part of the sustainability argument now sits elsewhere. It sits in inference, repeated endlessly across millions of everyday tasks, many of which do not need a frontier model at all. A 2025 UNESCO and UCL report argued that practical changes, including the use of smaller and more task-specific models, could reduce energy demand by up to 90 percent in some settings without sacrificing useful performance. That shifts the conversation away from spectacle and toward fit.
A better match for ordinary workloads
Small language models, or SLMs, are becoming easier to justify because many business tasks are narrower than the market’s AI branding suggests. Summarizing internal documents, extracting structured fields, rewriting text, classifying tickets, or adding natural-language controls inside an existing application rarely requires the full weight of a giant general-purpose model. In those settings, smaller models can be a cleaner operational choice and, increasingly, a cleaner energy choice too. UNESCO’s report makes that point directly by recommending a shift away from resource-heavy general-purpose systems wherever more compact models will do.
That line of thinking is becoming more relevant as the economics of inference sharpen. Reuters reported this week that Nvidia now sees more than $1 trillion in AI chip revenue opportunity by 2027, with the company explicitly tying that outlook to growing demand for inference. Once the industry starts talking this openly about inference at scale, model efficiency stops looking like a niche concern. It becomes part of cost control, power planning, and product design.
Using devices that already exist
The strongest sustainability case for SLMs may be local deployment. Not every prompt needs a round trip to a memory- and processor-hungry cloud stack. Some can run on devices that users or companies already own, which changes both the cost structure and the infrastructure burden.
Google has been notably explicit about this direction. In March 2025, it introduced Gemma 3 1B for mobile and web, saying the 529MB model is small enough to download quickly, responds fast enough for production apps, and supports a wide range of end-user devices. Google framed the advantages in practical terms: offline availability, no cloud bill for those features, lower latency, and privacy for data that should stay on the device. In May 2025, Google also expanded AI Edge support for small language models across Android, iOS, and the web, including multimodality, retrieval-augmented generation, and function calling.
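For context, here is a minimal sketch of what that local-first pattern looks like in a web app, assuming Google’s published MediaPipe LLM Inference API (@mediapipe/tasks-genai); the model file path and generation settings are illustrative choices, not anything Google prescribes.

```typescript
// Minimal sketch: running a compact model entirely in the browser with the
// MediaPipe LLM Inference API. After the one-time model download, prompts
// never leave the device.
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

async function summarizeLocally(text: string): Promise<string> {
  // Load the WASM runtime that executes the model on-device.
  const genai = await FilesetResolver.forGenAiTasks(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm'
  );

  // Illustrative model path: a small instruction-tuned model hosted alongside the app.
  const llm = await LlmInference.createFromOptions(genai, {
    baseOptions: { modelAssetPath: '/models/gemma3-1b-it-int4.task' },
    maxTokens: 512,   // bounded output keeps latency and energy per task low
    temperature: 0.2,
  });

  return llm.generateResponse(`Summarize the following in three sentences:\n\n${text}`);
}
```

The trade-off is the initial model download and device memory, which is exactly why the sub-gigabyte size Google quotes matters.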
Microsoft has taken a similar path with Phi Silica. Its developer documentation describes Phi Silica as an NPU-tuned local language model for Windows, capable of tasks such as summarization, rewriting, chat, and table conversion directly on-device. Microsoft’s Ignite 2025 materials added that Phi Silica had moved to stable release with up to 40 percent faster performance for efficient text generation and summarization. This does not mean every aging laptop suddenly becomes a full AI workstation. In practice, some of these experiences are tied to newer Copilot+ hardware. Even so, the architecture direction is clear enough. More AI work can stay local when the workload is bounded, and the model is compact.
That opens a more practical path than the usual AI arms-race framing. A company can add useful language features without routing every interaction through a remote cluster. A mobile app can summarize or search in-app content locally. A field tool can continue to work when connectivity drops. A lightweight assistant can run on existing phones, laptops, kiosks, or embedded systems instead of depending on continuous cloud inference. The gain is not only lower energy per task. It is a more selective use of data center resources overall.
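As a rough illustration of that selectivity, the sketch below routes bounded tasks to an on-device small model and reserves cloud inference for open-ended requests; the function names, task types, and the length heuristic are hypothetical placeholders, not any vendor’s actual API.

```typescript
// Hypothetical routing sketch: keep narrow, bounded tasks on a local small
// model and fall back to a remote large model only when the request demands it.
type TaskKind = 'summarize' | 'classify' | 'extract' | 'open_ended';

interface ModelBackends {
  runLocalSlm: (prompt: string) => Promise<string>;    // on-device small model
  callCloudModel: (prompt: string) => Promise<string>; // remote large model
}

async function route(
  kind: TaskKind,
  prompt: string,
  backends: ModelBackends,
): Promise<string> {
  // Narrow, well-bounded workloads rarely need frontier-scale inference.
  const bounded = kind !== 'open_ended' && prompt.length < 8_000;
  if (bounded) {
    return backends.runLocalSlm(prompt); // no network round trip, no cloud bill
  }
  // Reserve data-center inference for the queries that actually warrant it.
  return backends.callCloudModel(prompt);
}
```

Even a crude heuristic like this reflects the underlying point: most everyday requests fall on the bounded side of the line.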
Why this may matter more in Asia
Asia may be one of the clearest proving grounds for this model. AI adoption across the region is accelerating, but infrastructure conditions are uneven. Electricity costs, cloud dependence, connectivity quality, device fragmentation, and procurement limits vary widely between markets. At the same time, the IEA expects data center electricity demand to keep rising sharply worldwide through 2030. In that environment, an AI strategy that assumes constant access to top-tier centralized compute will often be harder to scale commercially.
Smaller models fit more naturally into that reality. A multilingual assistant for frontline workers, an offline education tool, a compact enterprise copilot for internal knowledge tasks, or a mobile-first customer service layer can all become easier to deploy when the model can run nearer to the user and does not require a large remote system for every query. The sustainability angle and the access angle begin to overlap here. Efficient AI is often easier to distribute. Google’s edge strategy is part of the reason that argument now feels less theoretical than it did a year ago.
Where investors are placing bets
Recent funding signals suggest that investors see commercial value in efficiency, not only in scale.
Fastino is one example. TechCrunch reported in May 2025 that the startup raised $17.5 million in seed funding led by Khosla Ventures for a model architecture it describes as intentionally small and task-specific, trained on low-end gaming GPUs rather than massive clusters. That does not make Fastino the definitive winner in the category, but it does show investor appetite for AI companies built around a smaller-model premise.
Another useful indicator sits slightly lower in the stack. Reuters reported in February 2025 that EnCharge AI raised more than $100 million in Series B funding to commercialize inference chips aimed at making AI cheaper and more energy efficient. Efficient local or edge AI is not only a model story. It also depends on hardware designed for lower-cost inference outside the largest cloud footprints.
There is also a broader venture backdrop. Reuters reported in October 2025, citing PitchBook data, that AI startups raised $73.1 billion globally in the first quarter of 2025 alone, accounting for 57.9 percent of all venture capital funding in that period. Not all of that money will flow into the same strategy. Some will continue chasing frontier-scale labs. Some will move toward the companies trying to make inference cheaper, smaller, and easier to distribute.
The likely impact
The likely payoff is broader than emissions alone. Smaller models running locally or at the edge can reduce latency, cut cloud usage, keep more sensitive data on-device, and make AI features available in lower-connectivity environments. Those are product advantages first. They also align with a less wasteful compute model. Google has explicitly marketed local deployment in terms of lower latency, privacy, and no cloud cost for those features, while Microsoft has positioned Phi Silica as a practical route to efficient on-device text generation.
There are limits, of course. Efficiency does not guarantee lower total environmental impact if cheaper inference simply leads to much more usage. The rebound effect remains real. UNESCO and UCL do not present smaller models as a magic answer. Their argument is more grounded than that. Practical savings come from design choices, including when to use a smaller model, when to shorten outputs, and when a large model is genuinely warranted.
A more selective AI economy
The most useful lens may be architectural discipline. The sustainability future of AI will not be shaped only by hyperscaler announcements or fresh rounds of data center spending. It will also be shaped by quieter choices inside products, enterprise systems, and procurement roadmaps. Which workloads stay local? Which ones go to the cloud? Which models are matched to task value instead of marketing value?
Small language models are unlikely to replace frontier systems. That is not really the standard they should be judged against. Their stronger case is that they can make AI more selective, more affordable, and easier to deploy on the devices and environments people already have. At a time when AI’s environmental cost is under greater scrutiny, that may prove to be one of the market’s more consequential shifts.
TNGlobal INSIDER publishes contributions relevant to entrepreneurship and innovation. You may submit your own original or published contributions subject to editorial discretion.
Featured image: Omar Lopez-Rincon on Unsplash

