Language models are powerful tools that can generate natural language for a variety of tasks, such as summarizing, translating, answering questions, and writing essays. But they are also expensive to train and run, especially for specialized domains that require high accuracy and low latency.
That’s where Apple’s latest AI research comes in. The iPhone maker has published new research on building language models that deliver strong performance on limited budgets. The team’s newest paper, “Specialized Language Models with Cheap Inference from Limited Domain Data,” presents a cost-efficient approach to AI development, offering a lifeline to businesses previously sidelined by the high costs of sophisticated AI technologies.
The paper, which is gaining rapid attention including a feature in Hugging Face’s Daily Papers, cuts through the financial uncertainty that often shrouds new AI projects. The researchers pinpoint four cost variables: the pre-training budget, the specialization budget, the inference budget, and the size of the in-domain training set. They argue that by navigating these expenses wisely, one can build AI models that are both affordable and effective.
Pioneering low-cost language processing
The dilemma, as the team describes it, is that “Large language models have emerged as a versatile tool but are challenging to apply to tasks lacking large inference budgets and large in-domain training sets.” Their work responds by offering two distinct pathways: hyper-networks and mixtures of experts for those with generous pre-training budgets, and smaller, selectively trained models for environments with tighter budgets.
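Hyper-networks, one of the pathways suited to generous pre-training budgets, use a small auxiliary network to generate the weights of a domain-specialized layer from a description of the domain, so one set of shared parameters can serve many domains without storing a full model per domain. A minimal NumPy sketch of the idea (all dimensions, names, and the tanh architecture here are illustrative assumptions, not details from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a domain is described by a small embedding,
# and the hyper-network emits the weight matrix of a domain-specific layer.
DOMAIN_DIM, HIDDEN, IN_DIM, OUT_DIM = 8, 32, 16, 4

# Hyper-network parameters: map a domain embedding to IN_DIM * OUT_DIM weights.
W1 = rng.normal(0, 0.1, (DOMAIN_DIM, HIDDEN))
W2 = rng.normal(0, 0.1, (HIDDEN, IN_DIM * OUT_DIM))

def generate_layer_weights(domain_embedding):
    """Produce the weights of a specialized layer from a domain embedding."""
    h = np.tanh(domain_embedding @ W1)
    return (h @ W2).reshape(IN_DIM, OUT_DIM)

# Two different domain embeddings yield two different specialized layers,
# while the hyper-network parameters W1 and W2 are shared across domains.
biomed = rng.normal(size=DOMAIN_DIM)
legal = rng.normal(size=DOMAIN_DIM)

x = rng.normal(size=IN_DIM)  # a feature vector from the shared base model
y_biomed = x @ generate_layer_weights(biomed)
y_legal = x @ generate_layer_weights(legal)
print(y_biomed.shape, np.allclose(y_biomed, y_legal))  # (4,) False
```

The appeal for constrained settings is that specializing to a new domain only requires a new embedding, not a new set of model weights.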
In the research, the authors compared different approaches from the machine learning literature, such as hyper-networks, mixtures of experts, importance sampling, and distillation, and evaluated them on three domains: biomedical, legal, and news.
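Importance sampling, one of the compared methods, selects examples from a large generic pool that most resemble the small in-domain set, so a compact model can be trained on domain-relevant text. A toy sketch using add-one-smoothed unigram models to score sentences (the corpora, scoring scheme, and all names are illustrative assumptions, not the paper's actual pipeline):

```python
import math
from collections import Counter

# Toy corpora: a large generic pool and a small in-domain (biomedical) sample.
generic_pool = [
    "the market closed higher today",
    "the patient was given a new dose",
    "the court ruled on the appeal",
    "the team won the final match",
    "the trial enrolled new patients",
]
in_domain = ["the patient responded to the dose", "patients in the trial improved"]

def unigram_logprob(text, counts, total, vocab):
    """Add-one smoothed unigram log-probability of a text."""
    return sum(math.log((counts[w] + 1) / (total + vocab)) for w in text.split())

domain_counts = Counter(w for t in in_domain for w in t.split())
pool_counts = Counter(w for t in generic_pool for w in t.split())
vocab = len(set(domain_counts) | set(pool_counts))

def weight(text):
    """Importance weight: how much likelier the text is under the in-domain
    unigram model than under the generic one (log-ratio)."""
    return (unigram_logprob(text, domain_counts, sum(domain_counts.values()), vocab)
            - unigram_logprob(text, pool_counts, sum(pool_counts.values()), vocab))

# Rank the generic pool by domain relevance; the top entries would be
# kept (or upweighted) when training the small specialized model.
ranked = sorted(generic_pool, key=weight, reverse=True)
print(ranked[0])  # → "the trial enrolled new patients"
```

In practice the scoring model would be far richer than unigrams, but the principle is the same: spend the training budget on data that looks like the target domain.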
They found that different methods perform better depending on the setting. For example, hyper-networks and mixtures of experts achieve better perplexity with large pre-training budgets, while small models trained on importance-sampled datasets are attractive when the specialization budget is large.
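Perplexity, the metric behind these comparisons, is the exponentiated average negative log-probability a model assigns to held-out text; lower means the model predicts in-domain text better. A small illustration (the per-token log-probabilities below are made-up numbers, not results from the paper):

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities: exp of the
    average negative log-probability. Lower is better."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probs from two models on the same held-out text:
# a specialized model (more confident) vs. a generic one (less confident).
specialized = [-1.2, -0.8, -1.0, -0.9]
generic = [-2.1, -1.9, -2.4, -2.0]

print(perplexity(specialized) < perplexity(generic))  # True
```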
The paper also provides practical guidelines for choosing the best method for a given domain and budget. The authors claim that their work can help “make language models more accessible and useful for a wider range of applications and users.”
Disrupting the industry with budget-conscious models
The paper is part of a growing body of research on how to make language models more efficient and adaptable. For instance, Hugging Face, a company that provides open-source tools and models for natural language processing, recently launched an initiative with Google that makes it easier for users to create and share specialized language models for various domains and languages.
While more evaluation on downstream tasks is needed, the research highlights the trade-offs businesses face between retraining large AI models versus adapting smaller, efficient ones. With the right techniques, both paths can lead to precise results. In short, the research concludes that the best language model is not the biggest, but the most fitting.
VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Discover our Briefings.