Large Language Models (LLMs) may seem like they’ve suddenly dropped into our lives, riding on the shiny coat-tails of popular culture. But let’s rewind a bit. Remember when Google Search offered you helpful suggestions before you finished typing? Or how Siri and other voice assistants magically understood your requests? Or when you chatted with a surprisingly helpful bot on a website? Surprise, surprise! You’ve been rubbing shoulders with LLMs all along. With the release of ChatGPT 3.5, LLMs had their moment in the spotlight, becoming more accessible than ever and stirring up our inner sci-fi geek’s dystopian fears. But we’ve been living with these digital linguists of different strengths for over two decades! The following timeline doesn’t chart the ‘origin story’ of LLMs but offers a glimpse into their recent evolution.
The “starting from scratch” route is to build a model out of thin air, metaphorically speaking. In the world of large language models, this is called a Foundational Model, and all the models mentioned above are foundational models. Building a foundational model is a challenging task: it broadly involves defining a problem space, selecting a model architecture, collecting and preparing data, training, evaluating, deploying and monitoring performance, and fine-tuning. These steps generally add up to millions of dollars. The table below shows the estimated cost of building some of the foundational models mentioned above:
Model | Estimated Cost |
---|---|
ChatGPT 3.5 | $5M |
Vicuna | $3.6M |
Llama | $4M |
ChatGPT 4 | $6.4M |
As the table above shows, building a foundational model is a complex and expensive undertaking. The cost of training and deploying such a model is prohibitive for many organisations, even those with deep pockets. As a result, most enterprises will likely opt to use foundational models that have been pre-trained by others, either by pairing a foundational model with embeddings of their own documents or by fine-tuning a foundational model on their own data.
Embedding is a way to represent text or code as numerical vectors, which makes it easier for computers to compare meaning. In layman’s terms, embedding is like giving a computer a dictionary: it helps the computer relate words and passages to one another, but it does not teach the underlying model anything new.
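To make this concrete, here is a minimal sketch of computing embeddings with the open-source sentence-transformers library. The model name, documents, and query are illustrative placeholders of my own, not anything prescribed above.

```python
# A minimal embedding sketch using sentence-transformers (placeholder model and texts).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedding model

docs = [
    "The invoice is due within 30 days of delivery.",
    "Payment terms: net 30 from the shipping date.",
    "Our office cafeteria serves lunch until 2 pm.",
]
query = "When do I have to pay the invoice?"

doc_vectors = model.encode(docs)      # each text becomes a vector of numbers
query_vector = model.encode(query)

# Cosine similarity: texts with similar meaning get higher scores.
scores = util.cos_sim(query_vector, doc_vectors)
print(scores)  # the first two documents should score higher than the third
```

In a setup like this, the closest-matching documents can be handed to a foundational model as context at query time, so the enterprise’s knowledge is used without retraining the model itself.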
Fine-tuning is a way to improve the performance of a pre-trained LLM on a specific task. This is done by training the model further on a dataset for that task. It is like giving a computer a book to study: the computer internalises the book and can then answer questions drawn from it.
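For contrast, here is a hedged sketch of what fine-tuning can look like with the Hugging Face Transformers library. The base checkpoint (`gpt2`), the file `company_docs.txt`, and the hyperparameters are assumptions for illustration only; a real enterprise setup would use a larger model and a carefully curated dataset.

```python
# A minimal fine-tuning sketch with Hugging Face Transformers (illustrative settings).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # placeholder; any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical plain-text file containing the enterprise's domain documents.
dataset = load_dataset("text", data_files={"train": "company_docs.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the updated weights now carry the domain knowledge
```

The key difference from the embedding approach is that the model’s weights change: the knowledge is baked in rather than looked up at query time.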
Whether to use embedding or fine-tuning depends on the specific task at hand. If the task is relatively simple, then embedding may be sufficient. However, fine-tuning may be necessary to achieve optimal performance if the task is more complex.
Feature | Embedding | Fine-tuning |
---|---|---|
Cost | Lower | Higher |
Time | Less | More |
Performance | Can be good enough for simple tasks | Can produce better results for complex tasks |
Suitability | For simple tasks | For complex tasks |
Enterprises are highly sensitive to misinformation. The ability of large language models (LLMs) to generate text that is factually incorrect is a major obstacle to their widespread adoption, especially in regulated industries where the accuracy of information is critical. The field is still in its early stages: most enterprises are still exploring the potential of LLMs and how they can be integrated into their daily workflows. As the technology matures, the focus will shift to addressing the challenges of cost, complexity, bias, and data privacy.

Even so, the adoption of LLMs looks inevitable, given their potential to save costs and improve efficiency. LLMs can automate a wide range of tasks, from customer service to product development, and can generate creative content such as marketing copy and product descriptions. The resulting cost savings and productivity gains are significant, and the challenges above are surmountable with careful planning and implementation. Enterprises willing to invest in LLM capabilities will be well-positioned to reap the rewards of this technology.
In the ever-evolving world of Machine Learning, claiming absolute certainty about future trends is an exercise in futility. However, informed speculation based on current trends and foresight can point to compelling possibilities. With that caveat, here are my predictions for Large Language Models (LLMs):
Widespread Adoption: The future will likely see LLMs becoming increasingly ubiquitous, operating on devices ranging from powerful servers to handheld smartphones. For instance, Google’s PaLM 2, with its four model sizes (Gecko, Otter, Bison, and Unicorn), is a testament to this trend: Gecko, the smallest of the lot, is designed to run on portable and handheld devices, even offline, making LLMs more accessible than ever.
Enterprise Adoption: Adoption of LLMs within enterprises may proceed at a slower pace, mainly because of enterprises’ low tolerance for hallucination, the tendency of an unconstrained LLM to generate plausible but incorrect output. Instead of investing heavily in building foundational models, many enterprises will likely focus on fine-tuning existing open-source or proprietary models to work within a controlled setup specific to their use cases. This should create significant demand for professionals who can effectively fine-tune and deploy these models in enterprise settings.
Impact on Jobs: LLMs will likely simplify many jobs before replacing them outright. An example is the evolving role of copywriters, who already use LLMs to aid their work. On the bright side, LLMs could significantly enhance productivity. In the long run, however, over-reliance on LLMs may lead to misinformation, misguidance, and flawed products, which could in turn affect their acceptance. Ultimately, the rise of LLMs could increase the value of highly skilled individuals while making it harder for those with mediocre or below-par skills to find employment.
As we gaze into our crystal ball, these are the trends we discern. The journey of LLMs will surely be interesting, and only time will tell how these predictions pan out.