
Google’s Newest AI Model, PaLM 2, Trained on 5 Times More Data Than Predecessor

May 18, 2023

Google recently introduced its latest large language model, PaLM 2, which represents a significant leap in training data compared to its predecessor from 2022. According to internal documentation, PaLM 2 has been trained on 3.6 trillion tokens, the units of text that serve as the building blocks for training language models.

This substantial training data enables PaLM 2 to excel in advanced coding, mathematics, and creative writing tasks.

The previous version of Google’s language model, known as PaLM, was released in 2022 and trained on 780 billion tokens. Google and OpenAI, the creator of ChatGPT, have both chosen not to disclose specific details about their training data, attributing this decision to the highly competitive nature of the industry.

Nevertheless, there is a growing demand for transparency within the research community as the race for AI dominance intensifies.

In terms of efficiency, PaLM 2 represents a significant milestone, as it is reportedly smaller than previous language models while still being able to perform complex tasks. Internal documents reveal that PaLM 2 was trained using 340 billion parameters, a measure of the model’s size and complexity.

This is a reduction from the initial PaLM, which was trained using 540 billion parameters.

Google has mentioned a “new technique” called “compute-optimal scaling” in relation to PaLM 2. This technique enhances the overall performance of the language model, resulting in faster inference, fewer parameters to handle, and lower serving costs. PaLM 2 is designed to support 100 languages and can perform a wide range of tasks.

Currently, it powers 25 features and products, including the experimental chatbot Bard. PaLM 2 is available in four sizes: Gecko, Otter, Bison, and Unicorn, ranging from smallest to largest.

Compared to other publicly disclosed language models, PaLM 2 appears to be more powerful. For instance, Facebook’s LLaMA language model, announced in February, was trained on 1.4 trillion tokens.

OpenAI released GPT-4 in March, claiming the model exhibits “human-level performance” on various professional tests.

As AI applications become increasingly mainstream, controversies surrounding the underlying technology continue to escalate.

The lack of transparency from companies like Google and OpenAI has sparked concerns within the research community. In fact, a Google Research scientist, El Mahdi El Mhamdi, resigned in February due to the company’s lack of transparency.

Acknowledging the need for a new framework, OpenAI CEO Sam Altman recently testified before a Senate Judiciary subcommittee, stressing the responsibility that companies bear for the ethical deployment of AI tools.

He also advocated for a federal agency to regulate AI advancements.
