
About the model

BLOOM - the BigScience Large Open-science Open-access Multilingual Language Model - is another family of transformer-based language models. It was created by over 1,000 AI researchers on a budget of roughly 1,000,000 A100 GPU hours to provide a free large language model for everyone who wants to experiment with one.

Why try BLOOM next?

With around 176 billion parameters, trained from March through July 2022, BLOOM is considered an alternative to OpenAI’s GPT-3, equal in size and capability. Compared to GPT-J, which is an English-centric model, BLOOM is multilingual. BLOOM was trained on the ROOTS corpus, which includes data in 46 natural languages and 13 programming languages. The languages were chosen intentionally, so that the research team had the expertise to evaluate the data.

German Language Capabilities

Sadly, German is not among those chosen languages. However, some languages are present in the ROOTS corpus unintentionally. An investigation of a 1% ROOTS sample shows that the model saw some German during training (de: 0.21%), among other “unseen” languages. However, these “unseen” languages make up only a very small proportion of the training dataset compared to intentionally supported languages like English, French and Spanish.

Languages in the ROOTS training data. Source: huggingface.co/bigscience/bloom#languages.

Languages identified in a 1% sample of the training data, among them German (de) with 0.21%: languages in the 1% ROOTS sample as labeled from metadata (ROOTS-1%) and as re-identified using cld3 (ROOTSIDENTIFY-1%). Source: arxiv.org/abs/2211.01786v1, truncated at 0.1%.

There is research on adapting existing LLMs to other languages, notably CLP-Transfer and WECHSEL. Malte Ostendorff trained two monolingual German language models based on BLOOM using the CLP-Transfer method: bloom-6b4-clp-german and bloom-1b5-clp-german. However, these models have not been instruction-finetuned to follow prompts.

Prompting capabilities

Muennighoff et al. found Multitask Prompted Finetuning (MTF) to help large language models generalize to new tasks and languages in a zero-shot setting. By applying multitask prompted finetuning to BLOOM on the task mixture dataset xP3, they created a series of models called BLOOMZ.

xP3mt, a machine-translated version of xP3, contains prompts and expected outputs for a variety of tasks in multiple languages. Finetuning on these datasets gives BLOOMZ a variety of language and reasoning skills: BLOOMZ models are capable of following human instructions in dozens of languages zero-shot.

Even though MTF helps LLMs generalize to unseen tasks & languages, we find that the German capabilities of BLOOMZ models are lacking. In English, however, we find that the BLOOMZ models are capable of generating title, teaser, keyword and summary snippets for a news article when provided with a simple prompt for the task.

How to find good Prompts

  1. Leverage the model's reasoning and question answering capabilities to understand what the model has learned about a certain domain, word or task. Example prompts:
    • Imagine you are an expert, explain to me how to {task}:
      Example: Imagine you are an expert, explain to me how to write a teaser for a news article: …
    • “What is {word/concept}?”
      Example: What is a teaser for a news article: …
  2. Incorporate the knowledge gained from those tests into the task prompts.
  3. Test on a news article test set.

In this manner, we found the following prompts for title, teaser, summary and keyword generation:

  • What is the best title for this article?
  • Write a one or two sentence news hook/teaser/lede/bait:
  • Summarize in two to three sentences:
  • Keywords:
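As a minimal sketch of how such a prompt can be applied (assuming the Hugging Face transformers library and, for illustration, the small bigscience/bloomz-1b7 checkpoint rather than the full 176B model):

```python
# Minimal sketch: generating a title for an English news article with a BLOOMZ checkpoint.
# Assumption: the small bigscience/bloomz-1b7 checkpoint is used here for illustration;
# the full bigscience/bloomz (176B) model needs far more memory (see the side note below).
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "bigscience/bloomz-1b7"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

article = "..."  # English news article text goes here
prompt = f"{article}\n\nWhat is the best title for this article?"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)

# Cut off the prompt so only the newly generated title remains.
title = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(title)
```

The other prompts from the list above can be swapped in the same way; only the text appended after the article changes.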

Translation to bridge the language gap for News Snippet Generation with BLOOMZ

With the trick of translation, BLOOMZ can be used to generate news snippets in German.

  • Step 1: Translate German news article to English.
  • Step 2: Concatenate English prompt to English news article.
  • Step 3: Generate with the BLOOMZ model.
  • Step 4: Translate English model output to German.

Schematic Sketch of the Generation Workflow with Translation
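A minimal sketch of this four-step workflow (the Helsinki-NLP opus-mt translation checkpoints are assumed here as one possible choice for the translation steps, and a small BLOOMZ checkpoint is used for illustration):

```python
# Minimal sketch of the four-step translate -> generate -> translate-back workflow.
# Assumptions: Helsinki-NLP opus-mt models handle translation (one possible choice),
# and the small bigscience/bloomz-1b7 checkpoint stands in for the full BLOOMZ model.
from transformers import pipeline

de_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")
en_to_de = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
generator = pipeline("text-generation", model="bigscience/bloomz-1b7")

def generate_german_title(german_article: str) -> str:
    # Step 1: Translate the German news article to English.
    english_article = de_to_en(german_article)[0]["translation_text"]
    # Step 2: Concatenate the English prompt to the English news article.
    prompt = f"{english_article}\n\nWhat is the best title for this article?"
    # Step 3: Generate with the BLOOMZ model.
    english_title = generator(prompt, max_new_tokens=30, return_full_text=False)[0]["generated_text"]
    # Step 4: Translate the English model output back to German.
    return en_to_de(english_title)[0]["translation_text"]
```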

Note: Translation comes at the cost of a slight delay and may occasionally introduce translation errors. Below is an example.

  • Original title: Mit MB.OS gegen Tesla: Daimler baut sein eigenes „Windows fürs Auto“ (Quelle: t3n)
  • Generated title (English): MB.OS: Daimler builds Windows for cars
  • Generated title (German): MB.OS: Daimler baut Fenster für Autos

A side note: Making LLMs truly accessible

There is one thing particularly fascinating when it comes to BLOOM: Not only is it a multilingual model that comes with a series of instruction-finetuned variants that can follow human instructions zero-shot. There is also Petals, which loads these models collaboratively, BitTorrent-style.

To explain why this could be interesting for news organizations in particular, here’s a short side note on model size and accessibility.

Scaling laws:

The loss scales as a power-law with model size, dataset size, and the amount of compute used for training. Source: Scaling Laws for Neural Language Models (arxiv.org/abs/2001.08361).
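Written as formulas (the schematic power-law form from the scaling-laws literature; N, D and C denote parameter count, dataset size and training compute, and the c-subscripted constants and α exponents are empirically fitted values, not numbers specific to BLOOM):

```latex
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) = \left(\frac{C_c}{C}\right)^{\alpha_C}
```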

Or in very simple words: it is expected that larger language models will perform better. In fact, the model size and compute requirements for state-of-the-art LLMs have been growing at a rate of 10x per year over the last few years, which leads to another important consideration.

Accessibility:

Large models are far from easily accessible! With bigger model size, more compute is necessary to actually train, host and use these models.

For some of those billion parameter models, checkpoints are publicly released under open source licenses. So in theory, everyone can download these models. But what would it take to actually be able to run inference aka generate some text with those large models?

Even though GPT-3 is not publicly available for download, let’s take it as an example for popularity reasons and also because it is of similar size to BLOOM: According to OpenAI’s whitepaper, GPT-3 uses half-precision floating-point variables at 16 bits per parameter. This means loading the model would require at least 350 GB of VRAM.

Similar rough estimate for BLOOM: 176B parameters in half precision (16 bits = 2 bytes) equals 352 GB RAM. But since some modules are 32-bit, it would actually take somewhat more. Inference also requires VRAM, so let’s add another ~25% overhead. This puts us at roughly 440 GB of VRAM needed to generate something with these models.

The current state of the art for generally available GPUs is NVIDIA's A100 80 GB. Assuming we get access to A100 GPUs with 40 GB of VRAM, which are much easier to find in public clouds, one would need at least 11 of these GPUs to load and use the model.
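The back-of-the-envelope arithmetic behind these numbers, as a quick sketch:

```python
# Back-of-the-envelope estimate for running BLOOM inference in half precision.
params = 176e9           # BLOOM parameter count
bytes_per_param = 2      # fp16: 16 bits = 2 bytes per parameter

weights_gb = params * bytes_per_param / 1e9   # ~352 GB just for the weights
total_gb = weights_gb * 1.25                  # ~25% overhead for inference
gpus_needed = total_gb / 40                   # A100 GPUs with 40 GB of VRAM each

print(f"Weights: {weights_gb:.0f} GB, with overhead: {total_gb:.0f} GB, "
      f"A100-40GB GPUs needed: {gpus_needed:.0f}")
# -> Weights: 352 GB, with overhead: 440 GB, A100-40GB GPUs needed: 11
```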

For BLOOM, this is where Petals comes in:

Petals provides a system for inference and fine-tuning of large models collaboratively by joining the resources of multiple parties.

Let’s see what this means for making open source large language models accessible for everyday editorial business. While GPT-3 is not an open source LLM and is only available via API, BLOOM as an open source LLM can be hosted and fine-tuned collaboratively on joined hardware resources. In this manner, LLMs become accessible at speeds where even interactive inference becomes possible.

BLOOM + Petals allows multiple individuals and organizations to come together and join their computing power. Together, news organizations could host a private swarm among trusted entities. Doing so would enable access to more powerful language models than any one entity could afford to run on their own.
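For illustration, here is a sketch along the lines of the usage example in the Petals documentation at the time; the exact class and checkpoint names (DistributedBloomForCausalLM, "bigscience/bloom-petals") may differ between Petals releases:

```python
# Sketch: loading BLOOM collaboratively over a Petals swarm and generating text.
# Based on the usage pattern shown in the Petals README at the time; class and
# checkpoint names may have changed in newer Petals versions.
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

MODEL_NAME = "bigscience/bloom-petals"
tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
# Only a small client part of the model is loaded locally; the transformer blocks
# are served by other peers in the swarm.
model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)

inputs = tokenizer("What is the best title for this article?", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```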

Otherwise one can only resort to smaller models.

TL;DR: BLOOM is a large language model as open source as they come, multilingual, and even with instruction-tuned variants. However, none of the models from the BLOOM family has strong German capabilities. Translation can do the trick, but it comes at the cost of a slight delay and occasional translation errors. Fascinating, however, is BLOOM + Petals, which allows loading LLMs collaboratively to make them accessible.

“BLOOM: Bridging Language Barriers using a Prompting and Translation Approach” is part ten of our series News Snippet Generation, a learning journey on open source large language models and how to assist journalists in generating headlines and teasers in German with AI.


Appendix

Get your hands on the models:
BLOOM: https://huggingface.co/bigscience/bloom
BLOOMZ: https://huggingface.co/bigscience/bloomz

Try German news snippet generation with BLOOMZ + the Translation approach at snipaid.tech.

Sources and further reading:

What Language Model to Train if You Have One Million GPU Hours?
https://arxiv.org/abs/2210.15424

The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
https://arxiv.org/abs/2303.03915

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
https://arxiv.org/abs/2211.05100

Crosslingual Generalization through Multitask Finetuning
https://arxiv.org/abs/2211.01786

BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting
https://arxiv.org/abs/2212.09535

Petals: Collaborative Inference and Fine-tuning of Large Models
https://arxiv.org/abs/2209.01188

