[Header image: A robot in a blooming meadow of flowers]

To find models capable of generating German-language snippets from news articles, we took an exploratory approach. We started by testing different prompts against OpenAI's paid API (model: text-davinci-003), researched pre-trained and instruction-tuned LLMs, and evaluated several models from the GPT, T5, and BLOOM families.

Promising GPT-3 Prompts

Example: GPT-3’s suggested headlines for this news article on Google’s new management from 2015

Testing GPT-3 Prompts

We made our Notebook available for testing news headline generation with GPT-3 (text-davinci-003) on 10kGNAD. As input, we supply the model with the full article text followed by a prompt: “{article text} {prompt}” or “{article} \n {prompt}”.

For further snippet types, the following prompts perform well with GPT-3 (a minimal API call sketch follows the list):

  • Schreibe einen kurzen Teaser für diesen Artikel, der neugierig macht: (Write a short teaser for this article that sparks curiosity)
  • Schreibe eine kurze Zusammenfassung des Artikels in drei Stichpunkte: (Write a short summary of the article in three bullet points)
  • Zusammenfassung, so kurz wie möglich, W-Fragen: (Summary, as short as possible, W-questions)
  • Drei verschiedene Snippets zu diesem Artikel, max 280 Zeichen: (Three different snippets for this article, max. 280 characters)
  • Schreibe 3 verschiedene SERP-Snippets: (Write 3 different SERP snippets)
  • Schreibe ein SERP-Snippet für diesen Artikel mit dem Fokus Keyword “keyword”: (Write a SERP snippet for this article with the focus keyword “keyword”)
  • Schreibe verschiedene Überschriften zu dem Artikel (faktisch, clickbaity, emotional, humorvoll, Fachjargon): (Write different headlines for the article: factual, clickbaity, emotional, humorous, technical jargon)
  • Schreibe einen Tweet zu diesem Artikel: (Write a tweet about this article)
  • Schreibe einen LinkedIn-Post zu diesem Artikel: (Write a LinkedIn post about this article)
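
To make the prompt scheme concrete, here is a minimal sketch of how such a request can be sent to text-davinci-003. It assumes the legacy openai Python SDK (pre-1.0 Completion API); the article text and the sampling parameters are illustrative placeholders, not our exact settings:

```python
import os

import openai  # legacy SDK (< 1.0); newer versions use a client object instead

openai.api_key = os.environ["OPENAI_API_KEY"]

article_text = "..."  # full text of the news article (placeholder)
prompt = "Schreibe einen kurzen Teaser für diesen Artikel, der neugierig macht:"

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=f"{article_text}\n{prompt}",  # the "{article} \n {prompt}" scheme described above
    max_tokens=200,    # enough room for a short teaser
    temperature=0.7,   # some variety without drifting too far from the article
)

print(response["choices"][0]["text"].strip())
```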

GPT-3 is a proprietary model that is only accessible via API. In the spirit of the open source community, we ventured out to find an open source competitor for the task of German news snippet generation.

Overview of currently accessible pre-trained open source LLMs

Several open source pre-trained LLMs with different parameter sizes, licenses, and language capabilities have been released in recent years. BLOOM and LLaMA were trained on multilingual text corpora; the other models were trained primarily on English text.

| Name | Size | License | Links | Training Corpus | Language | Release |
|---|---|---|---|---|---|---|
| GPT2 | 124M - 1.5B | MIT | Github, Huggingface, Paper | WebText | en | 05/2019 |
| GPT-J | 6B | MIT, Weights: Apache 2.0 | GitHub, Huggingface | The Pile | en | 06/2021 |
| OPT | 125M - 66B | OPT-LICENSE | Github, Huggingface, Paper | BookCorpus, English Wikipedia, CommonCrawl (CC-News, Stories), OpenWebText, The Pile, PushShift.io Reddit | primarily en | 05/2022 |
| RWKV-4 | 100M - 14B | Apache 2.0 | Github, Huggingface | The Pile | en | 09/2022 |
| GPT-NeoX | 20B | Apache 2.0 | Github, Huggingface, Paper | The Pile | en | 04/2022 |
| BLOOM | 560M - 176B | RAIL | Huggingface, Paper | ROOTS | multilingual | 11/2022 |
| LLaMA | 7B - 65B | GPL-3.0 | Github, Paper | CommonCrawl, C4, Github, Wikipedia, Books, ArXiv, StackExchange | multilingual | 02/2023 |
| UL2 | 20B | Apache 2.0 | Huggingface, Paper | C4 | en | 05/2022 |
| GLM | 130B | Apache 2.0 | Github, Paper | The Pile, Wudao Corpora, various Chinese corpora and web-crawled data | en, zh | 10/2022 |

During our research we came across approaches for adapting existing LLMs to other languages, notably CLP-Transfer and WECHSEL. Malte Ostendorff trained two monolingual German language models using the CLP-Transfer method based on BLOOM-7b1: bloom-6b4-clp-german and bloom-1b5-clp-german.

For the 1.5B-sized GPT2 model, there are several monolingual adaptations: german-gpt2, german-gpt2-larger, gpt2-wechsel-german, gpt2-wechsel-german-ds-meg, and gerpt2-large.

Overview of currently accessible instruction-tuned open source LLMs

Since the instruction-tuning paradigm gained popularity in the development of language models, several new models of this type have been released as open source. Particularly noteworthy are the multilingual variants (BLOOMZ, mT0, mT5), as they are applicable to our task.

| Name | Size | License | Links | Language | Release |
|---|---|---|---|---|---|
| BLOOMZ | 560M - 176B | RAIL | Github, Huggingface, Paper | multilingual | 11/2022 |
| FLAN-T5 | 80M - 11B | Apache 2.0 | Github, Huggingface, Paper | en | 10/2022 |
| Galactica | 125M - 120B | Apache 2.0 | Github, Huggingface, Paper | en | 11/2022 |
| mT0 | 300M - 13B | Apache 2.0 | Github, Huggingface, Paper | multilingual | 10/2022 |
| mT5 | 300M - 13B | Apache 2.0 | Github, Huggingface, Paper | multilingual | 11/2020 |
| OpenChatKit | 20B | Apache 2.0 | Github | ? | 03/2023 |
| Flan-UL2 | 20B | ? | Github | ? | 03/2023 |
| Stanford Alpaca | 7B | Apache 2.0 | Github | en | 03/2023 |

During our search we also learned that new languages can be acquired through multitask prompted finetuning, as described in the papers accompanying BLOOMZ and mT0.

Testing Model Capabilities on the Task of “News Headline Generation” in German

We aimed to evaluate the performance of various language models on the task of “News Headline Generation” in the German language. To achieve this, we conducted exploratory tests on the 10kGNAD German news dataset.

From the GPT family, we tested two models: vanilla GPT-J-6B (Notebook) and mGPT (Notebook). Despite iterative tuning of the generation parameters, mGPT regularly produced repetitions and gibberish such as special characters and URLs, and also had trouble stopping generation. GPT-J-6B, on the other hand, generated surprising and promising results despite having been trained on only a small amount of German data. However, its inference time was relatively long due to the model's size.
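
To illustrate the kind of parameter tuning involved, here is a minimal sketch of headline generation with GPT-J-6B via Hugging Face transformers; the prompt format and the generation settings are examples of what one might try against repetitions and run-on output, not our final configuration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"  # vanilla GPT-J-6B
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # large model; use fp16 and a GPU in practice

article = "..."  # full German article text (placeholder)
prompt = f"{article}\nÜberschrift:"  # simple headline prompt (assumed format)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=32,                        # headlines are short
    do_sample=True,
    temperature=0.7,
    repetition_penalty=1.2,                   # counteracts repeated phrases
    no_repeat_ngram_size=3,                   # likewise
    eos_token_id=tokenizer.encode("\n")[0],   # stop at the end of the line to limit run-on output
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, not the prompt
headline = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(headline.strip())
```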

We also reviewed several models from the T5 family: mT5-small (Notebook), mT0-base (Notebook), FLAN-T5-base (Notebook), and a model fine-tuned for the task, “german-news-title-gen-mt5” (Notebook). As described in the fifth post of this series, the fine-tuned model performed remarkably well on the task. The vanilla mT5 model could not generate coherent sentences out of the box, as it was only pretrained on the span-mask-filling objective and not on any downstream tasks, as noted in this GitHub issue. mT0 and FLAN-T5 had no German language capabilities.
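
For the encoder-decoder models the call looks slightly different, since the article goes into the encoder and no prompt is strictly required. Below is a minimal sketch of headline generation with a fine-tuned mT5 checkpoint; the hub id is a placeholder for the fine-tuned model we tested, and the generation settings are illustrative:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "your-org/german-news-title-gen-mt5"  # placeholder hub id for the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

article = "..."  # full German article text (placeholder)
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=512)

outputs = model.generate(
    **inputs,
    max_new_tokens=32,   # headlines are short
    num_beams=4,         # beam search tends to work well for summarization-style tasks
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```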

Finally, we tested three models from the BLOOM family: BLOOMZ (Notebook), bloom-1b5-clp-german (Notebook), and bloom-6b4-clp-german (Notebook). The smaller CLP model performed well on news headline generation despite not being fine-tuned on any specific task; we did not observe any significant improvement with the larger model.
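
As a rough sketch, the CLP-German checkpoints can be prompted via the text-generation pipeline; the hub id is assumed to point to Malte Ostendorff's published 1.5B checkpoint and the prompt format is illustrative:

```python
from transformers import pipeline

# Hub id assumed from the CLP-Transfer release mentioned above; verify before use.
generator = pipeline("text-generation", model="malteos/bloom-1b5-clp-german")

article = "..."  # full German article text (placeholder)
prompt = f"{article}\nÜberschrift:"

result = generator(
    prompt,
    max_new_tokens=32,
    do_sample=True,
    temperature=0.7,
    return_full_text=False,  # return only the generated continuation, not the prompt
)
print(result[0]["generated_text"].strip())
```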

A note on German data seen in pre-training

| Model | Training Corpus | German Language Share |
|---|---|---|
| GPT-J | The Pile | some de of unknown percentage, unintended; 97.4% en (Source) |
| mT5 | mC4, xP3 | ~5% de, intentional, during pre-training (Source); none intentionally during instruction fine-tuning |
| BLOOMZ | ROOTS, xP3 | 0.21% de, unintended, during pre-training (Source); none intentionally during instruction fine-tuning |
| LLaMA/Alpaca | Pretraining: English CommonCrawl, C4, Github, Wikipedia, Gutenberg and Books3, ArXiv; instruction tuning: Alpaca dataset | some de of unknown percentage during pretraining, but mainly English; instruction tuning exclusively in English |

Due to the lack of German language data seen during pre-training, most open source language models are deficient in their ability to handle German.

Interim results, learnings and candidate selection

None of the tested models are able to solve the task of news snippet generation for German news without further adaptation. As a result, our focus has shifted towards selecting models that have the potential to be adapted to solve the task at hand. We selected four candidates for further consideration:

  • GPT-J as a competitor to GPT-3 Curie
  • BLOOMZ as a multilingual human-instruction zero-shot solver
  • mT5 as a strong multilingual summarizer
  • Alpaca as the newest competitor to ChatGPT

These models have the potential to be adapted to the task, but further work is needed to make them a viable solution. In the next few posts, we’ll explore how to adapt these models.

TL;DR: GPT-3 is capable of generating a variety of news snippets in German simply by prompting the model. In the search for an open source competitor, the following became clear: there are competitors, both pre-trained and even instruction-tuned, but none of them can handle both German language and human prompts. To solve the task of news snippet generation for German news, further adaptation will be necessary.

“The Model Safari: An Explorative Model Comparison” is part eight of our series News Snippet Generation, a learning journey on open source large language models and how to assist journalists in generating headlines and teasers in German with AI.


Previous post: German News Data: A Comprehensive Guide to Public Datasets and Considerations for Private Dataset Creation
Next post: GPT-J-6B: Exploring Approaches for News Snippet Generation and Evaluation