Colloquium

Virtual Lecture by Prof. Hinrich Schütze, "Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages"

July 6, 2023, 16:00 - 17:30
Online via Zoom

We are pleased to announce a virtual talk in the JAII Lecture Series. Our presenter, Prof. Hinrich Schütze (LMU Munich; homepage of Hinrich Schütze's lab), is a renowned expert in computational linguistics who will talk about scaling large language models.

On July 6, Hinrich Schütze will talk about "Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages". The talk will be held virtually; this is the link to the lecture.

Abstract:
Large language models (LLMs) are currently the most active area of research in NLP. Most work has focused on what we call "vertical" scaling: making LLMs even better for a relatively small number of high-resource languages. We address "horizontal" scaling instead: extending LLMs to a large subset of the world's languages, focusing on low-resource languages. Our Glot500-m model is trained on 500 languages, many of which are not covered by any other language model. I will talk about the major challenges we faced in creating Glot500: (i) finding, validating and cleaning training data for that many languages; (ii) evaluating performance of Glot500-m on languages for which native speakers and labeled datasets were not available to us; and (iii) determining the factors that ultimately make training on a language successful. We find that trying to reduce such factors to the so-called curse of multilinguality is naive and there is in fact also a "boon of multilinguality". We are in the process of making Glot500-c, our training corpus covering 500 languages, publicly available.

At a glance:

  • Organizer: Cognitive Interaction Technology Excellence Cluster (CITEC)
  • Location: Online via Zoom
  • Time: July 6, 2023, 16:00 - 17:30
  • Target audience: teaching and research staff, students, early-career researchers
  • Access: university-internal