ERC research group "ProduSemy": Using algorithms to track the evolution of word families
From "ell", "bow" and "socius" to "elbow society", more commonly known as dog-eat-dog society: using computer-assisted models, a new ERC-funded research group at the University of Passau led by Professor Johann-Mattis List has set out to explore a topic that linguists know little about. The European Research Council (ERC) is supporting the project with two million euros.
While machines are becoming smarter by the day and cars are learning to talk, many deceptively simple questions still baffle linguists: how do word families come about, and why are some bigger than others? "In linguistics, we know surprisingly little about how new words are formed in our language and what makes people give new words the form they then take," says Professor Johann-Mattis List. He holds the Chair of Multilingual Computational Linguistics at the University of Passau and has set himself the goal of exploring the topic with the help of machines.
Linguists use the term "word family" for words that share a common origin, whether within one language or across several. These families are not fixed: they keep changing as their members are combined with one another. Professor List cites the German word "Ellenbogengesellschaft", literally "elbow society" (dog-eat-dog society), as an example. It is made up of "Elle" ("ell", the forearm), "Bogen" ("bow") and "Gesellschaft" ("society"), yet as a whole it has taken on a meaning that no longer has anything to do with those of its parts. As Professor List puts it, some word parts are more productive than others and give rise to considerably more new words.
Using computer modelling to understand word families
Why is this so? And how are word families formed across languages? Those are the questions Professor List has set out to study in the new ERC-funded research group "ProduSemy", which he is currently establishing. The title refers to "productive signs", that is, words that form particularly large word families. The group will use computer-assisted methods, developing algorithms that researchers can then use to identify word families in large language corpora and to systematise such data across languages. The corpora contain words from up to 1,000 languages.
"Why do words form families? Why are some word families bigger than others? And to what degree are word family structures different or similar across various languages? Those are questions that have intrigued me for a long time," says the linguist. "I am thrilled to be able to examine these questions in depth with the new research group."
For the project, Professor List has secured one of the prestigious Consolidator Grants of the European Research Council (ERC), which are awarded in a highly competitive, multi-stage selection process. The ERC is the premier European funding organisation for excellent frontier research. The research group at the University of Passau will receive a total of two million euros over a five-year period.
Since January, linguist Professor Johann-Mattis List has held the Chair of Multilingual Computational Linguistics, which was created as part of the Free State of Bavaria's "Hightech Agenda" innovation campaign. Before that, he served as interim professor at Bielefeld University and as senior researcher at the Max Planck Institute for Evolutionary Anthropology in Leipzig and the Max Planck Institute for the Science of Human History in Jena, where he headed another ERC-funded research group on computer-assisted language comparison. He earned his doctorate at Heinrich Heine University Düsseldorf and completed his habilitation at Friedrich Schiller University Jena.
Project description in the EU database CORDIS
Interview with Professor List about the ERC project (German)
Blog post by Professor List on "productive signs" (German)
The Chair's website
Press release on idw
| Principal Investigator(s) at the University | Prof. Dr. Johann-Mattis List (Chair of Multilingual Computational Linguistics) |
|---|---|
| Project period | 01.01.2023 to 31.12.2027 |
| Source of funding | European Union (EU) > EU 9th Framework Programme for Research (Horizon Europe) > EU Horizon Europe ERC Consolidator Grant |
“Funded by the European Union (ERC, Produsemy, 101044282). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.”