Cross-Linguistic Data Formats: Using Standards in Digitalization to Contribute to the Creation and Curation of Language Data

Standardization and retro-standardization can help to make existing datasets in linguistics comparable and to share them with a broader public. The "Cross-Linguistic Data Formats" initiative develops new standards for multilingual language data and applies them to linguistic datasets in order to increase their reusability and transparency.

The "Cross-Linguistic Data Formats" initiative (CLDF) was founded in 2014 and has since been extended along different dimensions in various projects. The goal of the initiative is to provide standards for cross-linguistic data and to apply them to the multitude of digitally available language data, creating a pool of research data for historical and typological language comparison that can be analyzed with unified methods.

At the Chair of Multilingual Computational Linguistics, we plan to extend the CLDF initiative by concentrating on areas that CLDF has not yet targeted. Specifically, we aim to model texts in various forms (example sentences in grammars, poems, larger corpora) and to address additional linguistic constructs (morphology, lexicon, syntax). Additionally, we want to provide server structures that help colleagues deploy their own data online in the CLLD framework and thus make their data available to a larger circle of users.

Principal Investigator(s) at the University	Prof. Dr. Johann-Mattis List (Chair of Multilingual Computational Linguistics)
Project period	01.04.2023 - 31.03.2028
Website	https://cldf.clld.org
Funding notice	The CLDF initiative was originally funded by the Max Planck Society. Over the years, parts of the CLDF specification and their application were funded by other research projects. These include, among others, the project "Computer-Assisted Language Comparison", funded by the European Research Council and led by Johann-Mattis List from 2017 to 2022. With List's move to Passau, additional funding will be provided via the Chair of Multilingual Computational Linguistics through the University of Passau.
