About OUDB

The project Ob-Ugric Database: analysed text corpora and dictionaries for less described Ob-Ugric dialects is the follow-up project to OUL. It was launched on July 1st, 2014, runs until June 30th, 2017, and is funded by the DFG (German Research Foundation) and the FWF (Austrian Science Fund).

The objective of this project is to continue and expand the work of OUL by systematizing, digitizing, analysing and providing online data on two more Ob-Ugric dialects: Western Mansi (in the Pelym, Northern Vagilsk, Middle Lozva and Lower Lozva varieties) and Yugan Khanty. The Western Mansi dialect group is already extinct, and only text materials from the 19th century are available. Yugan Khanty is endangered and has not yet been described as a separate subdialect of (Eastern) Surgut Khanty. In addition to recent fieldwork materials, the Yugan Khanty team also works with texts from Heikki Paasonen’s collection from the beginning of the 20th century.

Within the three project years, two new database modules for the two less described dialects will be created and implemented in the already established Virtual Research Environment. Following the OUL parameters, description and analysis will include (a) a phonological analysis in IPA form (instead of traditional idiosyncratic transcription systems) and (b) a definition of morphological categories and their allomorphy in the given dialect. The results will be (c) grammatical descriptions and texts that are (d) grammatically analysed and (e) translated into widely used metalanguages (English, partly German); some of the texts are also provided with (f) an annotation of functional, semantic and pragmatic roles.

The fieldwork on the Yugan rivers added visual material to our fieldwork archive.

At the same time, the existing data and results from OUL will be expanded and modified with new insights based on the new dialects:

The systematization of phonological and morphological categories of the dialects in question will extend and adapt the existing description and result in a more exact account of the grammatical categories of each Ob-Ugric dialect; the system of glossing principles and symbols (abbreviations) will be revised and extended. The number of dialectal dictionaries/concordances based on the text corpus will also be expanded, taking into consideration all available lexicographic sources and interviews with informants.

In addition, this project involves some new tasks regarding annotation/analysis and their display:

The user interface of the existing database was provided with many more options for filtering the lexicon data and for starting concordance queries in the glossed text data. The value of the database has increased considerably with the possibility of filtering by parts of speech, morpheme types, complex form types, allomorphs, dialectal variants and writing variants. Such a filtering system is the basis for statistical queries on the corpus, a precondition for corpus linguistics on Ob-Ugric languages.

For Yugan Khanty, the only spoken dialect among those dealt with in this project, additional texts were recorded during fieldwork. With the agreement of the speakers as a precondition, these audio recordings enrich the Yugan Khanty database through phonetic tagging with the program ELAN. In this way the phonetic characteristics of Yugan Khanty are described more fully and morphophonological rules will be determined. Contrastive analysis of Yugan Khanty and Surgut Khanty proper will show whether the classification of the former as a separate subdialect is justified.

Additionally, the functional and pragmatic analysis will provide all-round linguistic information on the principles of text construction. The principles and set of categories for the syntactic, semantic and information structure (IS) analysis were developed in the OUL project and tested on a few selected texts from the corpus. The desideratum in this respect is a technical realization of this differentiated annotation system in our corpus. The result is a (semi-)automatic annotation of syntactic, semantic and pragmatic roles as well as a tagging of referents. In this way the database can provide information on the finer points of how morphological categories function in the text corpus.

An additional objective was the further expansion of the OUL portal as a common search platform for all types of accessible information on the languages and cultures in question. The anticipated results will be relevant not only for linguists interested in Uralic languages or typology, but also for other fields of the humanities, especially cultural anthropology. The materials of the database can also be used in the future for the purposes of language revitalization.

Furthermore, the language data of this project has been shared with The Language Archive (TLA) of the Max Planck Institute for Psycholinguistics (Nijmegen, The Netherlands). Our data is now part of one of the most important collections of language resources worldwide, and will be automatically compatible with and integrated into the CLARIN-D research infrastructure, making it more generally available and reusable.

“Ob-Ugric database: analysed text corpora and dictionaries for less described Ob-Ugric dialects”

International research project
1st July 2014 – 30th June 2017

Project leader: Prof. Dr. Elena Skribnik

Coordination:
Ludwig Maximilian University of Munich, Germany

Last update: 24-08-2023
