The project "Ob-Ugric Database: analysed text corpora and dictionaries for less described Ob-Ugric dialects" is the successor project to OUL. It was launched on 1 July 2014, runs until 30 June 2017, and is funded by the DFG (German Research Foundation) and the FWF (Austrian Science Fund).
The objective of this project is to continue and expand the work of OUL by systematizing, digitizing, analysing and providing online data on two further Ob-Ugric dialects: Western Mansi (in the Pelym, Northern Vagilsk and Middle Lozva varieties) and Yugan Khanty. The Western Mansi dialect group is already extinct, and only text materials from the 19th century are available. Yugan Khanty is endangered and has not yet been described as a separate subdialect of (Eastern) Surgut Khanty. In addition to recent fieldwork materials, the Yugan Khanty team also works with texts from Heikki Paasonen’s collection from the beginning of the 20th century.
Over the three years of the project, two new database modules for these two less described dialects will be created and implemented in the already established Virtual Research Environment. Following the OUL parameters, the description and analysis will include (a) a phonological analysis in IPA (instead of the traditional idiosyncratic transcription systems) and (b) a definition of the morphological categories and their allomorphy in the given dialect, together with paradigms and position-slot models for the parts of speech, compiled on the basis of the allomorphic analysis. The results will be (c) grammatical descriptions, which have been lacking until now, (d) grammatically analysed texts and (e) translations of these texts into widely used metalanguages (English, and partly German).
Additional ethnographic information will be incorporated into the thesaurus section. If the fieldwork on the Yugan rivers can be carried out as planned, the lexicon entries will receive extended ethnographic commentary. Furthermore, the amount of photo and video material will be increased.
The systematization of the phonological and morphological categories of the dialects in question will extend and adapt the existing descriptions and result in a more exact account of the grammatical categories of each Ob-Ugric dialect; the system of glossing principles and symbols (abbreviations) will be revised and extended. In addition, the number of dialectal dictionaries/concordances based on the text corpus will be increased, taking into account all available lexicographic sources and interviews with informants.
The user interface of the existing database can be extended with many more options for filtering the lexicon data and for running concordance queries on the glossed text data. At present, only rudimentary queries for particular strings, filtered by dialect, are possible. The value of the database will increase considerably once it becomes possible to filter by part of speech, morpheme type, complex form type, allomorph, dialectal variant and spelling variant. Such a filtering system is the basis for statistical queries on the corpus and thus a precondition for corpus-linguistic research on the Ob-Ugric languages.
For Yugan Khanty, the only one of the two dialects covered by this project that is still spoken, additional texts will be recorded during fieldwork, ideally on both audio and video, and at minimum on audio. Provided that the speakers consent, these recordings will be phonetically annotated with the program ELAN and added to the Yugan Khanty database. In this way, the phonetic characteristics of Yugan Khanty will be described more fully, morphophonological rules will be determined, and the database will be expanded with audio and video annotations. A contrastive analysis of Yugan Khanty and Surgut Khanty proper will show whether classifying the former as a separate subdialect is justified.
In addition, the functional and pragmatic analysis will provide comprehensive linguistic information on the principles of text construction. The principles and the set of categories for the syntactic, semantic and information structure (IS) analysis were developed in the Ob-BABEL project and tested on a few selected texts from the corpus. The desideratum in this respect is a technical implementation of this differentiated annotation system in our corpus. It will include (semi-)automatic annotation of syntactic, semantic and pragmatic roles. In this way, the database can provide information on the finer points of how morphological categories function in the text corpus.