Third International Workshop on Systems and Frameworks for Computational Morphology (SFCM 2013)

Abstracts

  • The State of Computational Morphology for Europe's Languages and the META-NET Strategic Research Agenda
    (Georg Rehm)

    Recognising Europe’s exceptional demand and opportunities for multilingual language technologies, 60 leading research centres in 34 European countries joined forces in META-NET, a European Network of Excellence. Working together with numerous additional organisations and experts from a variety of fields, META-NET has developed the Strategic Research Agenda for Multilingual Europe 2020 (SRA) [42]; the complex planning and discussion process took more than two years to complete and involved about 200 experts. In this contribution we motivate the SRA, briefly describe the current state of Language Technology in Europe, especially Computational Morphology, and discuss the findings in the overall framework of the plans and strategies specified in the META-NET Strategic Research Agenda.

  • HFST—a System for Creating NLP Tools
    (Krister Lindén, Erik Axelson, Senka Drobac, Sam Hardwick, Juha Kuokkala, Jyrki Niemi, Tommi Pirinen and Miikka Silfverberg)

    The paper presents and evaluates various NLP tools that have been created using the open-source library HFST (Helsinki Finite-State Technology) and outlines the minimal extensions to a pure finite-state system that this has required. In particular, the paper describes an implementation and application of p-match, presented by Karttunen at SFCM 2011.
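Pattern-matching tools of this kind typically scan text left to right and tag the longest token sequence that matches a pattern. The sketch below illustrates that longest-match idea with a plain Python trie standing in for a deterministic automaton. It is an assumption-laden toy: it does not use HFST or its p-match API, and the pattern dictionary and tag names are hypothetical.

```python
from typing import Dict, List, Tuple

# Key marking an accepting state in the trie; it stores that state's tag.
_TAG = "<tag>"

def build_trie(patterns: Dict[Tuple[str, ...], str]) -> dict:
    """Compile token-sequence patterns into a trie (a deterministic automaton)."""
    root: dict = {}
    for tokens, tag in patterns.items():
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
        node[_TAG] = tag
    return root

def tag_longest_match(tokens: List[str], trie: dict) -> List[Tuple[str, str]]:
    """Scan left to right, always tagging the longest matching span."""
    out: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        node, j = trie, i
        best_end, best_tag = None, None
        # Follow transitions as far as possible, remembering the last accepting state.
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if _TAG in node:
                best_end, best_tag = j, node[_TAG]
        if best_end is None:
            out.append((tokens[i], "O"))  # no pattern starts here
            i += 1
        else:
            out.append((" ".join(tokens[i:best_end]), best_tag))
            i = best_end

    return out

# Hypothetical patterns: the longer ORG pattern wins over the LOC pattern inside it.
TRIE = build_trie({("Helsinki",): "LOC", ("University", "of", "Helsinki"): "ORG"})
print(tag_longest_match("She works at University of Helsinki".split(), TRIE))
```

Preferring the longest match is what lets "University of Helsinki" be tagged as one organisation rather than leaving "Helsinki" tagged as a location.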

  • Verbal Morphosyntactic Disambiguation through Topological Field Recognition in German-Language Law Texts
    (Kyoko Sugisaki and Stefan Höfler)

    The morphosyntactic disambiguation of verbs is a crucial pre-processing step for parsing morphologically rich languages such as German and domains with complex clause structures such as law texts. This paper explores how much linguistically motivated rules can contribute to the task. It introduces an incremental system of verbal morphosyntactic disambiguation that exploits the concept of topological fields. The system reduces the rate of POS-tagging errors from 10.2% to 1.6%. The evaluation shows that this reduction is mostly gained by checking the compatibility of morphosyntactic features within the long-distance syntactic relationships of discontinuous verbal elements. Furthermore, the study shows that in law texts the average distance between the heads and complements of clauses is relatively large (9.5 tokens), so a wide context window is necessary for the morphosyntactic disambiguation of verbs in this domain.
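The compatibility check between discontinuous verbal elements can be illustrated with a much-simplified rule: the finite verb in the left sentence bracket restricts which non-finite reading the verb in the right bracket may have, however far apart the two are. The sketch below assumes a verb-second main clause, STTS tags, and a small governance table that is hypothetical and incomplete. It is not the authors' system, which works incrementally over full topological-field analyses.

```python
from typing import List, Optional, Set, Tuple

# (form, lemma, candidate STTS tags): an assumed input format for this sketch.
Token = Tuple[str, str, Set[str]]

FINITE = {"VAFIN", "VMFIN", "VVFIN"}
VERBAL = {"VVFIN", "VVPP", "VVINF", "VVIZU"}
MODALS = {"können", "müssen", "dürfen", "sollen", "wollen", "mögen"}

# Hypothetical, incomplete table: non-finite tags licensed in the right
# bracket by the finite verb in the left bracket.
GOVERNS = {
    "haben": {"VVPP"},
    "sein": {"VVPP"},
    "werden": {"VVINF", "VVPP"},
    "modal": {"VVINF"},
}

def disambiguate_clause(clause: List[Token]) -> List[Set[str]]:
    """Narrow the right-bracket verb's tags using the left-bracket verb."""
    tags = [set(cands) for _, _, cands in clause]
    # Left bracket: first token with a finite-verb reading (V2 clause assumed).
    left: Optional[int] = next((i for i, t in enumerate(tags) if t & FINITE), None)
    if left is None:
        return tags
    # Right bracket: last verbal token after the left bracket, regardless of
    # how many middle-field tokens lie in between.
    right = next((i for i in range(len(tags) - 1, left, -1) if tags[i] & VERBAL), None)
    if right is None:
        return tags
    lemma = clause[left][1]
    governor = "modal" if lemma in MODALS else lemma
    allowed = GOVERNS.get(governor)
    if allowed and tags[right] & allowed:
        tags[right] &= allowed  # keep only the compatible readings
    return tags

# "verlängert" is ambiguous between finite verb and past participle; the
# auxiliary "hat" eight tokens earlier selects the participle reading.
CLAUSE: List[Token] = [
    ("Die", "die", {"ART"}), ("Behörde", "Behörde", {"NN"}), ("hat", "haben", {"VAFIN"}),
    ("die", "die", {"ART"}), ("Frist", "Frist", {"NN"}), ("für", "für", {"APPR"}),
    ("die", "die", {"ART"}), ("Einreichung", "Einreichung", {"NN"}),
    ("der", "die", {"ART"}), ("Unterlagen", "Unterlage", {"NN"}),
    ("verlängert", "verlängern", {"VVFIN", "VVPP"}),
]
print(disambiguate_clause(CLAUSE)[-1])
```

Because the bracket positions are found structurally rather than through a fixed-width window, the same check works whether the middle field holds two tokens or twenty.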

  • A Case Study in Tagging Case in German: An assessment of statistical approaches
    (Simon Clematide)

    In this study, we assess the performance of purely statistical approaches using supervised machine learning for predicting case in German (nominative, accusative, dative, genitive, n/a).
    We experiment with two different treebanks containing morphological annotations: TIGER and TUEBA.
    An evaluation with 10-fold cross-validation serves as the basis for systematic comparisons of the optimal parametrization of different approaches. We test taggers based on Hidden Markov Models (HMMs), Decision Trees, and Conditional Random Fields (CRF). The CRF approach based on our own feature model outperforms all other approaches and results in an improvement of 11% compared to an HMM trigram tagger.
    Moreover, we investigate the effect of additional morphological features (part-of-speech, gender, number, person) in the internal tagset used for the training. Rich internal tagsets improve results for all tested approaches.
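Two ideas in this abstract are easy to make concrete: token-level feature templates of the kind fed to a CRF, and a rich internal tagset whose predictions are projected back to case. The sketch below shows both. The particular features (lowercased word, suffixes, capitalisation, a ±2 token window) are common choices assumed here for illustration, not the author's feature model, and the CRF training itself is not shown.

```python
from typing import Dict, List

def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Hypothetical feature templates for token i (string-valued features)."""
    w = tokens[i]
    feats = {
        "w": w.lower(),
        "suf2": w[-2:].lower(),   # German case is often visible in endings
        "suf3": w[-3:].lower(),
        "cap": str(w[:1].isupper()),
    }
    # Context window of two tokens on each side, padded at sentence edges.
    for off in (-2, -1, 1, 2):
        j = i + off
        feats[f"w[{off}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
    return feats

def to_internal(case: str, *extra: str) -> str:
    """Build a rich internal tag, e.g. case plus POS, gender, number, person."""
    return ".".join((case,) + extra)

def to_case(internal_tag: str) -> str:
    """Project a predicted internal tag back onto the case label alone."""
    return internal_tag.split(".")[0]

print(token_features(["mit", "der", "Frau"], 1))
print(to_case(to_internal("Dat", "ART", "Fem", "Sg")))
```

Training on the richer tags and discarding the extra information only at output time lets the model exploit agreement between case, gender and number, which is one plausible reading of why rich internal tagsets help.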

  • A Rule-based Morphosemantic Parser for French for a Fine-grained Semantic Annotation of Texts
    (Fiammetta Namer)

    We describe DériF, a rule-based morphosemantic parser developed for French. Unlike existing word segmentation tools, DériF provides derived and compound words with various sorts of semantic information: (1) a definition, computed from both the base meaning and the specificities of the morphological rule; (2) lexical-semantic features, inferred from general linguistic properties of derivation rules; (3) lexical relations (synonymy, (co-)hyponymy) with other, morphologically unrelated, words belonging to the same analyzed corpus.
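The core mechanism described here, a morphological rule that both recovers the base and contributes to the meaning of the derived word, can be sketched for simple deverbal suffixation. Everything below is a hypothetical simplification: the rule set, the English gloss templates and the feature labels are illustrative only, and DériF's actual rules, definitions and treatment of compounds and lexical relations are far richer.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

@dataclass(frozen=True)
class Rule:
    suffix: str               # suffix of the derived word
    base_ending: str          # ending restored to recover the base lemma
    derived_cat: str          # category of the derived word
    definition: str           # gloss template; {base} is the base lemma
    features: Tuple[str, ...] # lexical-semantic features inferred from the rule

@dataclass
class Analysis:
    word: str
    base: str
    category: str
    definition: str
    features: Tuple[str, ...]

# Hypothetical deverbal rules for French -able, -eur and -age.
RULES = [
    Rule("able", "er", "ADJ", "possible to '{base}'", ("+possibility",)),
    Rule("eur", "er", "N", "agent of '{base}'", ("+agent",)),
    Rule("age", "er", "N", "action of '{base}'", ("+action",)),
]

def analyze(word: str, lexicon: Set[str]) -> Optional[Analysis]:
    """Return the first rule-based analysis whose recovered base is attested."""
    for rule in RULES:
        if word.endswith(rule.suffix):
            base = word[: -len(rule.suffix)] + rule.base_ending
            if base in lexicon:  # reject analyses with an unattested base
                return Analysis(word, base, rule.derived_cat,
                                rule.definition.format(base=base), rule.features)
    return None

LEXICON = {"laver", "chanter"}
print(analyze("lavable", LEXICON))
print(analyze("grandeur", LEXICON))  # "grander" is not a verb: no analysis
```

Checking the recovered base against a lexicon is what keeps a word like "grandeur" from being misread as a deverbal agent noun.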

  • Jabalín: a Comprehensive Computational Model of Modern Standard Arabic Verbal Morphology Based on Traditional Arabic Prosody
    (Alicia González Martínez, Susana López Hervás, Doaa Samy, Carlos G. Arques and Antonio Moreno Sandoval)

    The computational handling of Modern Standard Arabic is a challenge in natural language processing because of its very rich morphology. However, several authors have pointed out that the Arabic morphological system is in fact extremely regular. Existing Arabic morphological analyzers have exploited this regularity to varying extents, yet we believe there is still scope for improvement. Taking inspiration from traditional Arabic prosody, we have designed and implemented a compact and simple morphological system which, in our opinion, takes further advantage of the regularities of the Arabic morphological system. The output of the system is a large-scale lexicon of inflected forms that has subsequently been used to create an Online Interface for a morphological analyzer of Arabic verbs. The Jabalín Online Interface is available at http://elvira.lllf.uam.es/jabalin/, hosted at the LLI-UAM lab. The generation system is also available under the GNU GPL 3 license.
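Much of the regularity referred to here comes from root-and-pattern (templatic) morphology: consonantal roots are interleaved with vocalic templates. The sketch below shows generic interdigitation in a simple Latin transliteration, with digits 1, 2, 3 marking root-consonant slots. It is not Jabalín's prosody-based model; the slot notation, the toy paradigm and the assumption of an a-a perfective vowel pattern are illustrative only, and weak roots would need extra rules.

```python
from typing import Dict, Sequence

def interdigitate(root: Sequence[str], pattern: str) -> str:
    """Replace slot digits 1, 2, 3 in `pattern` with the root consonants."""
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in pattern)

# Toy Form I perfective paradigm (transliterated, no diacritics); assumes
# a root whose perfective stem vowels are a-a, as in kataba.
PERFECTIVE: Dict[str, str] = {
    "3ms": "1a2a3a",
    "3fs": "1a2a3at",
    "1s": "1a2a3tu",
}

def conjugate(root: Sequence[str], paradigm: Dict[str, str]) -> Dict[str, str]:
    """Generate one inflected form per paradigm cell."""
    return {cell: interdigitate(root, pat) for cell, pat in paradigm.items()}

KTB = ("k", "t", "b")
print(conjugate(KTB, PERFECTIVE))
print(interdigitate(KTB, "1a22a3a"))  # Form II template with a geminated middle slot
```

A small set of templates applied to a root list is enough to generate a large inflected lexicon, which is the kind of output the abstract describes.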

  • A System for Archivable Grammar Documentation
    (Michael Maxwell)

    This paper describes a number of criteria for archivable documentation of grammars of natural languages, extending the work of Bird and Simons' "Seven dimensions of portability for language documentation and description." We then describe a system for writing and testing morphological and phonological grammars of languages, a system which satisfies most of these criteria (where it does not, we discuss plans to extend the system).
    The core of this system is based on an XML schema which allows grammars to be written in a stable and linguistically-based formalism, a formalism which is independent of any particular parsing engine. This core system also includes a converter program, analogous to a programming language compiler, which translates grammars written in this format, plus a dictionary, into the programming language of a suitable parsing engine (currently the Stuttgart Finite State Tools). The paper describes some of the decisions which went into the design of the formalism; for example, the decision to aim for observational adequacy, rather than descriptive adequacy. We draw out the implications of this decision in several areas, particularly in the treatment of morphological reduplication.
    We have used this system to produce formal grammars of Bangla, Urdu, Pashto, and Persian (Farsi), and we have derived parsers from those formal grammars. In the future we expect to implement similar grammars of other languages, including Dhivehi, Swahili and Somali. In further work (briefly described in this paper), we have embedded formal grammars produced in this core system into traditional descriptive grammars of several of these languages. These descriptive grammars serve to document the formal grammars, and also provide automatically extractable test cases for the parser.
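The separation between an engine-independent grammar and a compiled parser can be illustrated in miniature. The sketch below reads a tiny XML grammar and "compiles" it into a lookup table that stands in for a parsing engine. The element and attribute names, the stem and suffix data, and the Python dictionary target are all hypothetical; the actual system uses its own XML schema and targets the Stuttgart Finite State Tools.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical, engine-independent grammar fragment (toy data).
GRAMMAR_XML = """
<grammar language="toy">
  <stem form="ketab" gloss="book"/>
  <stem form="dast" gloss="hand"/>
  <suffix form="ha" gloss="PL"/>
</grammar>
"""

def compile_grammar(xml_text: str) -> Dict[str, List[str]]:
    """Translate the XML grammar into a form -> analyses table (stand-in engine)."""
    root = ET.fromstring(xml_text.strip())
    stems = [(s.get("form"), s.get("gloss")) for s in root.iter("stem")]
    suffixes = [(a.get("form"), a.get("gloss")) for a in root.iter("suffix")]
    table: Dict[str, List[str]] = {}
    for sform, sgloss in stems:
        table.setdefault(sform, []).append(sgloss)  # bare stem
        for aform, agloss in suffixes:
            # Stem plus each suffix, glossed in the order the morphs appear.
            table.setdefault(sform + aform, []).append(f"{sgloss}+{agloss}")
    return table

PARSER = compile_grammar(GRAMMAR_XML)
print(PARSER.get("ketabha", []))
```

Because the grammar never mentions the target engine, the same XML could in principle be compiled into a different backend by swapping out `compile_grammar`, which is the portability argument the abstract makes.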

  • Implementing a formal model of inflectional morphology
    (Benoît Sagot and Géraldine Walther)

    Inflectional morphology is a concern for typologists, formal linguists and computational linguists alike, albeit with diverse objectives and approaches. In this paper, we describe the implementation of a formal model of inflectional morphology that aims at capturing typological generalisations. We show that studies in descriptive and formal morphology, as well as the development of NLP tools and resources, benefit from the availability of such a model and its implementation.
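One common way of formalising inflection is through realizational rules: each rule pairs an inflection class and a set of morphosyntactic features with an exponent, and the most specific applicable rule wins (the Pāṇinian principle). The sketch below implements that generic idea with toy, German-like data. It is an illustration of the general approach only, not Sagot and Walther's formalism; the class labels, features and rules are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# (inflection class or "*" for any, required features, exponent)
Rule = Tuple[str, FrozenSet[str], str]

RULES: List[Rule] = [
    ("*", frozenset(), ""),                 # default: bare stem
    ("A", frozenset({"pl"}), "s"),          # Auto -> Autos
    ("B", frozenset({"pl"}), "er"),         # Kind -> Kinder
    ("B", frozenset({"pl", "dat"}), "ern"), # Kind -> Kindern (dative plural)
]

def realize(stem: str, infl_class: str, cell: FrozenSet[str]) -> str:
    """Apply the most specific rule whose class and features match the cell."""
    candidates = [r for r in RULES if r[0] in (infl_class, "*") and r[1] <= cell]
    # Paninian principle: the rule requiring the most features wins.
    best = max(candidates, key=lambda r: len(r[1]))
    return stem + best[2]

print(realize("Kind", "B", frozenset({"pl", "dat"})))
print(realize("Auto", "A", frozenset({"pl", "dat"})))
```

The default rule guarantees that every cell receives a form, while more specific rules override it only where the feature set demands, so class A falls back to its general plural for the dative.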