Implementation and Management Approach
The Partners
The EuroMatrixPlus consortium integrates the efforts of academic research groups and companies to advance machine translation performance and bring it to the end user. The complex problem of translation requires an interdisciplinary research strategy. Neither linguists, computer scientists, translation experts nor mathematicians will be able to solve it without cooperating across traditional disciplinary boundaries. The partners of this consortium were selected on the basis of their complementary strengths, combining core competencies in machine translation and machine learning with experience in practical deployment in the marketplace.
The EuroMatrixPlus workplan consists of the following 10 work packages:
WP1: Rich Tree-Based Statistical Translation
Translating between European languages poses challenges - such as morphology and reordering - that are not adequately reflected in traditional phrase-based translation models. We therefore explore statistical translation models that exploit richer linguistic representations.
WP2: Hybrid Machine Translation
Recent detailed comparisons of rule-based and statistical translation systems carried out by members of the consortium have revealed different strengths of the two approaches that currently dominate the commercial and academic research field of machine translation. In this work package, we explore ways to tightly integrate the two approaches in a hybrid machine translation system.
WP3: Advanced Learning Methods for Machine Translation
Because statistical machine translation models are built in a data-driven fashion, performance generally improves as more training data is used. Adding hundreds of millions of words leads to increasingly good translation quality. However, for many Central and Eastern European languages, limited training data constrains the quality of statistical machine translation systems. We will explore methods of using alternative training data and of better exploiting the available parallel corpora.
WP4: Open Source Tools and Data
We are committed to the idea of open source software as an essential means to collaborate within the EuroMatrixPlus project and to engage the greater research and development community. The consortium members have made significant contributions to the open source toolset in machine translation as part of the EuroMatrix project.
WP5: "WikiTrans" Community-Based Translation Environments
The ultimate test for machine translation is its utility for end users. MT technology is useful if it allows users to create content in their own language more quickly from text in a source language of which they have limited or no understanding. This is especially important for the many European languages that are currently under-served, both in terms of available content and in terms of existing language technology. In this work package, we bring the "Wiki" idea of collaborative content development to translation.
WP6: Integrated Localisation Workflow
The localisation industry has not widely used machine translation, but it has successfully used translation memories to reduce the translation workload, especially in repetitive tasks such as the translation of content that changes only partially over time (product manuals, company websites).
In partnering with the Research Centre for Next Generation Localisation (CNGL), we will integrate EuroMatrixPlus resources with CNGL research on standards and interoperability in localisation workflows. We will combine the technological advances in machine translation which are developed by other work packages with the industrial workflow processes used by the localisation industry. The close collaboration with industrial partners outside of EuroMatrixPlus will widen the reach of the results of this project and directly benefit the localisation industry in Europe.
WP7: Evaluation Campaign
Much of the progress in machine translation in this decade has been driven by open evaluation campaigns, in which developers of machine translation systems are tasked with translating a previously unseen test corpus and have their translation performance evaluated against that of other participants. The competitive aspect of these campaigns has driven researchers to focus on the problems that matter most for translation performance. The collaborative aspect of the follow-up meetings, where methods are discussed in detail, has contributed to the quick adoption of best known methods and the validation of novel approaches.
Almost all members of the EuroMatrixPlus consortium have participated in and helped to organise evaluation campaigns, most notably the series of workshops organised alongside ACL, the premier conference in computational linguistics. We will continue our efforts to provide a forum dedicated to the translation of European languages.
WP8: Project Management and Dissemination
WP9: Integrating Slovak Language Resources
The main goal of this work package is to integrate Slovak language resources into the project.
WP10: HPSG-based Statistical Translation
The focus of this work package is the development of a statistical model for translation between Bulgarian and English. This will be done on the basis of a parallel HPSG-based treebank.