EuroMatrixPlus: Bringing Machine Translation for European Languages to the User

Acronym of the case:

EuroMatrixPlus

Web address of the case:

Country of the case:

Bulgaria , Czech Republic , France , Germany , Ireland , Italy , United Kingdom , North America

Posting Date:

18 January 2012

Last Edited Date:

20 February 2012

Author:

Stephan Busemann (DFKI GmbH)
Editor's Choice 2012

Type of initiative

  • Project or service

Case Abstract

Europe faces a growing economic and societal challenge due to its vast diversity of languages, and machine translation technology holds promise as a means to address this challenge. The goals of EuroMatrixPlus are:

  1. To continue the rapid advance of machine translation technology, creating example systems for every official EU language, and providing other machine translation developers with our infrastructure for building statistical translation models.
  2. To continue and broaden the controlled systematic investigation of different approaches and techniques to accelerate the scientific evolution of novel methods, including both selection and cross-fertilization. The aim is to arrive at scientifically well understood novel combinations of methods that are proven superior to the state of the art.
  3. To focus on bringing machine translation to the users, in addition to focusing on scientific advances. Because our statistical models are derived from example translations, we believe that there is potential for a synergistic relationship in which users suggest improvements to the system by post-editing its output, and the system improves itself by learning from user feedback.
  4. To contribute to the growth and competitiveness of the European MT research scene and infrastructure through open, international, competitive shared tasks and living, community-supported surveys of resources, tools, systems and their respective capabilities.

In bringing MT to the users, EuroMatrixPlus focuses on two different types of users: (a) professional translators and translation agencies working for private corporations, administrations, and other organisations, and (b) lay users who create content on a volunteer basis by translating foreign materials into their own languages. The project will investigate how these users can benefit from state of the art machine translation, and conversely, how machine translation can benefit from user corrections.

EuroMatrixPlus will create an openly accessible sample application that enables users to automatically translate news stories and web pages from any European language into any other, and whose corrections will be exploited as data for improving translation technology.

Description of the case

Date
March 2009 to April 2012
Target Users
Business (self-employed) | Business (industry) | Business (SME) | Citizen
Target Users Description

Professional translators and translation agencies working for private corporations, administrations, and other organisations.

Lay users who create content on a volunteer basis by translating foreign materials into their own languages.

Scope
International
Language(s)
English

Policy Context and Legal Framework

Project Size and Implementation

Type of initiative
IT infrastructures and products
Overall Implementation approach
Partnerships between administration and/or private sector and/or non-profit sector
Technology choice
Open source software
Funding source
Public funding EU
Project size
Implementation: €5,000,000-10,000,000

Implementation and Management Approach

The Partners

The EuroMatrixPlus consortium integrates the efforts of academic research and companies to advance machine translation performance and bring it to the end user. The complex problem of translation requires an interdisciplinary research strategy. Linguists, computer scientists, translation experts and mathematicians cannot solve the problem without cooperating across the traditional boundaries between disciplines. The partners of this consortium were selected on the basis of their complementary strengths, combining core competencies in machine translation and machine learning with experience in practical deployment in the marketplace.

 

The EuroMatrixPlus workplan consists of the following 10 work packages:

WP1: Rich Tree-Based Statistical Translation
Translating between European languages poses challenges - such as morphology and reordering - that are not adequately reflected in traditional phrase-based translation models. We therefore explore statistical translation models that exploit richer linguistic representations.

WP2: Hybrid Machine Translation
Recent detailed comparisons of rule-based and statistical translation systems carried out by members of the consortium have revealed different strengths of the two approaches that currently dominate the commercial and academic research field of machine translation. In this work package, we explore ways to tightly integrate the two approaches in a hybrid machine translation system.

WP3: Advanced Learning Methods for Machine Translation
Because statistical machine translation models are built in a data-driven fashion, performance improves as more training data is used. Adding hundreds of millions of words leads to increasingly good translation quality. However, for many Central and Eastern European languages, limited training data constrains the quality of statistical machine translation systems. We will explore methods of using alternative training data and of making better use of the available parallel corpora.

WP4: Open Source Tools and Data
We are committed to the idea of open source software as an essential means to collaborate within the EuroMatrixPlus project and to engage the greater research and development community. The consortium members have made significant contributions to the open source toolset in machine translation as part of the EuroMatrix project.

WP5: "WikiTrans" Community-Based Translation Environments
The ultimate test for machine translation is its utility for end users. MT technology is useful if it allows users to create content in their own language more quickly from text in a source language of which they have limited or no understanding. This is especially important for the many European languages that are currently under-served, both in terms of available content and in terms of existing language technology. In this work package, we bring the "Wiki" idea of collaborative content development to translation.

WP6: Integrated Localisation Workflow
The localisation industry has not widely used machine translation, but it has successfully used translation memories to reduce the translation workload, especially in repetitive tasks such as the translation of content that only partially changes over time (product manuals, company websites).
In partnering with the Research Centre for Next Generation Localisation (CNGL), we will integrate EuroMatrixPlus resources with CNGL research on standards and interoperability in localisation workflows. We will combine the technological advances in machine translation which are developed by other work packages with the industrial workflow processes used by the localisation industry. The close collaboration with industrial partners outside of EuroMatrixPlus will widen the reach of the results of this project and directly benefit the localisation industry in Europe.
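As an illustration of the translation memory idea mentioned in this work package, the following is a minimal sketch of a fuzzy lookup: a new segment is compared with previously translated segments, and the stored translation of the closest one is reused if it is similar enough. It is not the project's implementation. The memory contents, the function name best_match and the 0.75 threshold are assumptions made for the example, and character-level similarity from Python's standard difflib module stands in for whatever matching a real localisation tool uses.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory (illustrative data only):
# previously translated source segments mapped to stored translations.
MEMORY = {
    "Press the power button to start the device.":
        "Drücken Sie die Ein/Aus-Taste, um das Gerät zu starten.",
    "Remove the battery before cleaning.":
        "Entfernen Sie vor der Reinigung den Akku.",
}


def best_match(segment, memory, threshold=0.75):
    """Return (source, translation, score) for the most similar stored
    segment, or None if no stored segment reaches the threshold."""
    best = None
    for source, target in memory.items():
        # Character-level similarity in [0, 1]; 1.0 means identical text.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


if __name__ == "__main__":
    # A manual sentence that changed only slightly between product versions.
    match = best_match("Press the power button to restart the device.", MEMORY)
    if match is not None:
        source, translation, score = match
        print(f"Reuse suggestion ({score:.2f}): {translation}")
```

In such a workflow, a translator would only need to adapt the suggested translation to the changed words; segments with no sufficiently close match are the ones where machine translation output could be offered instead.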

WP7: Evaluation Campaign
Much of the progress in machine translation in this decade has been driven by open evaluation campaigns, in which developers of machine translation systems are tasked with translating a previously unseen test corpus and have their translation performance evaluated against that of other participants. The competitive aspect of these campaigns has driven researchers to focus on the most important problems for translation performance. The collaborative aspect of the follow-up meetings, where methods are discussed in detail, has contributed to the quick adoption of the best known methods and the validation of novel approaches.
Almost all members of the EuroMatrixPlus consortium have participated in and helped to organize evaluation campaigns, most notably the series of workshops organised alongside ACL, the premier conference in computational linguistics. We will continue our efforts to provide a forum dedicated to the translation of European languages.

WP8: Project Management and Dissemination

WP9: Integrating Slovak Language Resources
The main goal of this work package is to integrate Slovak language resources into the project.

WP10: HPSG-based Statistical Translation
The focus of this work package is the development of a statistical model for translation between Bulgarian and English. This will be done on the basis of a parallel HPSG-based treebank.
