Machine Translation Trends in Japan

Dr. Sophia Ananiadou, Senior Lecturer, Department of Computing and Mathematics, Manchester Metropolitan University

A common complaint of localizers and translators in relation to Japan is the poor quality of the tools available. In this article Dr. Sophia Ananiadou, author of a British study on Japanese machine translation systems, examines market needs and developments, and especially the shift away from low-end MT to more sophisticated systems. She also presents the results of a poll of existing Japanese MT suppliers and researchers, in which she asked about products relevant to localization and professional translation.

The Japanese machine translation (MT) market has changed drastically since the beginning of the 1990s. Before that, the main customers of MT systems were mostly big organizations, either public-service organizations or very large companies (notably car manufacturers). In addition, there were the (typically) smaller translation companies. These organizations would hire professionals, not all of whom would be full-time translators, but all of whom would be users of MT systems. The use of MT systems in such organizations, however, was not a great success because of the cost of the human post-editors needed.

Then came the rapid expansion of the Internet and the World Wide Web. This has changed the type of potential (and actual) user from the professional to the novice or casual user, who deploys MT for network browsing. The two types of user use MT for different purposes: while the former use MT for information dissemination, the latter do so for information gathering.

The difference in usage implies different requirements for MT. The key to the success of MT for information dissemination is the quality of translation. Translation has to be good and natural enough for public consumption, so that revisions by human translators can be kept to a minimum.
On the other hand, price and ease of use are crucial in MT for information gathering in general and network browsing in particular. MT providers in Japan believed that improving translation quality, which required further research investment, was a much more difficult task than porting existing systems to PCs, with integrated interfaces and network browsers, and selling them cheaply. Almost all the MT providers in Japan adopted this view in the early 1990s and developed their own MT systems for browsers.

However, fierce competition in the market has greatly lowered the price of this type of MT. In addition, many MT systems are bundled with other software for PCs, which has reduced profit margins even further. Nevertheless, the profits currently being generated with these systems have restored MT providers' confidence in them. As a result, some providers have started to look seriously at the potential of other types of MT, i.e. MT for information dissemination by professionals.

MT systems for information dissemination, together with translation memory systems and integrated terminology management packages, are crucial for localization. I therefore conducted a small survey of companies and organizations which already have such MT systems or relevant skills. The following questions were put:

a) What are the recent developments of your company regarding MT for professional translators?
b) Are these developments geared towards the localization of products?
c) Do you have on-line translation aids for professional translators (e.g. on-line terminology look-up, translation memory techniques, source text analysis/control, or on-line translation of formatted documents)?
d) If you don't have any of the above, do you plan to develop them in the future?
The responses to these questions confirm the view that MT developers in Japan have started to look at alternatives to MT systems for WWW browsing, and that the market will go through another drastic change in a few years' time. Ricoh, for example, is developing an authoring system for Japanese into English, while NTT plans to use its MT system to translate newspaper articles in co-operation with a well-known newspaper company.

While quite a few developers (Hitachi, NEC and IBM, for example) are well aware of the potential use of their systems for localization, the actual use of MT for localization is still quite limited in Japan. However, in their responses to my queries, a few developers and researchers mentioned possible uses of new techniques for localization-oriented MT. Dr. Kaji (Hitachi), for example, indicated that his research into deriving translation rules from corpora would be very useful for MT applications in the localization area.

The responses also reveal interesting aspects of Japanese MT products, MT markets and the present state of Japanese MT technology. (A summary of the responses by individual organizations can be found at the end of this article.) The main conclusions are as follows:

1) Developers rule

Trends in Japanese MT are largely guided by MT developers. The involvement of users, such as professional translators and organizations which use MT, is still limited. As a result, potentially useful technologies have not been fully exploited for users' benefit. The absence of user involvement may explain the relatively low interest in translation memory systems in Japan. As an example, a translator (Mr. M. Moriguchi) who works for a well-known electronics company in Japan thinks that good terminology is the key to good translation, but his opinion is not shared by the translation section of his company, let alone reflected in the MT system his company is selling. The translators of his company do not use any MT system.
2) Technical innovation

However, quite a few new technologies have been built into MT products in the last few years. Most have been developed with MT browsers in mind, but they may be useful for MT for information dissemination as well. For example, MT systems by Hitachi and NEC now have facilities for a group of translators to share the same dictionaries and terminology databases. Quite a few MT systems can handle HTML texts with figures (cf. the response by NEC).

3) Hybrid systems

Some of the MT systems surveyed successfully integrate conventional MT technology with some aspects of example-based techniques, which can be seen as an advanced form of translation memory. As Dr. K. Takeda (IBM Japan) noted, IBM's technique of using patterns will be useful for translating texts of specific sublanguages, such as manuals for software products. Hitachi's HICATS/JE also uses pattern-based translation similar to the IBM system. The example-based MT produced by ATR will also be used by professional translators at JST (formerly JICST). Although we will have to wait and see how effective these new techniques will be for the translation of specialized texts, these three systems seem to integrate conventional MT techniques successfully with example-based ones. They also seem to be suitable for localization purposes.

4) Move towards standardization

Not much attention has been paid to the standardization of linguistic resources in Japan. However, a group of MT companies (NEC, Toshiba, Nova, etc.) led by AAMT (Asia-Pacific Association for Machine Translation) has started to design standard formats for MT dictionaries. If their efforts are successful, they will bring tremendous advantages: for example, different MT systems will be able to share the same dictionaries. Even though the standardization of all MT system dictionaries would be a difficult task, it might be feasible to arrive at a common format for the translation dictionaries used for technical terms.
Better Terminology Needed

The collection of lexical resources in Japan is impressive. In particular, dictionaries developed by EDR, NTT and IPA are used by many companies as common lexical resources. However, there is a lack of bilingual or multilingual terminology databases. Organizations such as JST and NACSIS only have monolingual terminological collections. While computerized collections of English-Japanese pairs of technical terms are available, there is little control over their terminological quality.

Dr. Ishikawa (University of Library and Information Science), a terminologist, reported that a new professional organization, EAFTerm (East Asia Forum on Terminology), has been established by a joint initiative of CSICCI (China Standardization and Information Classification and Coding Institute) and JTA (Japan Terminology Association). EAFTerm aims to build a multilingual terminology bank of East Asian languages (Japanese, Chinese and Korean).

To sum up, there is now a realization in Japan that MT must address users such as professional translators. More effort is needed in producing high-quality, consistent, multilingual terminological resources which would contribute to MT for localization.

Survey Responses

Below, I summarize the individual responses to my queries.

1. Hitachi

Hitachi has been developing an MT system for translating manuals and patents. This client-server system was released in January 1998. Translators connect to the server from their personal computers, thus saving hard disk space. The translation engine, the basic dictionary, the technical term dictionary and the users' dictionaries are all stored on the server. This system helps in-house translators to translate technical documents. Hitachi has also developed technical terminologies for specialized fields.
Another development at Hitachi is knowledge acquisition from corpora, which covers three strands of work:

Extracting terminology from Japanese text: a tool has been developed for extracting unknown and compound words to support user dictionary construction. Unknown words are extracted by detecting anomalous patterns in morphological analysis results. Terminology, mainly compound nouns, is also extracted using word sequence patterns and frequency information.

Extracting word correspondences from parallel corpora: the method utilizes co-occurrence information to associate words.

Template-based translation: a prototype translation engine has been developed. A pair of sentences which are translations of each other is converted into a translation template, which contains slots to be filled with a pair of phrases which are translations of each other. Translation is performed by template matching and slot filling.

These technologies are viewed as ways to customize MT systems. Hitachi has applied them to its Japanese-English MT system (HICATS/JE), with the aim of translating technical documents (e.g. manuals and patents).

As for on-line translation aids, HICATS/JE includes a function for diagnosing input Japanese sentences. It supports users in pre-editing by detecting morphological, syntactic and semantic ambiguities in input sentences, as well as overlong sentences.

2. ATR Interpreting Telecommunications Research Labs

ATR and the Japan Science and Technology Corporation (JST) will start joint development of a prototype system for translating scientific articles from English to Japanese. JST currently stores scientific articles that have been translated by hand, and wants to develop HAMT (human-aided machine translation) in this area. A prototype system is underway (using translation example pairs and expression patterns); an initial evaluation of translation quality was positive. Next year, after the evaluation phase, JST will start to use the HAMT system in house. This joint project will also focus on developing translation aids for professional translators.
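The template-based approach described in the Hitachi response, like the translation example pairs and expression patterns mentioned for the ATR prototype, can be illustrated with a small sketch. The Python below is not Hitachi's or ATR's implementation. The romanized Japanese templates, the phrase dictionary and the function names are all invented for illustration, and a real system would match on analyzed text rather than raw strings.

```python
import re

# Hypothetical phrase pairs (romanized Japanese -> English) used to fill slots.
PHRASES = {
    "fairu": "the file",
    "botan": "the button",
}

# Hypothetical translation templates: a source pattern and a target pattern
# that share slot names. In the approach described above, each template
# would be derived from a pair of sentences that translate each other.
TEMPLATES = [
    ("{X} wo hiraku", "open {X}"),
    ("{X} wo osu", "press {X}"),
]


def compile_template(source):
    """Turn a source pattern such as '{X} wo hiraku' into a regex
    in which every slot becomes a named group."""
    parts = re.split(r"\{(\w+)\}", source)  # literal, slot, literal, ...
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)       # literal text
        else:
            regex += f"(?P<{part}>.+?)"    # slot
    return re.compile(f"^{regex}$")


def translate(sentence):
    """Return a translation by template matching and slot filling,
    or None if no template (or no phrase pair) applies."""
    for source, target in TEMPLATES:
        match = compile_template(source).match(sentence)
        if match is None:
            continue
        slots = match.groupdict()
        if not all(value in PHRASES for value in slots.values()):
            continue  # a slot phrase has no known translation
        filled = {name: PHRASES[value] for name, value in slots.items()}
        return target.format(**filled)
    return None
```

With these toy data, `translate("fairu wo hiraku")` gives "open the file", while a sentence whose slot phrase is missing from the dictionary gives `None`, which is the point at which a hybrid system would presumably fall back on its conventional MT engine.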
In JST, translators use workstations and PCs linked to a main server which holds a large terminology bank and versions of both original and translated articles.

3. NEC

NEC's translation software was originally designed for professional translators, specifically focusing on the translation of manuals. NEC has numerous dictionaries of technical terms in many fields. The company did not report on any recent developments for professional translators, but it has been modifying its translation software for personal users on the Internet. There are no concrete plans to introduce MT for localization purposes, but NEC believes that some of the new features are useful for this purpose. As an example, its MT system can translate Word files, HTML files, PowerPoint files and Help files while maintaining the original layout. Software developers use the MT system to translate the comments or messages in their programs, to translate documents for production or sales, and to translate manuals or specifications from other companies.

4. NTT

NTT is developing a pattern-based MT system which is a localized version of the general-purpose ALT-J/E system. The domain is Japanese financial news translation. The results are reportedly good enough to obviate the need for human pre- or post-editing. The system is on-line, and newspaper articles are sent from the newspaper company to the translation center on-line. The editor only confirms the translation for distribution. The system will be the first to be commercially introduced into the translation process in Japan, including for e-mail. E-mail translation packages are provided by companies such as Oki Electric. They are very cheap and come pre-installed on PCs. A spin-off of NTT's MT system is the Japanese dictionary Goi-Taikei, which has been edited from the dictionary of their MT system.

5. Ricoh (supported by the Ministry of Posts and Telecommunications)

Ricoh has been building a computer-assisted English writing tool ("Writer's Helper") for Japanese users since autumn 1996. It reportedly has two advantages over existing language tools (including various types of MT system): efficient extraction of the language information needed by the user, and target language-driven user assistance. In this tool, the EFL (English as a Foreign Language) generation process is treated as a process during which EFL learners are required to satisfy lexical and structural constraints and preferences in the English language.

The "Writer's Helper" has three major engines: the "Word Selection Helper," the "Sentence Pattern (or Verb Complementation) Helper," and the "Reference to Corpora." Together, these engines enable the user to write well-formed English sentences efficiently.

The "Word Selection Helper" is engaged when users need guidance on choosing the most appropriate word or phrase. An English-Japanese dictionary was used as the language resource for this engine. When the user inputs a Japanese expression (e.g. a Japanese verb) on the keyboard, English candidate words or phrases are extracted from this dictionary, and guidance is provided on how to choose between the candidates (e.g. key differences in meaning among synonymous words or phrases). The candidates are listed in word frequency order based on the CELEX lexical database.

The "Sentence Pattern (or Verb Complementation) Helper" is activated to assist the user in selecting the appropriate verb complementation. The "Reference to Corpora" engine runs with it. When the user specifies a possible complement type, example sentences are extracted from the COMLEX Syntax Corpus (a monolingual corpus) and from the newly developed English-Japanese bilingual corpus.

For the moment, Ricoh's "Writer's Helper" is a general-purpose language assistant.
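The candidate ranking performed by the "Word Selection Helper" can be sketched roughly as follows. This is an illustration only, not Ricoh's code: the dictionary entries, the usage notes and the frequency counts are invented (they are not real CELEX figures), and the function name is hypothetical.

```python
# Hypothetical bilingual dictionary: a romanized Japanese expression maps to
# English candidates, each with a short note on how it differs in meaning.
DICTIONARY = {
    "miru": [
        ("watch", "follow something that moves or changes over time"),
        ("see", "perceive with the eyes, often without intention"),
        ("look", "direct the eyes towards something"),
    ],
}

# Made-up frequency counts standing in for a lexical database such as CELEX.
FREQUENCY = {"see": 3000, "look": 2000, "watch": 500}


def candidates(expression):
    """Return (word, note) pairs for a Japanese expression,
    most frequent English candidate first."""
    entries = DICTIONARY.get(expression, [])
    # Words missing from the frequency table are treated as frequency 0.
    return sorted(entries, key=lambda entry: FREQUENCY.get(entry[0], 0),
                  reverse=True)


if __name__ == "__main__":
    for word, note in candidates("miru"):
        print(f"{word}: {note}")
```

Ordering by frequency puts the most common, and so usually the safest, choice at the top of the list, while the notes give the writer a reason to prefer a less frequent candidate.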
The tool can be customized by tuning the language resources for the three engines, so as to optimize the domain-specific selection of words, phrases and verb complementation. Sample sentences from specific domains will also help technical writers or translators who do not know much about the domain in question.

6. IBM

IBM's MT system is primarily targeted at novice Internet users, and has few features for professionals. The exception is Version 2.0 (released in June 1997), which allows users to define patterns for translating noun, verb, adjective, prepositional and adverbial phrases, subordinate clauses, and sentences. This pattern facility is much more powerful, and offers considerably more localization capability, than conventional user dictionaries and translation memories. The pattern-based approach is geared to idiomatic, collocational, contextual and domain-specific translations. It is closely related to localization, since many expressions, such as copyright descriptions, disclaimers, and technical terminology, require specific translations.

No on-line translation aids were mentioned. IBM is aware that some of the professional features are very important in attracting a wider range of users. Its concern is that the market of professional translators who would really want to use such a system is very small compared to the millions of novice Internet users.

Disclaimer and Acknowledgments

The author acknowledges that this survey is not exhaustive. She wishes to thank Dr. H. Iida, Dr. K. Iida, Dr. S. Ikehara, Dr. H. Kaji, Dr. K. Kimpara, Dr. M. Moriguchi, Mr. Yasuo Nakajima, Dr. M. Narita, Dr. K. Takeda, and Prof. J. Tsujii for their valuable comments. Any mistakes or misinterpretations are her own responsibility.

Dr. Sophia Ananiadou
Department of Computing and Mathematics
John Dalton Building, Chester Street
Manchester, M1 5GD
Tel +44 (161) 247 1542
Fax +44 (161) 200 3099
E-mail S.Ananiadou@doc.mmu.ac.uk