{"id":62944,"date":"2023-09-13T08:33:00","date_gmt":"2023-09-13T06:33:00","guid":{"rendered":"https:\/\/phrase.com\/?p=62944"},"modified":"2023-11-07T08:02:04","modified_gmt":"2023-11-07T07:02:04","slug":"machine-translation-customization","status":"publish","type":"post","link":"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/","title":{"rendered":"Quest for Quality: Exploring Machine Translation Customization"},"content":{"rendered":"\n<div id=\"acf\/text-block_0ae97efed3aafb0900c8107e3d209219\" class=\"pxblock pxblock--text spacing--default bg--white\">\n\n\t\n\t<div class=\"container\">\n\t\t<div class=\"wysiwyg animate-in\">\n\t\t\t<p><span style=\"font-weight: 400;\">Businesses worldwide are now leaning on <\/span><span style=\"font-weight: 400;\">machine translation<\/span><span style=\"font-weight: 400;\"> (MT) more than ever. In 2022, the MT market size <a href=\"https:\/\/www.gminsights.com\/industry-analysis\/machine-translation-market-size\">exceeded $982M<\/a> and is projected to expand at a compound annual growth rate of 23% until 2032.<\/span><span style=\"font-weight: 400;\">\u00a0<\/span><span style=\"font-weight: 400;\">This is largely because MT has become more reliable, and the pressure for global brands to deliver customized content in multiple languages more quickly keeps growing\u2014all while keeping costs in check.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><a href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation\/\">Machine translation<\/a> increasingly delivers fast and cost-effective translations, but maintaining translation quality remains a pressing issue.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To succeed in today\u2019s fast-paced global market, companies must localize content at scale that aligns with their domain, captures the right tone, and keeps their brand voice consistent across languages and distribution channels.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">That&#8217;s 
where machine translation customization comes into play. By adapting and training MT engines to produce better output, <strong>MT customization gives a strategic edge to companies aiming to connect with international audiences, drive engagement, and increase conversions across markets<\/strong>.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Keep reading to find out how you, too, can make MT customization work for your business.<\/span><\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_69_1 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Overview<\/p>\n<\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" 
href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#what-is-mt-customization\" title=\"What is MT customization?\">What is MT customization?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#the-evolution-of-mt-customization\" title=\"The evolution of MT customization\">The evolution of MT customization<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#the-value-of-mt-customization\" title=\"The value of MT customization\">The value of MT customization<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#who-can-benefit-from-mt-customization\" title=\"Who can benefit from MT customization?\">Who can benefit from MT customization?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#what-are-the-types-of-mt-customization\" title=\"What are the types of MT customization?\">What are the types of MT customization?<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#light-mt-customization\" title=\"Light MT customization\">Light MT customization<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#full-mt-customization\" title=\"Full MT customization\">Full MT customization<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link 
ez-toc-heading-8\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#how-to-prepare-your-data-for-mt-customization\" title=\"How to prepare your data for MT customization?\">How to prepare your data for MT customization?<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#key-types-of-data-used-for-mt-customization\" title=\"Key types of data used for MT customization\">Key types of data used for MT customization<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#best-practices-for-machine-translation-data-cleaning\" title=\"Best practices for machine translation data cleaning\">Best practices for machine translation data cleaning<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#filter-segments-by-age\" title=\"Filter segments by age\">Filter segments by age<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#align-source-and-target-segments\" title=\"Align source and target segments\">Align source and target segments<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#check-segment-length\" title=\"Check segment length\">Check segment length<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-14\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#remove-non-translatables\" title=\"Remove 
non-translatables\">Remove non-translatables<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-15\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#remove-duplicates\" title=\"Remove duplicates\">Remove duplicates<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-16\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#language-checks\" title=\"Language checks\">Language checks<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-17\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#inline-tags\" title=\"Inline tags\">Inline tags<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-18\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#a-brief-look-at-the-training-of-custom-mt-models\" title=\"A brief look at the training of custom MT models\">A brief look at the training of custom MT models<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-19\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#what-it-takes-to-train-a-custom-mt-model\" title=\"What it takes to train a custom MT model\">What it takes to train a custom MT model<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-20\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#charting-mt-model-evaluation-and-fine-tuning\" title=\"Charting MT model evaluation and fine-tuning\">Charting MT model evaluation and fine-tuning<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-21\" href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-customization\/#your-destination-is-right-ahead-mt-customization\" title=\"Your 
destination is right ahead: MT customization\">Your destination is right ahead: MT customization<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"what-is-mt-customization\"><\/span><span style=\"font-weight: 400;\">What is MT customization?<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Picture this: In the heart of a bustling city, a skilled tailor has made a name for himself by crafting suits that epitomize precision and artistry. Just as he tailors each suit to the wearer&#8217;s unique needs, MT customization refines translation engines for industry-specific accuracy.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This fusion of craftsmanship and technology ensures that translations seamlessly fit their context, much like a bespoke suit conveys style and confidence. Both endeavors exemplify how attention to detail transforms ordinary elements into exceptional outcomes.<\/span><\/p>\n<p><b>MT customization is the process of creating, deploying, and maintaining a machine translation engine using data to generate high-quality translations in a specific language pair and domain.<\/b><span style=\"font-weight: 400;\"> Also known as custom MT, it ensures that the final output aligns seamlessly with the unique requirements of a particular domain or industry.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"the-evolution-of-mt-customization\"><\/span><span style=\"font-weight: 400;\">The evolution of MT customization<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">To truly appreciate the power of MT customization, let&#8217;s rewind a bit. Not too long ago, the idea of building a personalized MT engine appeared rather distant. It was a resource-intensive endeavor that required substantial technical expertise. 
That\u2019s why choices were limited: One could make a significant investment, possess technical know-how, or rely on a costly external partner.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">However, just like all technology, MT customization evolved. As early as 2017, several MT providers began exploring ways to make customization more accessible. The aim was to empower language enthusiasts and developers with the ability to craft tailored MT solutions without breaking the bank.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">In 2018, Google unveiled AutoML, a groundbreaking tool intended to democratize the MT customization process. Sundar Pichai, Google&#8217;s CEO, succinctly <\/span><a href=\"https:\/\/blog.google\/technology\/ai\/making-ai-work-for-everyone\/\"><span style=\"font-weight: 400;\">captured its essence<\/span><\/a><span style=\"font-weight: 400;\">:<\/span><\/p>\n<blockquote><p><span style=\"font-weight: 400;\">We hope AutoML will take an ability that a few PhDs have today and will make it possible in three to five years for hundreds of thousands of developers to design new neural nets for their particular needs.<\/span><\/p><\/blockquote>\n<p><span style=\"font-weight: 400;\">Today, the landscape is quite different. 
You have access to a variety of customizable MT engines, in addition to generic engines that offer varying degrees of customization.<\/span><\/p>\n<p><b>What was once a cost-prohibitive endeavor has now transformed into an accessible resource for those seeking precision and excellence in their translations.<\/b><\/p>\n\t\t<\/div>\n\t<\/div>\n<\/div>\n\n\n\n<div id=\"acf\/blog-cta-block_8cd2a6637268cc9e9e2fa220c3320688\" class=\"pxblock pxblock--blog-cta bg--grey image--orientation-portrait\">\n\t<div class=\"block-container\">\n\t\t\t\t\t<div class=\"image image--align-middle\">\n\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1811\" height=\"2560\" src=\"https:\/\/phrase.com\/wp-content\/uploads\/2022\/09\/present-future-machine-translation-study-cover-scaled.jpg\" class=\"attachment-original size-original\" alt=\"The present and future of machine translation study cover.jpg | Phrase\" srcset=\"https:\/\/phrase.com\/wp-content\/uploads\/2022\/09\/present-future-machine-translation-study-cover-scaled.jpg 1811w, https:\/\/phrase.com\/wp-content\/uploads\/2022\/09\/present-future-machine-translation-study-cover-212x300.jpg 212w, https:\/\/phrase.com\/wp-content\/uploads\/2022\/09\/present-future-machine-translation-study-cover-724x1024.jpg 724w, https:\/\/phrase.com\/wp-content\/uploads\/2022\/09\/present-future-machine-translation-study-cover-768x1086.jpg 768w, https:\/\/phrase.com\/wp-content\/uploads\/2022\/09\/present-future-machine-translation-study-cover-1086x1536.jpg 1086w, https:\/\/phrase.com\/wp-content\/uploads\/2022\/09\/present-future-machine-translation-study-cover-1448x2048.jpg 1448w\" sizes=\"(max-width: 1811px) 100vw, 1811px\" \/>\t\t\t<\/div>\n\t\t\t\t<div class=\"content\">\n\t\t\t<p class=\"subhead\">Download for free<\/p>\n<p class=\"h6\">Your up-to-the-minute guide to machine translation<\/p>\n<p class=\"small\">Learn about new technologies to improve machine translation output quality, the latest on MT post-editing pricing models, and 
how to best shop for machine translation.<\/p>\n<p><a class=\"btn btn--outline\" href=\"https:\/\/phrase.com\/resources\/the-present-and-future-of-machine-translation\/\">Download guide<\/a><\/p>\n\t\t<\/div>\n\t<\/div>\n<\/div>\n\n\n\n<div id=\"acf\/text-block_bb8136effed5195f4f6d1aebac8c7e30\" class=\"pxblock pxblock--text spacing--default bg--white\">\n\n\t\n\t<div class=\"container\">\n\t\t<div class=\"wysiwyg animate-in\">\n\t\t\t<h2><span class=\"ez-toc-section\" id=\"the-value-of-mt-customization\"><\/span><span style=\"font-weight: 400;\">The value of MT customization<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Custom machine translation doesn\u2019t only bring the power of generic MT engines\u2014it goes the extra mile. In a translation race where time is of the essence, custom MT keeps the pace, swiftly processing large volumes of text. It helps you avoid the time-consuming complexities of human translation and moves forward as a cost-effective solution, freeing up resources that you can invest to improve quality.<\/span><\/p>\n<p><b>Quality is precisely what sets custom machine translation apart from generic MT engines.<\/b><\/p>\n<p><span style=\"font-weight: 400;\">The quality of the translation depends on the quality of the machine translation models used. Think of trained models in custom machine translation as language experts. Their keen understanding of linguistic nuances ensures superior-quality translations. 
This honed skill means fewer bumps on the translation road and minimal to no <\/span><a href=\"https:\/\/phrase.com\/blog\/posts\/machine-translation-post-editing-best-practices\/\"><span style=\"font-weight: 400;\">machine translation post-editing<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">With higher efficiency and unprecedented quality and accuracy, global businesses can quickly roll out multilingual content and strategically allocate resources to improve the overall customer experience. This, in turn, results in improved brand perception, higher customer engagement, and increased conversions across markets\u2014driving sustained growth for the business on the international stage.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"who-can-benefit-from-mt-customization\"><\/span><span style=\"font-weight: 400;\">Who can benefit from MT customization?<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><b>As MT customization has become more accessible, it&#8217;s now a resource that a wider range of users can take advantage of.<\/b><\/p>\n<p><span style=\"font-weight: 400;\">On one hand, any organization that has a sufficient amount of translation data suitable for training can tap into this transformation. Recent advances in MT customization have reduced the required volume of data quite significantly. 
A sufficiently large translation memory (TM) is all you need to start amplifying your linguistic capabilities.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">On the other hand, organizations that navigate large volumes of content within specific domains stand to gain significantly.<\/span><\/p>\n<table style=\"width: 100%;\" border=\"1\" cellpadding=\"6\">\n<thead>\n<tr>\n<td style=\"width: 100%;\" colspan=\"2\">How specific industries benefit from machine translation customization<\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td style=\"width: 35%; background-color: #f4f4f4;\"><strong>Ecommerce and online retail<\/strong><\/td>\n<td style=\"width: 55%;\">In the ecommerce and online retail sector, custom MT engines can translate product descriptions and user reviews, thus enhancing the overall shopping experience.<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 35%; background-color: #f4f4f4;\"><strong>Travel and hospitality<\/strong><\/td>\n<td style=\"width: 55%;\">Within the travel and hospitality industry, property listings and user reviews can be rendered with a personal touch.<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 35%; background-color: #f4f4f4;\"><strong>SaaS (software as a service)<\/strong><\/td>\n<td style=\"width: 55%;\"><span style=\"font-weight: 400;\">Software companies can benefit from user documentation, help content, and manuals being tailored to their specific industry jargon and terminology.<\/span><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 35%; background-color: #f4f4f4;\"><strong>Automotive<\/strong><\/td>\n<td style=\"width: 55%;\"><span style=\"font-weight: 400;\">Car makers can benefit from MT customization for various materials, including customer comments, dealer feedback, manuals, and production protocols\u2014with a projected business value reaching several million, as per the <a href=\"http:\/\/www.lrec-conf.org\/proceedings\/lrec2020\/industrytrack\/pdf\/2020.lrec2020industrytrack-1.7.pdf\">example of BMW<\/a>.<\/span><\/td>\n<\/tr>\n<tr>\n<td 
style=\"width: 35%; background-color: #f4f4f4;\"><strong>Finance and fintech<\/strong><\/td>\n<td style=\"width: 55%;\"><span style=\"font-weight: 400;\">In the financial and fintech industry, MT customization proves valuable to accurately translate industry-specific vocabulary, incorporate risk-related terminology, and align with the preferred tone of each client for compliance documentation, regulations, and financial reports.<\/span><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 35%; background-color: #f4f4f4;\"><strong>Pharmaceutical<\/strong><\/td>\n<td style=\"width: 55%;\"><span style=\"font-weight: 400;\">The pharmaceutical industry can transform the translation challenges of the medical jargon included in prescriptions, patents, clinical trials, test results, and marketing material into benefits ensuring maximum accuracy and fluency with customized systems.<\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2><span class=\"ez-toc-section\" id=\"what-are-the-types-of-mt-customization\"><\/span><span style=\"font-weight: 400;\">What are the types of MT customization?<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">There are 2 primary forms of MT customization: light and full. Your choice between light and full MT customization depends on the nature of your translation project and the desired level of accuracy.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">It&#8217;s similar to selecting your attire for a trip: A light outfit suits a family weekend, while a full suit is ideal for a business trip. 
The more you move from general to industry-specific content, the greater the customization required.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"light-mt-customization\"><\/span><span style=\"font-weight: 400;\">Light MT customization<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Light MT customization entails tweaking engine-specific features to fine-tune translations. Think of it as adjusting the dials on a radio to get the best sound quality. This includes, among other things:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Glossary adaptation<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">\u201cDo-not-translate\u201d lists<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Translation memory adaptation<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Stylistic control<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">For example, <\/span><a href=\"https:\/\/support.deepl.com\/hc\/en-us\/articles\/4406432463762-About-the-formal-informal-feature\"><span style=\"font-weight: 400;\">DeepL&#8217;s formality feature<\/span><\/a><span style=\"font-weight: 400;\"> showcases light customization.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"full-mt-customization\"><\/span><span style=\"font-weight: 400;\">Full MT customization<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Full MT customization takes the process a step further. 
It involves training an MT engine using meticulously curated datasets to generate translations that precisely capture jargon, terminology, style, and tone of voice.<\/span><\/p>\n<p><b>Essentially, full MT customization results in a translation engine that speaks your language\u2014both figuratively and literally.<\/b><\/p>\n<h2><span class=\"ez-toc-section\" id=\"how-to-prepare-your-data-for-mt-customization\"><\/span><span style=\"font-weight: 400;\">How to prepare your data for MT customization?<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">A short while back, organizations needed to feed millions of segments to train an MT engine. However, those days are gone\u2014the process now needs considerably fewer segments.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">What holds the key to training an MT engine is bilingual data. The greater the volume and variety of quality bilingual data, the better equipped the engine becomes in generating high-quality translations in the long run.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"key-types-of-data-used-for-mt-customization\"><\/span><span style=\"font-weight: 400;\">Key types of data used for MT customization<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">There are 2 pillars of data that underpin MT customization: translation memories and corpora.<\/span><\/p>\n<h4><span style=\"font-weight: 400;\">Translation memories<\/span><\/h4>\n<p><span style=\"font-weight: 400;\"><a href=\"https:\/\/phrase.com\/blog\/posts\/translation-memory\/\">Translation memories<\/a> (TMs) stand as the bedrock of linguistic evolution. They have become accessible and familiar to most organizations operating in the translation and localization industry.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Just a few years ago, TMs were mainly regarded as repositories of human-revised translations. 
However, they are now invaluable in shaping the trajectory of MT engines, guiding them to replicate content with remarkable accuracy.<\/span><\/p>\n<h4><span style=\"font-weight: 400;\">Corpora<\/span><\/h4>\n<p><span style=\"font-weight: 400;\">Corpora are large, structured collections of texts in multiple languages. These texts are carefully curated datasets acquired from external sources and selected to serve as training data for MT models.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">By supplementing the TM data, corpora work with great efficacy, improving both efficiency and precision\u2014particularly in specific language pairs and specialized domains.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Embracing corpora enriches the localization journey, fostering a well-rounded approach that harnesses the inherent strengths of both internal and external linguistic resources.<\/span><\/p>\n\t\t<\/div>\n\t<\/div>\n<\/div>\n\n\n\n<div id=\"acf\/blog-cta-block_2a98a2165c9768e7532c19e7e9e95a14\" class=\"pxblock pxblock--blog-cta bg--grey image--orientation-square\">\n\t<div class=\"block-container\">\n\t\t\t\t<div class=\"content\">\n\t\t\t<p class=\"subhead\" style=\"text-align: center;\">Dive deeper<\/p>\n<p class=\"h5\" style=\"text-align: center;\">10 key steps to creating a machine translation strategy<\/p>\n<p class=\"small\" style=\"text-align: center;\"><span style=\"font-weight: 400;\">Learn how to design a machine translation strategy that can help your brand connect with international customers at full speed.<\/span><\/p>\n<p style=\"text-align: center;\"><a class=\"btn btn--outline\" href=\"https:\/\/phrase.com\/blog\/posts\/how-to-leverage-machine-translation\/\">Discover steps<\/a><\/p>\n\t\t<\/div>\n\t<\/div>\n<\/div>\n\n\n\n<div id=\"acf\/text-block_7f72bc40e922498b0fdfcef89e53e0c2\" class=\"pxblock pxblock--text spacing--default bg--white\">\n\n\t\n\t<div class=\"container\">\n\t\t<div class=\"wysiwyg animate-in\">\n\t\t\t<h2><span 
class=\"ez-toc-section\" id=\"best-practices-for-machine-translation-data-cleaning\"><\/span><span style=\"font-weight: 400;\">Best practices for machine translation data cleaning<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">TMs and corpora are the fundamental building blocks of data for MT customization. To give your custom engine a solid foundation, you first need meticulously curated training data, which makes data cleaning essential.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Various techniques can help you refine and enhance data quality, optimizing the engine\u2019s performance:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Filter segments by age<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Align source and target segments<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Check segment length<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Remove non-translatables<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Remove duplicates<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Language checks<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Inline tags<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Previously, data cleaning relied on extensive (and expensive) manual review, but a large part of data preparation can now be automated. These techniques work together to refine the data and make the training process more effective. 
Let\u2019s take a look at each of them.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"filter-segments-by-age\"><\/span><span style=\"font-weight: 400;\">Filter segments by age<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">For certain types of documents, filtering TM segments by age is a fundamental data cleaning technique, because training is most effective when the age of the segments suits the content the engine will translate.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The golden rule is to balance timeliness and relevance to ensure accurate training. Using segments that are either too outdated or overly current can backfire, especially when dealing with inherited or legacy translation memories whose quality, origin, attributes, and historical usage are not controlled.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"align-source-and-target-segments\"><\/span><span style=\"font-weight: 400;\">Align source and target segments<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Timeliness goes hand in hand with accuracy, which is where the alignment of source and target segments comes into play. It&#8217;s imperative to validate that the source and target of every segment pair intended for training convey the same meaning. This alignment safeguards against discrepancies or inconsistencies that might negatively impact the performance of the MT engine.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"check-segment-length\"><\/span><span style=\"font-weight: 400;\">Check segment length<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Segment length is also crucial in data refinement. Pairs of segments that are excessively lengthy or unusually short can hinder the quality of MT. 
It can also be necessary to check segment length for purely technical reasons, as some customizable MT engines impose segment length restrictions.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To address this, you can apply techniques like implementing a minimum character count, establishing guidelines for sentence pair length, and maintaining a balanced length ratio.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"remove-non-translatables\"><\/span><span style=\"font-weight: 400;\">Remove non-translatables<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Next up is removing non-translatable elements. Some words or phrases lack direct translations between languages, while others, such as names and addresses, do not require translation at all. It&#8217;s advisable to eliminate them from the data to prevent confusion and inaccuracies in the translation process.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"remove-duplicates\"><\/span><span style=\"font-weight: 400;\">Remove duplicates<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Preventing data redundancy is just as important. Eliminating repeated or nearly identical segment pairs helps maintain data integrity, so that over-represented segments do not unduly influence MT output.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"language-checks\"><\/span><span style=\"font-weight: 400;\">Language checks<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Language checks matter as well. Translation memories used for customization sometimes contain segment pairs in the wrong language pair. 
Making sure that all segments are in the intended source and target languages is vital for consistent and accurate customization.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"inline-tags\"><\/span><span style=\"font-weight: 400;\">Inline tags<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Inline tags in translation memories also call for attention. These tags, which often denote variables or special formatting, might not be supported consistently across different MT engines. That\u2019s why, in certain instances, it&#8217;s worth excluding them from the training data to prevent inconsistencies in translation output.<\/span><\/p>\n\t\t<\/div>\n\t<\/div>\n<\/div>\n\n\n\n<div id=\"acf\/blog-cta-block_817faf59d94cf2d89177a46c0289c732\" class=\"pxblock pxblock--blog-cta bg--grey image--orientation-square\">\n\t<div class=\"block-container\">\n\t\t\t\t\t<div class=\"image image--align-middle\">\n\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1204\" height=\"1204\" src=\"https:\/\/phrase.com\/wp-content\/uploads\/2023\/03\/machine-transtion-report-key-visual.png\" class=\"attachment-original size-original\" alt=\"Machine translation report key visual | Phrase\" srcset=\"https:\/\/phrase.com\/wp-content\/uploads\/2023\/03\/machine-transtion-report-key-visual.png 1204w, https:\/\/phrase.com\/wp-content\/uploads\/2023\/03\/machine-transtion-report-key-visual-300x300.png 300w, https:\/\/phrase.com\/wp-content\/uploads\/2023\/03\/machine-transtion-report-key-visual-1024x1024.png 1024w, https:\/\/phrase.com\/wp-content\/uploads\/2023\/03\/machine-transtion-report-key-visual-150x150.png 150w, https:\/\/phrase.com\/wp-content\/uploads\/2023\/03\/machine-transtion-report-key-visual-768x768.png 768w\" sizes=\"(max-width: 1204px) 100vw, 1204px\" \/>\t\t\t<\/div>\n\t\t\t\t<div class=\"content\">\n\t\t\t<p class=\"h4\">Interactive MT report: Uncover top performers<\/p>\n<p class=\"small\">Find 
out how leading machine translation engines perform for different content types using the latest data in our quarterly machine translation report.<\/p>\n<p><a class=\"btn btn--outline\" href=\"https:\/\/phrase.com\/resources\/machine-translation-report\/\">Get MT insights<\/a><\/p>\n\t\t<\/div>\n\t<\/div>\n<\/div>\n\n\n\n<div id=\"acf\/text-block_311bf3aa3647b27578031d860a33f150\" class=\"pxblock pxblock--text spacing--default bg--white\">\n\n\t\n\t<div class=\"container\">\n\t\t<div class=\"wysiwyg animate-in\">\n\t\t\t<h2><span class=\"ez-toc-section\" id=\"a-brief-look-at-the-training-of-custom-mt-models\"><\/span><span style=\"font-weight: 400;\">A brief look at the training of custom MT models<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Training custom MT models is an intricate and fast-moving field. Below is an overview of the most popular MT engines with customization support:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Amazon Active Custom Translation<\/b><span style=\"font-weight: 400;\"> offers an agile platform driven by user input, showcasing human-machine collaboration.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Globalese Custom NMT<\/b><span style=\"font-weight: 400;\"> blends neural networks with advanced post-editing, ensuring meticulous adaptation.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Google AutoML Translation<\/b><span style=\"font-weight: 400;\"> refines models through iterative learning.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>IBM Custom NMT<\/b><span style=\"font-weight: 400;\"> stands out for AI-powered precision.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Microsoft Custom Translator<\/b><span style=\"font-weight: 400;\"> uses adaptive learning to capture the intricacies of context.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>RWS Language Weaver<\/b><span style=\"font-weight: 400;\"> focuses on 
domain specificity, ensuring robust comprehension.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>SDL PNMT<\/b><span style=\"font-weight: 400;\"> and <\/span><b>Systran PNMT<\/b><span style=\"font-weight: 400;\"> present cutting-edge neural models for intricate language pairs.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Tilde<\/b><span style=\"font-weight: 400;\"> stands as a seasoned player integrating linguistic expertise.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Yandex Translate Custom<\/b><span style=\"font-weight: 400;\"> fosters fine-tuned translations.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Phrase NextMT<\/b><span style=\"font-weight: 400;\"> is the first <\/span><a href=\"https:\/\/phrase.com\/blog\/posts\/neural-machine-translation\/\"><span style=\"font-weight: 400;\">neural machine translation<\/span><\/a><span style=\"font-weight: 400;\"> engine developed with a translation management system in mind, providing Phrase customers with a greater degree of customization, automation, and integration, as well as superior reporting. Now, thanks to the Phrase Custom AI platform, it supports full customization.<\/span><\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"what-it-takes-to-train-a-custom-mt-model\"><\/span><span style=\"font-weight: 400;\">What it takes to train a custom MT model<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Training a custom machine translation model usually involves multiple steps, roles, and timeframes. 
With Microsoft Custom Translator, Google AutoML Translation, and Amazon Active Custom Translation, people with technical expertise play crucial roles and invest approximately:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">10+ minutes for account creation<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">30+ minutes for the initial setup<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">30+ hours for parallel data preparation<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">30+ minutes for billing<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">6+ hours of training<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Alternatively, with <a href=\"https:\/\/phrase.com\/platform\/custom-ai\/\">Phrase Custom AI<\/a>, the custom model training process becomes more streamlined and user-friendly. It&#8217;s now possible to significantly reduce the time, expertise, and resources required to train your own custom machine translation engine.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Thanks to <\/span><span style=\"font-weight: 400;\">Phrase Custom AI<\/span><span style=\"font-weight: 400;\">, a process that previously took weeks can now be completed in a matter of hours. 
Phrase Custom AI uses AI-powered data filtering, automated evaluation, and an intuitive interface to make engine customization available for everyone.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"charting-mt-model-evaluation-and-fine-tuning\"><\/span><span style=\"font-weight: 400;\">Charting MT model evaluation and fine-tuning<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">The journey in machine translation doesn&#8217;t stop at training a model\u2014it&#8217;s just the beginning. The success of machine translation models depends on a careful process of evaluation and fine-tuning. You can assess the quality of machine translation models by using automated metrics, post-editing metrics, as well as through human evaluation.<\/span><\/p>\n<table style=\"width: 100%;\" border=\"1\" cellpadding=\"6\">\n<thead>\n<tr>\n<td style=\"width: 100%;\" colspan=\"2\">Machine translation evaluation methods<\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td style=\"width: 35%; background-color: #f4f4f4;\"><strong>Automated metrics<\/strong><\/td>\n<td style=\"width: 55%;\">BLEU, COMET, TER, chrf3, and METEOR provide quantifiable insights into translation fidelity.<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 35%; background-color: #f4f4f4;\"><strong>Human evaluation<\/strong><\/td>\n<td style=\"width: 55%;\">Involves standardized questionnaires to capture nuances only comprehensible to humans.<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 35%; background-color: #f4f4f4;\"><strong>Post-editing metrics<\/strong><\/td>\n<td style=\"width: 55%;\">TER, editing time, edit distance, thinking time, and more offer concrete measures of translation accuracy and efficiency.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2><span class=\"ez-toc-section\" id=\"your-destination-is-right-ahead-mt-customization\"><\/span><span style=\"font-weight: 400;\">Your destination is right ahead: MT customization<\/span><span 
class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">The journey of MT customization doesn&#8217;t stop after a single evaluation: it continues with ongoing, regular assessment. This continuous process lets MT engines adapt to the constantly changing linguistic landscape.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Similar to how explorers update maps before embarking on new journeys, custom MT engines undergo periodic retraining using updated data. This process hones their abilities and boosts their performance, resulting in translations that are customized to the company\u2019s context and language and deliver a significant return on investment.<\/span><\/p>\n\t\t<\/div>\n\t<\/div>\n<\/div>\n\n\n\n<div id=\"acf\/blog-cta-block_640cf203ca17b\" class=\"pxblock pxblock--blog-cta bg--green image--orientation-square\">\n\t<div class=\"block-container\">\n\t\t\t\t<div class=\"content\">\n\t\t\t<p class=\"h4\" style=\"text-align: center;\">Unlock the power of machine translation<\/p>\n<p class=\"small\" style=\"text-align: center;\">Discover advanced machine translation management features within our enterprise-ready TMS <span style=\"font-weight: 400;\">and create new business opportunities worldwide more quickly and efficiently.<\/span><\/p>\n<p style=\"text-align: center;\"><a class=\"btn btn--outline\" href=\"https:\/\/phrase.com\/solutions\/machine-translation\/\">Explore solutions<\/a><\/p>\n\t\t<\/div>\n\t<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Machine translation customization empowers businesses for global success with fast and higher-quality translations. 
Learn what makes it effective\u2014and how to make it your own.<\/p>\n","protected":false},"author":62,"featured_media":62947,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_stopmodifiedupdate":false,"_modified_date":"","_searchwp_excluded":"","footnotes":""},"categories":[41],"class_list":["post-62944","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-translation"],"acf":[],"_links":{"self":[{"href":"https:\/\/phrase.com\/wp-json\/wp\/v2\/posts\/62944"}],"collection":[{"href":"https:\/\/phrase.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/phrase.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/phrase.com\/wp-json\/wp\/v2\/users\/62"}],"replies":[{"embeddable":true,"href":"https:\/\/phrase.com\/wp-json\/wp\/v2\/comments?post=62944"}],"version-history":[{"count":51,"href":"https:\/\/phrase.com\/wp-json\/wp\/v2\/posts\/62944\/revisions"}],"predecessor-version":[{"id":69275,"href":"https:\/\/phrase.com\/wp-json\/wp\/v2\/posts\/62944\/revisions\/69275"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/phrase.com\/wp-json\/wp\/v2\/media\/62947"}],"wp:attachment":[{"href":"https:\/\/phrase.com\/wp-json\/wp\/v2\/media?parent=62944"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/phrase.com\/wp-json\/wp\/v2\/categories?post=62944"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}