PDF Pitfalls [Part 1] - How to Translate PDF Documents into Several Languages

“I have some PDF documents that I need to translate into several languages.
Can this be done easily?”


Perhaps you've heard a question like this before and you are still looking for an answer. Before going any further, let's look a little more into the format known as PDF.

These days a huge number of documents are either created in or converted to PDF format. It has become the universal exchange format for electronic documents. The main advantage (and attraction) of PDF is its ability to preserve the look and feel of the original document by describing the low-level structural objects. Most PDF documents, however, are untagged and do not contain the basic high-level logical structure information. One consequence, among others, is that this makes extracting information particularly difficult: a true disadvantage in an age of open and flexible data structures.

Now, back to the question…
The answer is an unqualified MAYBE. Unfortunately, people are often led to believe that the PDF format is a publishing format and should therefore be easy to convert to MS Word. In fact, PDF is more a description of what the printed page looks like than a description of the document’s structure.

So, from a translation process perspective, the challenge consists in developing techniques that allow the content to be changed without losing all the formatting work that has already been done. At this point, we have to say, though, that process optimization is a utopia when it comes to PDF document translation. Preparing PDF files that are suitable for translation will continue to be a major issue, and this will get worse as more PDF creation methods become available.

“So, you're saying that this can't be done?”

No, not exactly. What we are saying is that for typical PDF documents, the file preparation can be partially automated, but you should expect that other parts will need to be done manually. Of course, the supported features depend greatly on the quality, the complexity and the level of markup that is required in the target documents.

The page elements that are usually easy for a human reader to recognize, such as headers and footers, text flows, page grid, margins, line art, raster images, tables, headings, and callouts, can be challenging for an automated tool to recognize. The file preparation should be done carefully, considering all the particular structures that exist in the source PDF documents. If the documents do not share a consistent appearance, it will become an even more difficult and time-consuming task.

To give you an idea of what we are talking about, let's take a look at one simple layout feature that can cause a lot of problems: columns. Many publications are printed in a multiple-column layout, so the PDF contains the multiple columns as well. But since PDF is basically a page layout format, it only contains information about the letters on the page and where they are to be printed. There is nothing in the PDF that specifies that some copy is in column one and other copy is in column two (or even that there are in fact two columns). The conversion tool must therefore analyze the geometry of the page and attempt to recognize a column layout. When the margin is tight and two columns are quite close together, conversion tools can often get confused and miss a multiple-column layout, thereby horribly mangling together the text from two totally unrelated paragraphs. In this case, it will be virtually impossible to extract any usable text from such a PDF, so the linguist will have to retype the entire source document before translation can start.
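To make the column problem a little more concrete, here is a minimal Python sketch of the kind of geometric analysis described above. It is not how any particular conversion tool works. The function name find_columns, the min_gutter threshold, and the sample coordinates are all assumptions made up for illustration. The idea is simply to merge the horizontal extents of the words on a page and to treat any sufficiently wide gap as a gutter between columns.

```python
from typing import List, Tuple

# (left, right) horizontal extent of one word on the page, in points.
# These boxes are hypothetical; a real tool would read them from the PDF.
Box = Tuple[float, float]


def find_columns(boxes: List[Box], min_gutter: float = 12.0) -> List[Box]:
    """Group word extents into columns.

    Extents closer together than min_gutter are merged into one column;
    a gap of at least min_gutter is treated as a gutter between columns.
    """
    columns: List[Box] = []
    for left, right in sorted(boxes):
        if columns and left - columns[-1][1] < min_gutter:
            # Too close to the previous column to be a gutter: merge.
            prev_left, prev_right = columns[-1]
            columns[-1] = (prev_left, max(prev_right, right))
        else:
            # First box, or a gap wide enough to start a new column.
            columns.append((left, right))
    return columns


# A page with a comfortable 30-point gutter: two columns are found.
wide = [(72, 150), (155, 290), (320, 400), (405, 540)]
print(find_columns(wide))   # [(72, 290), (320, 540)]

# The same page with an 8-point gutter: the columns collapse into one.
tight = [(72, 150), (155, 300), (308, 400), (405, 540)]
print(find_columns(tight))  # [(72, 540)]
```

In the second case the gutter is narrower than the threshold, so the sketch sees one wide block of text. Lowering the threshold does not really help, because it then starts to mistake ordinary word spacing for gutters. That trade-off is the reason tight column layouts are so easy to mangle.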

Our next posts will continue covering some aspects and issues related to PDF conversion.

Stay tuned! The game is not over yet!

Koro Language Discovered in India

A language that was previously unknown to science has been discovered in the far northeastern corner of India.

At a time of rapid language extinction, with one language estimated to die every two weeks, the news of the discovery of a new language is providing a glimmer of hope to linguists.

The language, known as Koro, was discovered in the state of Arunachal Pradesh by a team of linguists working with National Geographic’s Enduring Voices project, who had come to research two poorly documented languages, Aka and Miji. Arunachal Pradesh is a language hotspot due to the number of languages spoken in the region. Koro was found to be very distinct from the other languages spoken nearby and is thought to have originated with a group of people who were enslaved and then brought to this area.

Koro was discovered when the research team began to hear a third, unknown language alongside the two they had come to study. This came as a surprise, since Koro was not listed in any scientific literature or in Indian language surveys or censuses. One difference the researchers noted on hearing Koro is that it has a lot of vowels, which distinguishes it from the other languages, which have a lot of consonant clusters at the beginning of words.

Interesting facts about Koro are that it has only 800 to 1,200 speakers, is unwritten, and is a member of the Tibeto-Burman language family. Even though it belongs to that family, Koro is so different that researchers could not identify any other language in the family to which it is closely related.

Unfortunately, like many languages today, Koro is endangered. With so few people speaking it in the villages, very few children are learning it, as other nearby languages have higher prestige and must be learned in school instead. With this lack of language support, it is estimated that Koro may only survive a few more decades before becoming extinct.

Languages are important to us for a number of reasons. The discovery of a new language is extremely important, mainly because it helps us to understand how the human mind works. All that we can learn about the smaller languages around the world enriches what we know about the possibilities of human language.

Excel Translations is the Market Leader in ISO 9001- and ISO 13485-certified Medical Translation Services. We provide the required hardware and software expertise as well as the in-depth knowledge of language, regulatory requirements, cultural nuances and taboos, and linguistic connotations to make the translated software application as functional for the foreign user as it is for the domestic user. Excel Translations' localization engineers routinely work with software resource and online help localization projects on Mac, Windows, NT, and Linux operating systems. Contact us for more information about our medical translation services.

Benefits of long term relationships between a Medical Products Manufacturer and an Expert Medical Translation Service Provider

In many instances, it takes an average of 3-5 months, and a number of medical translation projects within that time, for a medical manufacturer and a medical translation service provider to reach the top of the learning curve. Long-term relationships between a client and a medical translation service provider can improve the cost and time frame of translation projects, and can also improve the intangible elements of a business relationship.

The medical translation agency working with a client over several projects begins to see ways to make improvements. These improvements include the following:
  • The medical translation service provider is often able to guide the client toward optimal services after experience with several projects. For example, a client may not need in-country review services after several projects with fewer and fewer changes requested. On projects which require revisions only, a medical translation service provider may remind a client to include a supplementary document that they may have forgotten.
  • The regular linguistic team for a client will strive to maintain consistency in terminology as well as style. They often recall problems and solutions that have arisen in the past. For example, if there are two ways to translate a term, the linguist may recall that one version was preferred by the client. This is usually confirmed by use of the translation memory databases, but not always.
  • With regular translation work from the same client, linguists feel a sense of loyalty to the client, and will often deliver early or find further cost reduction strategies for specific project budget needs.
  • The medical translation service provider understands the client’s preferred communication method (email, phone) and general availability, which makes sharing information more efficient.
Medical translation services are strategic services which require time and effort from both the client and the medical translation service provider to ensure the work is done accurately and efficiently. A partnership developed through a long-term relationship promotes smooth-running projects and improved cost benefits and delivery over time.

Is an RFI or an RFP Right for my Organization?

Companies are faced with a real challenge when it comes to choosing a translation company. There is no shortage of translation vendors out there, and with the explosion of the Internet there is an ever-growing plethora of language providers who all claim to be “experts” in every imaginable industry. So how do you pick the right translation partner, and how do you separate the wheat from the chaff?

If your company is planning on spending upwards of $50,000 per year on translation, sending out an RFI (Request for Information) and an RFP (Request for Proposals) may be the solution. Preparing an RFI/RFP and reviewing the responses may be time-consuming, but it is an investment that will pay off with time. Again, this exercise is recommended for corporations with annual translation budgets of $50,000 or more, or with ongoing needs.

The RFI will allow you to make a first selection and eliminate a fair number of translation vendors. The RFI should deal with specific questions that will allow your organization to gauge the size, experience, and history of the bidders. Make sure to include a section regarding revenues over the past three years and the current year, as well as number of employees, office location(s), how translators are recruited, and certifications such as ISO certificates (this last point is essential). Then, you should ask for specific and documented experience in your field and industry. Ask for names of clients in the industry as well as references. Asking bidders to list specific examples of projects that were translated, as well as the languages and volume (in pages), is also important. This will allow you to separate a language vendor that claims experience in your field because it translated a three-page brochure two years ago from a translation company that translates hundreds of pages on an ongoing basis. Make sure to also include specific questions about translation tools as well as translation memory technology.

The RFI should allow you to make a first cut and retain 3-4 finalists for the RFP. The RFP should target rates and delivery times. We recommend you use specific examples to compare apples to apples. For instance: a 100-page manual in Adobe InDesign going into French, Italian, German and Spanish, also known as FIGS. Make sure to ask for all-inclusive bids, as some translation companies will have hidden costs. Ask the bidders to break down their proposals into translation, typesetting, project management, and any other ancillary costs.

Organizations spending less than $500,000 per year are advised to select only one translation company, with one back-up should the winner not meet their expectations. Beyond half a million dollars per year, you may need to work with multiple translation vendors.

Remember: Translation is not a commodity. Translation is a service. Do not base your decision on price alone.
