CPD for translators: reading foreign language newspapers

I've written a few blog posts about CPD, or Continuing Professional Development (here and here, if you're interested). As a Member of the Chartered Institute of Linguists, I am expected to keep a regular record of any CPD activities I carry out. As I have mentioned in previous posts, CPD can be any number of things. One activity that constitutes CPD is reading foreign language newspapers, and this is something I am able to do regularly thanks to the good old World Wide Web.

As all freelance translators out there will know, it's pretty much impossible to have any kind of routine to your day. Translation work ebbs and flows, so it's impossible to plan. Don't get me wrong, that's one of the things I love the most about being a freelance translator: there is no routine to the day. Every day is different. Things definitely do not get boring or monotonous. It keeps me on my toes! Sometimes, however, it's nice to carry out some kind of routine activity, even if in the form of something seemingly small. That's why most mornings you will find me in my office, in front of the computer with a big (OK, huge!) cup of coffee and my breakfast, reading online foreign language newspapers.

It's a way for me to build some kind of routine into my day, and best of all it gets me into the swing of my source languages, ready for a long day of translation. Most importantly, however, it's a form of CPD. And by making it part of my morning routine, I know I'm regularly adding to my list of CPD activities. There are, of course, all kinds of sources of online reading, but here are my favourite online newspapers for each of my source languages:




Which source language online newspapers do you read? Any favourites? Feel free to add links to your favourites in the comments section below. I'd love to hear from you.

Yours truly,
Lingo Woman

Lingo Woman's useful resources for freelance translators #3: TERMIUM Plus®

What is TERMIUM Plus®?

TERMIUM Plus® is the Government of Canada's terminology and linguistic data bank. It is one of the largest linguistic data banks in the world and is a product of over 30 years of research and development. It is regularly updated by 40 terminologists, who keep it current with the latest information.

So what makes it particularly useful for translators?

  • It gives the precise equivalents of terms from a wide range of fields in four languages: English, French, Spanish and Portuguese.
  • It contains almost 4 million terms, names, official titles, names of national and international organisations, statutes and programmes, as well as abbreviations, acronyms and geographical names, together with definitions, contexts and examples of usage. As you can imagine, it's pretty extensive!
  • It contains highly specialised terms not found in standard bilingual data banks.
  • It can also be used in single-language format to find the meaning of a specialised term.
  • Terms are organised by subject, making it easy to find context-appropriate equivalents.
  • You can search for single terms or entire phrases.
  • You can search in all four available languages: English, French, Spanish or Portuguese.

It's an excellent resource. Go check it out!

Yours truly,
Lingo Woman

OCR. What's that then?

What is OCR?

OCR stands for Optical Character Recognition. OCR tools extract text from different types of documents, such as PDF files, images or scanned paper documents, and convert it into editable content.

Why would a translator need to use OCR?

Imagine you receive a PDF file sent by email for translation. Although it looks like a normal PDF, upon closer inspection you realise it's nothing more than an image saved as a PDF. Not with me? Let me explain. There are two main types of PDF file: native PDFs and scanned (image) PDFs. Native PDFs are created using a computer application. Text from such PDFs can be copied and pasted into different file formats. In short, they're a doddle to work with as far as translators are concerned. Scanned PDFs, not so much.

So how can you tell if you're dealing with a scanned (image) PDF?

The telltale characteristic is that you cannot select any of the text. This means the text cannot be interpreted by a text editing program, which spells big problems for translators! This, my friends, is where OCR tools come into play. OCR software will extract and interpret the text from an image PDF and convert it into words and sentences, enabling you to access and edit (and therefore translate!) the content of the original document.
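If you're comfortable with a little scripting, the "can I select any text?" check can be roughly automated. Here's a minimal sketch in Python, assuming you have already extracted the text of each page with a PDF library of your choice. The function name, the 20-character threshold and the "fewer than half the pages" rule are purely illustrative assumptions, not part of any particular tool.

```python
def looks_scanned(page_texts, min_chars=20):
    """Guess whether a PDF is a scanned (image) PDF.

    page_texts: one string per page, as returned by whatever PDF
    text-extraction library you use. Assumption: pages without a text
    layer come back as empty or near-empty strings.
    min_chars: illustrative threshold for "this page has real text".
    """
    if not page_texts:
        # Nothing could be extracted at all, so treat it as scanned.
        return True
    # Count the pages that carry a meaningful amount of selectable text.
    pages_with_text = sum(
        1 for text in page_texts if len(text.strip()) >= min_chars
    )
    # Heuristic: call it scanned if fewer than half the pages have text.
    return pages_with_text < len(page_texts) / 2
```

It's only a heuristic: a file that mixes native and scanned pages can be judged either way, so it's still worth opening the file and trying to select some text yourself.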

OK, so I need OCR software. What do you recommend?

There are many tools available. It's really a matter of preference, so your best bet is to do some research based on your particular needs. I use a combination of Able2Extract Professional and ABBYY FineReader Online and they work really well for me. Check out their websites for more information, FAQs and explanations of the process. Why do I use both? There's no real reasoning behind it except that I purchased Able2Extract some time ago, before I discovered ABBYY FineReader. Generally, Able2Extract serves my needs well, but very occasionally it falls short. When this is the case, I resort to ABBYY FineReader Online. The thing I like about ABBYY FineReader Online is that it works on a 'pay-as-you-go' basis. All you need to do is register and buy credits. You pay for what you need without having to shell out on expensive software. I also like that it supports 42 recognition languages, which is ideal for translators. Plus, all material uploaded is kept strictly confidential.

What kind of accuracy can I expect? Can I feed the converted file straight into my CAT tool of choice?

From my experience, both the tools I use are very accurate at recognising text, particularly ABBYY FineReader when it comes to foreign language texts. Errors are, however, inevitable. My advice when working with OCR texts is to have a copy of the original document in front of you for reference, either as a printout or on a second screen.

One issue to bear in mind is formatting. While converting image PDFs, many OCR tools preserve the layout and formatting of the original, recreating native Microsoft Word formatting such as headings, tables of contents, headers and footers, footnotes, page numbering and font styles. This may save you a great deal of time when it comes to making sure the formatting of your target file matches the source file, but beware! Although the resulting file may look good, when you begin translation work, any number of issues may arise. Text boxes, hyphenation, column and section breaks and hidden tags can create all kinds of problems and prevent matching with TMs or glossaries. For these reasons, it is essential that you post-process files in order to avoid problems. I'll cover that in another post!
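To give a flavour of what post-processing can involve, here is a small Python sketch that tidies plain text exported from an OCR tool. It is an illustration under my own assumptions rather than a recipe from either tool mentioned above: the function name `tidy_ocr_text` is made up, and it assumes paragraphs are separated by blank lines. It removes invisible soft hyphens, rejoins words split by a hyphen at the end of a line, and turns hard line breaks inside a paragraph into spaces so that sentences reach your CAT tool in one piece.

```python
import re

SOFT_HYPHEN = "\u00ad"  # invisible "optional hyphen" character


def tidy_ocr_text(text):
    """Clean up plain text exported from an OCR tool (illustrative sketch)."""
    # 1. Strip soft hyphens, which are invisible but break TM matching.
    text = text.replace(SOFT_HYPHEN, "")
    # 2. Rejoin words hyphenated across a line break:
    #    "trans-\nlation" becomes "translation".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # 3. Split on blank lines so real paragraph breaks survive.
    paragraphs = re.split(r"\n\s*\n", text)
    # 4. Within each paragraph, collapse line breaks and runs of
    #    whitespace into single spaces.
    cleaned = [re.sub(r"\s+", " ", p).strip() for p in paragraphs]
    # 5. Drop empty paragraphs and put the rest back together.
    return "\n\n".join(p for p in cleaned if p)
```

One caveat: step 2 also joins genuine hyphenated compounds that happen to break at the end of a line (a "well-" / "known" split would come out as "wellknown"), so always check the result against the original document.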

So, fellow translators, what OCR tools do you use? Do you have a favourite? Does anyone else prefer the ABBYY Online 'pay-as-you-go' system like me, or do you prefer just to buy the software? I'd love to hear from you!

Yours truly,
Lingo Woman