Translating HTML within an XLF file
Thread poster: NicBathgate
Mar 6, 2013

Hi Proz community,

I am looking for a CAT tool that can effectively segment HTML tags within an XLF file so the content can be translated easily without having to deal with the HTML tags. I will illustrate with screenshots below. HTML content is bulk exported from a very large website in this XLF format, so it is not possible to simply save the pages as HTML and translate them that way (otherwise the content cannot be imported back into the website). We need to translate our content using the XLF files.

So far I have tried Deja Vu, Memsource Cloud, MemoQ, Wordfast Anywhere and SDL Trados Studio 2011, and all of them seem to behave the same way, as follows.

If I copy the HTML content out of the XLF file into an HTML file with no XLF trans-unit/source/target/group tags and import this into the CAT tool (Trados, for example), everything works as I expect. There are no HTML tags in sight; paragraphs, table cells, list items and headings are each split into their own segments; and bold, italic and underline are handled by Trados's WYSIWYG feature. Perfect:
http://postimage.org/image/bvkb0ti5l/full/

The trouble begins when I import the original XLF file, which has the HTML content organized in trans-unit tags. Instead of creating segments from the HTML tags, Trados creates segments from the trans-unit tags, and all of the HTML is left in place for the translator to wade through painstakingly:
http://postimage.org/image/ah34svm0l/full/
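
For anyone wondering why the XLF import behaves this way, here is a minimal Python sketch of the "first-level" view an XLIFF filter gets. It assumes XLIFF 1.2 with the HTML stored entity-escaped (`&lt;p&gt;`) inside `<source>`; the sample unit is made up for illustration and is not the uploaded file. The XML parser simply unescapes the markup, so each trans-unit yields one plain string with the HTML still inside it, i.e. one segment per trans-unit.

```python
import xml.etree.ElementTree as ET

# Hypothetical XLIFF 1.2 sample (not the uploaded file): one trans-unit
# whose <source> holds entity-escaped HTML.
XLF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="en" datatype="html" original="page.html">
    <body>
      <trans-unit id="1">
        <source>&lt;h1&gt;Welcome&lt;/h1&gt;&lt;p&gt;Some &lt;b&gt;bold&lt;/b&gt; text.&lt;/p&gt;</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def read_sources(xlf_text):
    """Return one (id, source text) pair per trans-unit.

    Note: .text only covers text before the first child element, so this
    sketch handles the escaped-HTML case, not inline XLIFF tags like <g>.
    """
    root = ET.fromstring(xlf_text)
    return [(tu.get("id"), tu.find("x:source", NS).text)
            for tu in root.iterfind(".//x:trans-unit", NS)]


for unit_id, text in read_sources(XLF):
    print(unit_id, text)
    # 1 <h1>Welcome</h1><p>Some <b>bold</b> text.</p>
```

The whole heading and paragraph arrive as a single string, which is why the HTML shows up as raw text in one segment unless a second parser is applied to it.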

The closest I have come to a solution is the "Format > Run Regex Tagger > Filter configuration: Tags and entities" option in MemoQ, but this still doesn't start a new segment at each HTML tag; it just replaces the HTML tags with tag icons.

I have uploaded a copy of the XLF file used in this example in case somebody would like to test it in their own CAT tool. You can download it HERE.

Can anybody suggest a CAT tool or a way I can use one of the CAT tools mentioned above to achieve my goal?

Thank you,
Nic Bathgate

 
Piotr Bienkowski
Poland
English to Polish
MemoQ has cascading filters Mar 6, 2013

During import of the file you can apply a cascading filter to hide the markup that you don't want to see.

There is also an option to do that after import, but I can't remember its name at the moment.

HTH


Piotr

P.S. I see that you tried memoQ too, but could you explain why you want HTML tags as separate segments?


[Edited at 2013-03-06 09:19 GMT]

P.S.2 I see what you mean now. I'll ask in a different forum. Meanwhile, you can always split the large segments manually.

[Edited at 2013-03-06 09:22 GMT]

 
RWS Community
United Kingdom
English
Just in case you are interested... Mar 6, 2013

... I did speak to a developer I know who is working on an OpenExchange application to "beautify" an XLIFF file like this. We tested the file you provided and it opened like this:


So it's probably what you're looking for. It's not quite ready for release yet, as he's working on a few additional features, but if you would like to help, he'd be very happy to see more sample files.

Just a thought. Drop me an email if you're interested.

Regards

Paul

 
István Lengyel
Hungary
English to Hungarian
how you can do this in memoQ now Mar 8, 2013

Hi Nic,

I checked the file in memoQ, and it is possible to define a cascading filter for your file format. A cascading filter is basically a parser for an embedded file format, e.g. HTML inside Excel, or regex tagging inside PHP. I see that your file is XLIFF; however, nothing in it is translated, the source is only copied to the target. Is this a priority for you? If not, and you can just take the HTML file, you can create a similar XLIFF very easily: click Import with options, select the text filter, click Change filter and configuration, and add a cascading filter for HTML. The content will then display correctly, and you can export it to XLIFF.
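
As a rough illustration of the cascading-filter idea (not memoQ's actual implementation), the Python sketch below takes the HTML string an XLIFF filter has extracted and runs a second parser over it. Block-level tags close the current segment, while inline tags such as `<b>` stay inside it, where a CAT tool would show them as tag icons. The `BLOCK_TAGS` set is an assumption to adjust for real content.

```python
from html.parser import HTMLParser

# Assumed set of tags that end one segment and start the next.
BLOCK_TAGS = {"p", "div", "br", "h1", "h2", "h3", "h4", "h5", "h6",
              "ul", "ol", "li", "table", "tr", "td", "th"}


class BlockSegmenter(HTMLParser):
    """Split an HTML fragment into segments at block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.segments = []
        self._current = []

    def _flush(self):
        # Close the current segment, skipping whitespace-only runs.
        text = "".join(self._current).strip()
        if text:
            self.segments.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()
        else:
            # Keep inline markup (e.g. <b>, <a href=...>) inside the segment.
            self._current.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> or <img/> have no separate end tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()
        else:
            self._current.append(f"</{tag}>")

    def handle_data(self, data):
        self._current.append(data)

    def close(self):
        super().close()
        self._flush()  # emit any trailing text


def segment_html(fragment):
    parser = BlockSegmenter()
    parser.feed(fragment)
    parser.close()
    return parser.segments


print(segment_html("<h1>Welcome</h1><p>Some <b>bold</b> text.</p>"))
# ['Welcome', 'Some <b>bold</b> text.']
```

A real filter would also have to write each translated segment back into the HTML and then into the XLIFF `<target>`, which this sketch leaves out.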

XLIFF is normally a prepared format, and I see this is coming from Cloudwords. I have no experience with their tool, but I believe that the tagging should be done by the original filter.

It is possible to tag up the XLIFF but for that you need to write some regular expression rules. The way to emulate HTML with regex is described in the memoQ help.
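
For comparison, the regex route does not split segments; it only protects the markup, which matches what Nic saw with the Regex Tagger (HTML turned into tag icons). The Python sketch below is a generic illustration of that idea, not memoQ's regex-tagger rule syntax: every HTML tag is swapped for a numbered placeholder (`{1}`, `{2}`, ... is an invented convention here) and the originals are kept so they can be restored after translation. The pattern `<[^>]+>` is a simplification that would mishandle edge cases such as a `>` inside an attribute value.

```python
import re

# Simplified pattern: anything from "<" to the next ">" counts as one tag.
TAG_RE = re.compile(r"<[^>]+>")


def protect_tags(text):
    """Replace each HTML tag with {n}; return the new text and the tag list."""
    tags = []

    def _sub(match):
        tags.append(match.group(0))
        return "{%d}" % len(tags)

    return TAG_RE.sub(_sub, text), tags


def restore_tags(text, tags):
    """Put the original tags back in place of the {n} placeholders."""
    return re.sub(r"\{(\d+)\}", lambda m: tags[int(m.group(1)) - 1], text)


protected, tags = protect_tags("<p>Some <b>bold</b> text.</p>")
print(protected)  # {1}Some {2}bold{3} text.{4}

# Upper-case text stands in for a translated segment.
print(restore_tags("{1}SOME {2}BOLD{3} TEXT.{4}", tags))
# <p>SOME <b>BOLD</b> TEXT.</p>
```

Because each placeholder keeps its number, a translator can move `{2}...{3}` elsewhere in the sentence and restoration still works; a literal `{n}` already present in the source text would collide with this scheme.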

If I can help you more, please don't hesitate to contact us (my address is istvan and then dot and then lengyel at kilgray dot com; I just don't want to receive a hefty amount of spam as a penalty for sharing my email address).

István