Fixing a tmx to create a Muse - help needed
Thread poster: Oliver Pekelharing
Oliver Pekelharing
Oliver Pekelharing  Identity Verified
Netherlands
Local time: 23:34
Dutch to English
Mar 18, 2013

I have a large TMX of some 65000 entries, generated by Trados. I can import it into Trados (old and new) without errors and also generate an autosuggest dictionary from it. I can also import it into MemoQ without errors but I cannot use it to create a Muse. Olifant also fails to import it fully (xml errors). Anyone recommend a program to repair this tmx (so I can use it to create a Muse)? I'm not savvy enough to repair it manually, even if I could open it.

Thanks,

Olly


 
Joakim Braun
Joakim Braun  Identity Verified
Sweden
Local time: 23:34
German to Swedish
+ ...
Re-export Mar 18, 2013

Try re-exporting it from Trados. That might generate a correct TMX file.

You can open and edit TMX files with any plaintext editor, by the way.
Depending on what the XML errors say there might be a simple fix.

[Edited at 2013-03-18 09:34 GMT]


 
Oliver Pekelharing
Oliver Pekelharing  Identity Verified
Netherlands
Local time: 23:34
Dutch to English
TOPIC STARTER
Re: reimport Mar 18, 2013

I have already tried importing and exporting it several times through all the tools at my disposal (Trados, Wordfast, MemoQ, Olifant). The file is too large to be edited with a standard plaintext editor, and anyway I wouldn't know what to edit.

Regards,

Olly


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 23:34
Member (2006)
English to Afrikaans
+ ...
Try my TMX fixer script Mar 18, 2013

Olly Pekelharing wrote:
Anyone recommend a program to repair this tmx (so I can use it to create a Muse)? I'm not savvy enough to repair it manually, even if I could open it.


Try my TMX fixer script:
http://leuce.com/autoit/tmxfixerbasic.zip

You must have AutoIt installed to use it. A TMX file produced by Trados is likely in UTF16LE format. Let me know if it works for you (or not).
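
If you are not sure which encoding a TMX file is in, the first few bytes usually tell you. The following is only a rough Python sketch, not part of the AutoIt script; the function name `sniff_encoding` is made up for illustration, and it only looks at the byte order mark (BOM), so a file without a BOM comes back as "unknown".

```python
import codecs

def sniff_encoding(head: bytes) -> str:
    """Guess the encoding of a file from the byte order mark in its first bytes."""
    # The UTF-32 marks must be tested first: the UTF-32LE mark begins with
    # the same two bytes as the UTF-16LE mark.
    if head.startswith(codecs.BOM_UTF32_LE):
        return "utf-32-le"
    if head.startswith(codecs.BOM_UTF32_BE):
        return "utf-32-be"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8 (with BOM)"
    if head.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return "unknown"

# Usage (the file name is a placeholder):
# with open("memory.tmx", "rb") as f:
#     print(sniff_encoding(f.read(4)))
```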


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 23:34
English to Hungarian
+ ...
Options Mar 18, 2013

65000 entries is not that big. Notepad++ would almost certainly be able to handle it.
Ideally, the error message would tell you what to edit. If all it says is "XML error", then you're out of luck. If it says something like "XML error: tu tag not closed at line XXX" or "XML error: character YYY is not UTF-8 at line ZZZ", you know where to look.
If importing to and exporting from Studio doesn't help, you could try the same with apsic xbench. An xbench import-export roundtrip strips everything from the TMX except for the text itself, so it has a good chance of fixing the problem.
In a similar vein, you could try the TMX_to_tabbed utility in my "grab bag" software package at sourceforge.net/projects/aligner, then use the TMX maker in the aligner package to generate a new tmx. If you upload the TMX to dropbox or rapidshare and post a link here, I'll run this conversion for you and you can see if it fixes the problem.
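
If none of the tools gives a usable error message, a few lines of Python can produce one. This is just a sketch using the standard library's XML parser, not something from the aligner package; the function name `first_xml_error` is invented here. It reads through the file and returns the position of the first place where the XML is not well-formed.

```python
import io
import xml.etree.ElementTree as ET

def first_xml_error(source):
    """Return (line, column, message) for the first XML error, or None if the input parses.

    `source` may be a file name or a binary file object.
    """
    try:
        # iterparse reads the input piece by piece instead of all at once
        for _event, elem in ET.iterparse(source):
            elem.clear()  # drop each element's text and children once it has been read
    except ET.ParseError as err:
        line, column = err.position  # position as reported by the parser
        return line, column, str(err)
    return None

# Small demonstration with a stray ampersand in a segment
sample = io.BytesIO(b"<tmx><body><tu><seg>a & b</seg></tu></body></tmx>")
print(first_xml_error(sample))

# For a real file (the name is a placeholder):
# print(first_xml_error("memory.tmx"))
```

It only reports the first error, so after fixing that one you would run it again until it returns None.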

What's a 'muse', by the way?


 
Oliver Pekelharing
Oliver Pekelharing  Identity Verified
Netherlands
Local time: 23:34
Dutch to English
TOPIC STARTER
Will let you know Mar 18, 2013

Thanks Samuel. Will give it a go and let you know.

Olly


 
Oliver Pekelharing
Oliver Pekelharing  Identity Verified
Netherlands
Local time: 23:34
Dutch to English
TOPIC STARTER
tmxfixer Mar 18, 2013

Hi Samuel,

I ran the silent version (utf16le), which reported no rogues found. The output file is identical in size to the original. After a few minutes I get an Autoit warning: "error allocating memory".

Olly

[Edited at 2013-03-18 11:22 GMT]


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 22:34
Member (2009)
Dutch to English
+ ...
Hi Olly, Mar 18, 2013

My experience is that you can very often fix faulty TMXs by running them through Xbench.

Project > Properties > Add > TMX memory

then:

Tools > Export items

Michael


 
Oliver Pekelharing
Oliver Pekelharing  Identity Verified
Netherlands
Local time: 23:34
Dutch to English
TOPIC STARTER
@Farkas Mar 18, 2013

Sorry, I meant 650000 entries. A Muse is an autosuggest function in MemoQ.

Olly


 
Piotr Bienkowski
Piotr Bienkowski  Identity Verified
Poland
Local time: 23:34
English to Polish
+ ...
UltraEdit Mar 18, 2013

It will be a somewhat large chunk for this piece of software, but it will parse the XML and take you to the place where the error is. You may have to parse it several times until the file is free of errors.

Regards,

Piotr


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 23:34
Member (2006)
English to Afrikaans
+ ...
Some more notes Mar 18, 2013

Olly Pekelharing wrote:
I ran the silent version (utf16le), which reported no rogues found. The output file is identical in size to the original. After a few minutes I get an Autoit warning: "error allocating memory".


Okay, I've discovered that the memory error occurred because the script was very wasteful with memory: it loaded the TMX file into memory about eight times when only about twice was really necessary. I did not realise this because my own "large" TMX files were small enough.

I'm putting the finishing touches on a new version of the script that will remove TUs that contain invalid characters, as soon as I figure out why the regex won't work, heh-heh.
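
For anyone who would rather not wait for the AutoIt version, the same idea can be sketched in Python. To be clear, this is not the tmxfixer script and the names (`drop_bad_tus` and so on) are made up for illustration. It assumes the whole file fits in memory once as text, and it only checks for control characters that XML 1.0 forbids, not for stray ampersands or angle brackets.

```python
import re

# Characters XML 1.0 does not allow: control characters other than
# tab, line feed and carriage return, plus U+FFFE and U+FFFF
INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")

# One translation unit, from <tu ...> to the nearest </tu>
# (\b stops "<tu" from also matching "<tuv")
TU_BLOCK = re.compile(r"<tu\b.*?</tu>", re.DOTALL)

def drop_bad_tus(tmx_text):
    """Return (cleaned_text, removed_count) with every TU holding an invalid character removed."""
    removed = 0

    def keep_or_drop(match):
        nonlocal removed
        if INVALID_XML_CHARS.search(match.group(0)):
            removed += 1
            return ""          # drop the whole TU
        return match.group(0)  # keep the TU unchanged

    cleaned = TU_BLOCK.sub(keep_or_drop, tmx_text)
    return cleaned, removed

# Usage (file names are placeholders; "utf-16" assumes a Trados export with a BOM):
# with open("memory.tmx", encoding="utf-16", newline="") as f:
#     cleaned, removed = drop_bad_tus(f.read())
# with open("memory_fixed.tmx", "w", encoding="utf-16", newline="") as f:
#     f.write(cleaned)
```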

Your TMX file (which you sent to me, thanks) definitely contains invalid XML characters. They are not difficult to locate but it is a cumbersome process if you have to do it one by one. Here's how I find them manually:

1. Try to open the TMX file in Virtaal. Virtaal is very fussy (e.g. if your UTF8 file has a BOM, it will refuse to open it). Virtaal tells you in an error message where the error occurs. The error message given when I try to open your file is this: "Could not open file. Premature end of data in tag seg line 920042, line 920042, column 233." (The important detail is the position: line 920042, column 233.)

2. Open the TMX file in Akelpad (a small Unicode editor with very few extra features). In Akelpad, press Ctrl+G (which means "go to"). Type in "920042:233" (which means character 233 of line 920042) and press OK. Akelpad will take the cursor to that position.

You won't be able to see what's wrong, because Akelpad doesn't have a glyph for the invalid character (it displays it as a comma, I think). If you want to see what character it is, copy a portion of the text from the cursor position to a new file (created in Akelpad, too), and then open that tiny file in a hex editor (I use Brooks Younce's Tiny Hex Editor).

The invalid character at position 920042:233 of your TMX is \x1A. In the hex editor it shows up as "00 1A". XML 1.0 does not allow this character at all, not even written as a character reference, so I simply delete it, along with the whole TU that contains it if the TMX file is going to be used for reference purposes only.

Invalid characters are one reason why a TMX file might fail. Others are a stray greater-than or less-than character somewhere in a segment, a stray ampersand (or any other character that must be written as an entity), or a missing quote character inside a TMX tag.
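
Finding the invalid characters one by one is the cumbersome part, so here is a rough Python sketch that lists all of them in one pass. It is only an illustration (the name `find_invalid_chars` is invented), it reads the file line by line rather than loading it whole, and it only looks for control characters that XML 1.0 forbids. The line and column numbers it reports may not match every editor exactly, since editors differ in how they count line breaks and characters.

```python
import re

# Characters XML 1.0 does not allow: control characters other than
# tab, line feed and carriage return, plus U+FFFE and U+FFFF
INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")

def find_invalid_chars(lines):
    """Yield (line_number, column, code_point) for each invalid character, counting from 1."""
    for line_no, line in enumerate(lines, start=1):
        for match in INVALID_XML_CHARS.finditer(line):
            yield line_no, match.start() + 1, f"U+{ord(match.group()):04X}"

# Usage (the file name is a placeholder; "utf-16" assumes a Trados export with a BOM):
# with open("memory.tmx", encoding="utf-16", newline="") as f:
#     for line_no, column, code_point in find_invalid_chars(f):
#         print(f"{line_no}:{column} {code_point}")
```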


[Edited at 2013-03-18 20:11 GMT]


 
Oliver Pekelharing
Oliver Pekelharing  Identity Verified
Netherlands
Local time: 23:34
Dutch to English
TOPIC STARTER
no fix yet Mar 19, 2013

Thanks all, I've tried all your suggestions but to no avail. When I have time I will try Samuel's last suggestion. @Farkas: I also tried your tools and successfully produced a new tmx but this I can't import into MemoQ at all. I don't feel comfortable about posting a link to this client TM here.

Regards,

Olly


 
Oliver Pekelharing
Oliver Pekelharing  Identity Verified
Netherlands
Local time: 23:34
Dutch to English
TOPIC STARTER
@Farkas Mar 19, 2013

One thing I did notice is that when I was using your tmx maker it suggested that both languages I was using were English (where you enter the language code). I entered the right language codes anyway so I assume this wouldn't have led to the failure of the tmx (in MemoQ) though.

Regards,

Olly


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 23:34
English to Hungarian
+ ...
TMX langcodes Mar 19, 2013

The TMX maker has no way to guess what your languages are so it defaults to English. You are supposed to enter the correct language code yourself. If you entered codes that MQ doesn't support (EN instead of EN-GB or whatever) then that might be the reason why the import failed. It could be something else as well, of course. It's impossible to tell for sure without seeing the file. Best of luck fixing this.

By the way, have you tried an import-export roundtrip in MemoQ? If this Muse functionality is in MemoQ, surely it should accept TMX files that were generated by MemoQ itself???
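
To double-check which language codes actually ended up in a TMX, you could count them with a short Python sketch like the one below. It is only an illustration (the name `count_lang_codes` is made up), it looks at the lang attribute of each tuv element with a regular expression rather than a real XML parser, and it cannot tell you which codes MemoQ accepts.

```python
import re
from collections import Counter

# Matches the xml:lang (or older lang) attribute inside a <tuv ...> start tag
LANG_ATTR = re.compile(
    r"""<tuv\b[^>]*?\b(?:xml:)?lang\s*=\s*["']([^"']*)["']""",
    re.IGNORECASE,
)

def count_lang_codes(tmx_text):
    """Count how often each language code appears on <tuv> elements."""
    return Counter(match.group(1) for match in LANG_ATTR.finditer(tmx_text))

# Usage (the file name and encoding are placeholders):
# with open("new_memory.tmx", encoding="utf-8") as f:
#     for code, count in count_lang_codes(f.read()).most_common():
#         print(code, count)
```

If more than two codes show up, or a bare code such as EN appears next to EN-GB, that would be the first thing to look at.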


 
Oliver Pekelharing
Oliver Pekelharing  Identity Verified
Netherlands
Local time: 23:34
Dutch to English
TOPIC STARTER
@Farkas Mar 19, 2013

Yes, I entered the correct codes. As for the round trip, I tried it with Studio and the output was error-free, but funnily enough when I try it with MemoQ it goes awry (even though MemoQ reports no errors on the import).

 