Fixing a tmx to create a Muse - help needed Thread poster: Oliver Pekelharing
I have a large TMX of some 65000 entries, generated by Trados. I can import it into Trados (old and new) without errors and also generate an autosuggest dictionary from it. I can also import it into MemoQ without errors but I cannot use it to create a Muse. Olifant also fails to import it fully (xml errors). Anyone recommend a program to repair this tmx (so I can use it to create a Muse)? I'm not savvy enough to repair it manually, even if I could open it. Thanks, Olly | | | Joakim Braun Sweden Local time: 23:34 German to Swedish + ...
Try re-exporting it from Trados. That might generate a correct TMX file. You can open and edit TMX files with any plaintext editor, by the way. Depending on what the XML errors say there might be a simple fix.
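If none of the tools gives a usable error message, a few lines of script can produce one. The following is only a minimal sketch, assuming Python 3 is installed; the function name `first_xml_error` is made up for this illustration. It feeds the file to the expat parser from the Python standard library as raw bytes, so the parser detects the encoding itself, and it reports the first well-formedness error with its line and column. The file is streamed rather than loaded whole, so size should not matter much.

```python
import xml.parsers.expat


def first_xml_error(path):
    """Return (line, column, message) for the first XML error, or None.

    'first_xml_error' is a hypothetical helper name, not part of any tool
    mentioned in this thread. The line is 1-based; the column is 0-based,
    which is how expat counts.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # Open in binary mode so expat works out the encoding on its own.
        with open(path, "rb") as handle:
            parser.ParseFile(handle)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return err.lineno, err.offset, message
    return None
```

Calling `first_xml_error("memory.tmx")` (the file name is a placeholder) returns `None` for a well-formed file. Only the first error is reported, so the check may need repeating after each fix.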
[Edited at 2013-03-18 09:34 GMT] | | | Re: reimport | Mar 18, 2013 |
Have already tried importing and exporting it several times through all the tools at my disposal (Trados, Wordfast, MemoQ, Olifant). The file is too large to be edited with a standard plaintext editor, and anyway I wouldn't know what to edit. Regards, Olly | | | Samuel Murray Netherlands Local time: 23:34 Member (2006) English to Afrikaans + ... Try my TMX fixer script | Mar 18, 2013 |
Olly Pekelharing wrote: Anyone recommend a program to repair this tmx (so I can use it to create a Muse)? I'm not savvy enough to repair it manually, even if I could open it. Try my TMX fixer script: http://leuce.com/autoit/tmxfixerbasic.zip You must have AutoIt installed to use it. A TMX file produced by Trados is likely in UTF16LE format. Let me know if it works for you (or not). | |
65000 entries is not that big. Notepad++ would almost certainly be able to handle it. Ideally, the error message would tell you what to edit. If all it says is "XML error", then you're out of luck. If it says something like "XML error: tu tag not closed at line XXX" or "XML error: character YYY is not UTF-8 at line ZZZ", you know where to look. If importing to and exporting from Studio doesn't help, you could try the same with apsic xbench. An xbench import-export roundtrip strips everything from the TMX except for the text itself, so it has a good chance of fixing the problem. In a similar vein, you could try the TMX_to_tabbed utility in my "grab bag" software package at sourceforge.net/projects/aligner, then use the TMX maker in the aligner package to generate a new tmx. If you upload the TMX to dropbox or rapidshare and post a link here, I'll run this conversion for you and you can see if it fixes the problem. What's a 'muse', by the way? | | | Will let you know | Mar 18, 2013 |
Thanks Samuel. Will give it a go and let you know. Olly | | |
Hi Samuel, I ran the silent version (utf16le), which reported no rogues found. The output file is identical in size to the original. After a few minutes I get an Autoit warning: "error allocating memory". Olly
[Edited at 2013-03-18 11:22 GMT] | | | Michael Beijer United Kingdom Local time: 22:34 Member (2009) Dutch to English + ...
My experience is that you can very often fix faulty TMXs by running them through Xbench. Project > Properties > Add > TMX memory then: Tools > Export items Michael | |
Sorry, I meant 650000 entries. A Muse is an autosuggest function in MemoQ. Olly | | |
It will be a somewhat large chunk for this piece of software, but it will parse the XML and take you to the place where the error is. You may have to parse it several times until the file is free of errors. Regards, Piotr | | | Samuel Murray Netherlands Local time: 23:34 Member (2006) English to Afrikaans + ... Some more notes | Mar 18, 2013 |
Olly Pekelharing wrote: I ran the silent version (utf16le), which reported no rogues found. The output file is identical in size to the original. After a few minutes I get an Autoit warning: "error allocating memory".

Okay, I've discovered that the memory error occurred because the script was very wasteful with memory -- it loaded the TMX file into memory about eight times when only about twice was really necessary. I did not realise this because my own "large" TMX files were small enough. I'm putting the finishing touches on a new version of the script that will remove TUs that contain invalid characters, as soon as I figure out why the regex won't work, heh-heh.

Your TMX file (which you sent to me, thanks) definitely contains invalid XML characters. They are not difficult to locate but it is a cumbersome process if you have to do it one by one. Here's how I find them manually:

1. Try to open the TMX file in Virtaal. Virtaal is very fussy (e.g. if your UTF8 file has a BOM, it will refuse to open it). Virtaal tells you in an error message where the error occurs. The error message given when I try to open your file is this: "Could not open file. Premature end of data in tag seg line 920042, line 920042, column 233." (The important detail is the line and column number.)

2. Open the TMX file in Akelpad (a small Unicode editor with very few extra features). In Akelpad, press Ctrl+G (which means "go to"). Type in "920042:233" (which means character 233 of line 920042) and press OK. Akelpad will take the cursor to that position. You won't be able to see what's wrong, because Akelpad doesn't have a glyph for the invalid character (it displays it as a comma, I think). If you want to see what character it is, copy a portion of the text from the cursor position to a new file (created in Akelpad, too), and then open that tiny file in a hex editor (I use Brooks Younce's Tiny Hex Editor).

The invalid character at position 920042:233 of your TMX is \x1A. In the hex editor it shows up as "00 1A". XML 1.0 does not allow this character at all, not even written as a character reference, so it has to be removed (it is easiest for me to just delete it together with the whole TU that contains it, if the TMX file is going to be used for reference purposes only).

Invalid characters are one reason why a TMX file might fail. Others are a stray greater-than or less-than character somewhere in a segment, a stray ampersand (such characters must be written as entities), or a missing quote character inside a TMX tag.
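The one-by-one search described above can be roughly automated. What follows is a minimal sketch in Python 3, not the AutoIt script mentioned in this thread; the names `find_invalid_chars` and `strip_invalid_chars` are made up for the illustration. It assumes the TMX is UTF-16 with a byte order mark (pass `encoding="utf-8"` or similar otherwise) and that every character outside the XML 1.0 "Char" range counts as invalid. It reads the file line by line, so memory use stays small, and it deletes only the offending characters, not the whole TU.

```python
import re

# Matches any character outside the XML 1.0 "Char" production:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF are allowed.
INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_chars(path, encoding="utf-16"):
    """Yield (line, column, code point) for each invalid character.

    Hypothetical helper. Line and column are both 1-based. The default
    encoding is an assumption about how the TMX was saved.
    """
    with open(path, encoding=encoding) as handle:
        for line_no, line in enumerate(handle, start=1):
            for match in INVALID_XML_CHARS.finditer(line):
                yield line_no, match.start() + 1, "U+%04X" % ord(match.group())


def strip_invalid_chars(src, dst, encoding="utf-16"):
    """Copy src to dst without the invalid characters; return the count removed."""
    removed = 0
    # newline="" keeps the original line endings untouched in the copy.
    with open(src, encoding=encoding, newline="") as fin, \
            open(dst, "w", encoding=encoding, newline="") as fout:
        for line in fin:
            cleaned, count = INVALID_XML_CHARS.subn("", line)
            fout.write(cleaned)
            removed += count
    return removed
```

Looping over `find_invalid_chars("memory.tmx")` (a placeholder file name) lists every position to check in an editor, and `strip_invalid_chars("memory.tmx", "memory_clean.tmx")` writes a cleaned copy while leaving the original alone. This only covers invalid characters; stray ampersands or broken tags would still need fixing separately.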
[Edited at 2013-03-18 20:11 GMT] | | |
Thanks all, I've tried all your suggestions but to no avail. When I have time I will try Samuel's last suggestion. @Farkas: I also tried your tools and successfully produced a new tmx but this I can't import into MemoQ at all. I don't feel comfortable about posting a link to this client TM here. Regards, Olly | |
One thing I did notice is that when I was using your tmx maker it suggested that both languages I was using were English (where you enter the language code). I entered the right language codes anyway so I assume this wouldn't have led to the failure of the tmx (in MemoQ) though. Regards, Olly | | | TMX langcodes | Mar 19, 2013 |
The TMX maker has no way to guess what your languages are so it defaults to English. You are supposed to enter the correct language code yourself. If you entered codes that MQ doesn't support (EN instead of EN-GB or whatever) then that might be the reason why the import failed. It could be something else as well, of course. It's impossible to tell for sure without seeing the file. Best of luck fixing this. By the way, have you tried an import-export roundtrip in MemoQ? If this Muse functionality is in MemoQ, surely it should accept TMX files that were generated by MemoQ itself??? | | |
Yes, I entered the correct codes. As for the round trip, I tried it with studio and the output was error free, but funnily enough when I try it with MemoQ it goes awry (even though MemoQ reports no errors on the import).