DGT translation memories
Thread poster: Dominique Pivard
trhanslator (X)
CafeTran's approach is completely different Feb 9, 2013

Michael Beijer wrote:

And how about the amount of TMs that CafeTran can access in a project simultaneously? In memoQ I have around 8,000,000 segments across all of my connected TMs and experience no slowdowns. How does this work in CT?



It's completely different, and even as an experienced CafeTran user it took me a long time to understand the procedure.

CafeTran loads TMX files into a database (e.g. the built-in H2 DB) very quickly: 2 million TUs from the DGT DE-NL memory took about three minutes on a MacBook Pro with 8 GB of RAM.

After this step you have to dump the DB to RAM (another couple of minutes) and then start pretranslating, a process that runs in the background. All exact matches are inserted.

All other matches can be found with the concordance function, which is also very fast (instantaneous). I hope I've described everything correctly here. If not, please correct me.
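To make the exact-match step concrete, here is a rough Python sketch of the general idea, not CafeTran's actual code: stream the TMX file, keep one target per source segment in a dictionary held in RAM, and fill in only the segments whose source text matches exactly. Everything else is left for concordance or fuzzy lookup. The function names (`load_tmx`, `pretranslate`), the language-code handling and the two sample units are illustrative assumptions.

```python
import io
import xml.etree.ElementTree as ET

# Attribute name ElementTree uses for xml:lang (TMX 1.4); older TMX used a plain "lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def primary(lang):
    """Reduce a language code such as 'DE-DE' to its primary subtag 'de'."""
    return lang.split("-")[0].lower()


def load_tmx(source, src_lang, tgt_lang):
    """Stream a TMX file and map each source segment to its first target segment."""
    memory = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "tu":
            continue
        segs = {}
        for tuv in elem.findall("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            if seg is not None:
                segs[primary(lang)] = "".join(seg.itertext()).strip()
        src, tgt = segs.get(primary(src_lang)), segs.get(primary(tgt_lang))
        if src and tgt:
            memory.setdefault(src, tgt)  # keep the first translation seen
        elem.clear()  # drop the parsed unit's children so memory use stays low
    return memory


def pretranslate(segments, memory):
    """Insert exact matches only; None marks segments left for concordance/fuzzy work."""
    return [memory.get(seg.strip()) for seg in segments]


# Hypothetical two-unit sample standing in for the real DGT file.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="DE-DE" datatype="plaintext" segtype="sentence"
          adminlang="EN-GB" o-tmf="none" creationtool="sketch" creationtoolversion="0"/>
  <body>
    <tu>
      <tuv xml:lang="DE-DE"><seg>Artikel 1</seg></tuv>
      <tuv xml:lang="NL-NL"><seg>Artikel 1</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="DE-DE"><seg>Diese Verordnung ist in allen ihren Teilen verbindlich.</seg></tuv>
      <tuv xml:lang="NL-NL"><seg>Deze verordening is verbindend in al haar onderdelen.</seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    memory = load_tmx(io.BytesIO(SAMPLE), "de", "nl")
    print(pretranslate(["Artikel 1", "Artikel 2"], memory))  # ['Artikel 1', None]
```

The dictionary lookup is why exact matches are cheap once everything sits in RAM; finding near matches needs a separate search, which is where the concordance comes in.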


 
Huw Watkins
United Kingdom
Local time: 15:37
Member (2005)
Italian to English
Worked Fine with SDL Trados Studio 2011 Aug 20, 2013

My laptop is not the fastest, and extracting the ES>EN TMX took about 4 hours. The FR>EN took even longer, but I think that is because the laptop was starting to get all hot and bothered under the collar by the second extraction... Curiously, the FR>EN combination had fewer segments, about 100,000 fewer, which surprised me given that France was one of the founder members of the EU and that French is also one of the official languages of Luxembourg, Belgium and so on. There were about 1,900,000 and 1,800,000 units respectively.

Importing the TMX into a TM was a similar story: although I've only done the FR>EN so far, it really took several hours.

Another thing I did was use a more powerful laptop to generate an Autosuggest dictionary from the entire TM, with no limit on the translation unit count. In the case of FR>EN, that means all 1,200,000 TM units were included in the Autosuggest creation process.

I have only done this for FR>EN so far; however, if anyone would like a copy of this Autosuggest dictionary, please contact me and we'll find a way to transfer it, as it is only 200 MB. The TM is a different story at 1.5 GB. Perhaps I could even share it from my Microsoft SkyDrive cloud storage?

I shall also be doing this for the ES>EN, PT>EN and IT>EN language pairs when I have time.

Perhaps we could all share our DGT translation memory Autosuggest dictionaries/TMs with one another to save a bit of work?

[Edited at 2013-08-20 12:33 GMT]


 
Meta Arkadia
Local time: 22:37
English to Indonesian
Inefficient Aug 21, 2013

xxxtrhanslator wrote:
I hope I've described everything correctly here. If not, please correct me.

That procedure is not wrong, but it is highly inefficient, and you'll miss out on concordance search during pretranslation.

The correct procedure:

1. Open the DGT file (in my case more than 2 million segments) as a regular TMX file with the settings Read Only, Pretranslate and Fuzzy. This takes a bit more than a minute, depending on your hardware (especially the RAM settings for the JRE).
2. Start the pretranslation. This can take anything from a few seconds to half an hour, possibly longer, depending on the size of the source file. You can start translating immediately; there is no need to wait for the pretranslation to finish.
3. During pretranslation, the DGT memory cannot be searched. That's where your external DB comes in: if needed, load the very same DGT file that is being used for the pretranslation into the external memory so that you can search it. This uses very little RAM. Concordance search will be reasonably fast (a few seconds at most*), but slower than searching the DGT memory loaded in RAM, which becomes available as soon as the pretranslation is finished.

*Unless you're as stupid as I once was and search for a single letter. I hit Return before I noticed it, and CT hung. I later repeated this on purpose, only to confirm the obvious: you can't search for short words that occur very often ("the"), or even for long words if they occur often enough ("Verordnung").
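For anyone wondering why frequent words hurt so much, here is a generic Python sketch of a word-based concordance index. It is an assumption about how such a search might work, not how CafeTran implements it. Each word maps to the set of segments containing it, so a query for "the" or "Verordnung" has to collect and sort a huge number of hits. The `min_len` and `max_hits` guards are hypothetical safeguards, not CafeTran settings.

```python
import re
from collections import defaultdict

WORD = re.compile(r"\w+")


def build_index(sources):
    """Map each lower-cased word to the ids of the source segments containing it."""
    index = defaultdict(set)
    for i, text in enumerate(sources):
        for word in WORD.findall(text.lower()):
            index[word].add(i)
    return index


def concordance(query, index, sources, min_len=2, max_hits=50):
    """Return at most max_hits segments that contain every word of the query."""
    words = WORD.findall(query.lower())
    if not words or any(len(w) < min_len for w in words):
        # Hypothetical guard: refuse one-letter queries instead of matching almost everything.
        raise ValueError("query too short for concordance search")
    # Intersect posting lists, smallest first, so a rare word prunes the frequent ones.
    postings = sorted((index.get(w, set()) for w in words), key=len)
    hits = set(postings[0])
    for posting in postings[1:]:
        hits &= posting
    # Cap the returned list: a word like "the" may match most of a 2-million-segment memory.
    return [sources[i] for i in sorted(hits)[:max_hits]]


if __name__ == "__main__":
    sources = ["Diese Verordnung gilt ab morgen.", "Die Verordnung wird geändert.", "Artikel 1"]
    index = build_index(sources)
    print(concordance("Verordnung", index, sources, max_hits=1))
```

Note that the cap only limits what is returned; the full set of hits for a very common word still has to be built first, which is consistent with the slowdown described above.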

Cheers,

Hans

[Edited at 2013-08-21 02:18 GMT]


 