How to handle repetitive texts before OmegaT sends them to a machine translation API?
Thread poster: reinacher
reinacher
Switzerland
May 9, 2021

Dear experts,

I am new to OmegaT. I'd highly appreciate your professional advice, really.
We have e-commerce files containing SKUs with similar names but different type numbers.
For example:
PLACA ELETRÔNICA PFD94900-2
PLACA ELETRÔNICA OPFI149400
PLACA ELETRÔNICA KLQI 04900-2
PLACA ELETRÔNICA KKLFI 24/36/48
PLACA ELETRÔNICA SD49492000

The problem is that if we run 10k paragraphs like this through machine translation, only 10% will be unique; the rest is just repetitive text.

Every character sent to the machine translation API is charged, so repetitive text gets billed again each time it is translated. Avoiding repeat translations of the same text is therefore important for keeping consumption down.

My question is how to configure OmegaT to consult a database of existing translations before sending any text to the MT API. If a translation already exists, it should be displayed. If not, the text should be sent to the MT API and the result stored in the database. Are there any plug-ins, processes, or custom solutions for this? Thank you in advance.
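For illustration only, the lookup-then-translate flow described above might look like the following Python sketch outside OmegaT. It is not an OmegaT feature or plug-in: `translate_with_cache` and `fake_mt` are hypothetical names, the in-memory dict stands in for the database, and `fake_mt` stands in for a paid MT API.

```python
from typing import Callable, Dict, List


def translate_with_cache(
    segments: List[str],
    mt_translate: Callable[[str], str],
    cache: Dict[str, str],
) -> List[str]:
    """Translate segments, calling the MT backend only for unseen source text."""
    results = []
    for source in segments:
        if source not in cache:
            # Cache miss: pay for one MT call, then remember the result.
            cache[source] = mt_translate(source)
        results.append(cache[source])
    return results


# Hypothetical stand-in for a paid MT API; records every text it is sent.
sent_to_mt: List[str] = []


def fake_mt(text: str) -> str:
    sent_to_mt.append(text)
    return f"[EN] {text}"


segments = [
    "PLACA ELETRÔNICA PFD94900-2",
    "PLACA ELETRÔNICA OPFI149400",
    "PLACA ELETRÔNICA PFD94900-2",  # exact repeat, served from the cache
]
cache: Dict[str, str] = {}
translations = translate_with_cache(segments, fake_mt, cache)
billed_chars = sum(len(text) for text in sent_to_mt)
```

Note that a cache like this only saves on exact repeats of the whole source string. Lines that share a phrase but differ in the type number still count as distinct entries.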


 
Susan Welsh
United States
Russian to English
I have no idea... May 10, 2021

... but if no one answers here, ask on the sourceforge OmegaT list, which gets more traffic:
https://lists.sourceforge.net/lists/listinfo/omegat-users


 
esperantisto
English to Russian
If I understand you correctly… May 10, 2021

… what you need is to disable Automatically fetch translations in the machine translation options.

 
Samuel Murray
Netherlands
English to Afrikaans
Yes May 10, 2021

esperantisto wrote:
Disable Automatically fetch translations in the machine translation options.


Yes, so untick Options > Machine Translate > Automatically Fetch Translations. You can then fetch a translation on demand using Ctrl+M, and insert it by pressing Ctrl+M again.

--

By the way, if you have dozens/hundreds of segments that all start with a certain phrase e.g. "PLACA ELETRÔNICA", you can experiment with adding it to the segmentation rules so that it splits directly after that phrase.

Briefly,
1. Click Options > Segmentation
2. Click the top Add button, and then change "LN-CO" to ".*"
3. Select "New Language and Country" in the top box, and then click the bottom Add button.
4. Tick the "Break/Exception" checkbox, put "^PLACA ELETRÔNICA" in the "Pattern before" field, and make the "Pattern after" field empty (or leave it as "\s" or change it to "."). The "^" means "start of line".

See also segmentation setup and regular expressions in the user manual.
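
To see why splitting after the fixed phrase helps, here is a rough Python sketch of the effect of such a rule on the sample lines. It is only an illustration: OmegaT applies its own Java regular expressions internally, `split_after_prefix` is a hypothetical helper, and the whitespace trimming is an assumption rather than documented OmegaT behaviour.

```python
import re
from typing import List

# Mirrors the "Pattern before" value from step 4: the fixed phrase at line start.
PREFIX = re.compile(r"^PLACA ELETRÔNICA")


def split_after_prefix(line: str) -> List[str]:
    """Split a line into [phrase, remainder] if it starts with the phrase."""
    match = PREFIX.match(line)
    if not match:
        return [line]  # no break point, keep the line as one segment
    rest = line[match.end():].strip()
    return [match.group(0), rest] if rest else [match.group(0)]


lines = [
    "PLACA ELETRÔNICA PFD94900-2",
    "PLACA ELETRÔNICA OPFI149400",
    "PLACA ELETRÔNICA KLQI 04900-2",
]
segments = [seg for line in lines for seg in split_after_prefix(line)]
unique_segments = set(segments)
```

With the split in place, "PLACA ELETRÔNICA" becomes one repeated segment that needs translating only once, and each type number becomes its own short segment.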


[Edited at 2021-05-10 18:27 GMT]

[Edited at 2021-05-10 18:29 GMT]

