Converting 2-column list to glossary
Thread poster: Tony M
Tony M
France
Local time: 12:51
Member
French to English
SITE LOCALIZER
Mar 10, 2016

I have a two-part problem:

1) I often receive 2 column aligned SOURCE / TARGET lists of terms — basically, a client glossary.
Does anyone know of a convenient utility that can be used to convert this into a CAT tool glossary (in my case, for Wordfast Classic, but the actual tool isn't really the issue)? As far as I have been able to ascertain, my CAT tool doesn't have a built-in utility for doing this (though I may be wrong!)

I have a manual workaround which involves looking at an existing glossary (which is basically a tab-separated text file) and counting the number of tabs, many of which only separate blank fields that I don't need to use. I then add enough blank columns to the right of my bilingual table to create the corresponding number of tabs, convert table-to-text, and then, to be on the safe side, copy and paste that text into an existing blank glossary. But it's a bit long-winded, and a little routine for doing it would certainly help!

2) I currently have a particular glossary with a slightly different format: it has 2 lines for each entry, the first being the acronym and its translation, and the second being the expanded form of the acronym and its translation. What I need to do is get the two acronyms into the Source and Target fields of my glossary, and then the 2 expanded forms together into the 'Notes' field; anyone got any brilliant ideas how to do this? I suspect I am going to have to first manually combine the expanded text + translation into a 3rd column alongside the acronyms, and then proceed with my original system as above, albeit with one less 'extra' column.

All suggestions gratefully received!


 
Patrick Porter
United States
Local time: 07:51
Spanish to English
Regular expression find and replace could work Mar 11, 2016

For your second issue: if you have a text editor that allows find/replace with regex, you could use that to find every pair of lines and then take out the line ending in the middle.

If you have Notepad++ (or care to download it; it's free), open the file, press Ctrl+H (for find/replace) and put the following expressions in the corresponding boxes:

Find: (.+?)\t(.+?)\r\n(.+?)\t(.+?)\r\n
Replace: $1\t$2\t$3\t$4\r\n


Make sure that, at the bottom, Search Mode is set to "Regular expression" and the ". matches newline" box is checked.

This will make every second line appear as fields 3 and 4 of the previous line. Make sure that there is a line break after the last line (i.e. the file doesn't end at the very end of the last line but at the beginning of a new line). Also, if you are not on a Windows machine or the file was not created on a Windows machine, then the newline might be \n instead of the \r\n in the expressions above. You can tell by trying a quick find; if nothing comes up, try removing all the \r from the regexes (there are 3 in total).
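
For anyone who would rather script this than use an editor, roughly the same find/replace can be sketched in Python. This is only an illustration: the function name and the `notes_joiner` parameter are invented here, and it assumes every entry is exactly two tab-separated lines. Unlike the expressions above, it accepts both \r\n and \n and does not need a line break after the last line.

```python
import re

# One entry = two consecutive lines, each "left<TAB>right".
# The character classes exclude tabs and line breaks, so a field
# can never run over into the next line.
PAIR = re.compile(
    r"([^\t\r\n]+)\t([^\t\r\n]+)\r?\n"
    r"([^\t\r\n]+)\t([^\t\r\n]+)(?:\r?\n|$)"
)


def merge_line_pairs(text, notes_joiner=None):
    """Fold every second line into the line before it.

    notes_joiner=None  -> four fields: acronym, acronym, long form, long form
    notes_joiner=" = " -> three fields, with both long forms in one field
    Output lines end in "\n"; convert afterwards if "\r\n" is required.
    """
    def _repl(match):
        src, tgt, long_src, long_tgt = match.groups()
        if notes_joiner is None:
            tail = f"{long_src}\t{long_tgt}"
        else:
            tail = f"{long_src}{notes_joiner}{long_tgt}"
        return f"{src}\t{tgt}\t{tail}\n"

    return PAIR.sub(_repl, text)
```

Calling it with `notes_joiner=" = "` puts the two expanded forms together in a single third field, which is closer to what the original question asks for; whether that third field is read as the notes field depends on the glossary format of the tool, so check that against an existing glossary first.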


 
CafeTran Training (X)
Netherlands
Local time: 12:51
Try another CAT tool? Mar 11, 2016

Tony M wrote:

but the actual tool isn't really the issue


If you're willing to try another tool: CafeTran's native glossary format is tab-delimited, so you can use your list right away.

It also allows source-side and target-side alternatives:

ACRONYM;long form source TAB ACRONYM;long form target

That way, you can keep together what belongs together. During translation, you can easily switch the automatic insertion to the alternative target via the right mouse button, so you can choose to have the acronym translated as an acronym or as the long form, either once or for the whole project.

A similar regex would be needed to prep the glossary.
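
As an alternative to a regex, the same preparation can be sketched in a few lines of Python. This is a rough illustration only: the function name is made up, the semicolon layout is taken from the example line above, and the input is assumed to be strict pairs of lines (acronym line, then long-form line), each with exactly one tab.

```python
def to_alternatives(lines):
    """Turn acronym/long-form line pairs into 'ACR;long<TAB>ACR;long' rows."""
    # Drop blank lines and split every remaining line at the tab.
    rows = [ln.rstrip("\r\n").split("\t") for ln in lines if ln.strip()]
    if len(rows) % 2:
        raise ValueError("expected an even number of non-blank lines")

    out = []
    # Pair line 1 with line 2, line 3 with line 4, and so on.
    # Unpacking raises ValueError if a line does not have exactly two fields.
    for (acr_src, acr_tgt), (long_src, long_tgt) in zip(rows[0::2], rows[1::2]):
        out.append(f"{acr_src};{long_src}\t{acr_tgt};{long_tgt}")
    return out
```

Failing loudly on an odd line count or a missing tab is deliberate: a single stray line would otherwise shift every following pair.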

I've recorded a short video to demonstrate this: https://youtu.be/roX4yksMssk

[Edited at 2016-03-11 07:21 GMT]


 
Philippe Etienne
Spain
Local time: 12:51
Member
English to French
Glossary search without CAT tools Mar 11, 2016

Not sure how helpful it may be to your issue, but I've been using Search and Replace from Funduc (http://www.funduc.com/) since I started.
When I receive 2+-column glossaries, I adapt/convert them to .csv or .txt; the app then searches all files, and the results window shows all occurrences line by line.
It has many other features that I've never used, but for quickly searching many heterogeneous files at once without opening them, it's handy.

If I remember correctly, incorporating 2-column glossaries into MemoQ is also quite easy.

Philippe


 
Samuel Murray
Netherlands
Local time: 12:51
Member (2006)
English to Afrikaans
Two columns is enough for WFC Mar 11, 2016

Tony M wrote:
Does anyone know of a convenient utility that can be used to convert this into a CAT tool glossary (in my case, for Wordfast Classic, but the actual tool isn't really the issue)?


WFC does not care if different records have different numbers of fields, as long as the two required fields (source and target) are present. So you can safely add more terms to the WFC glossary, even if the terms that you add have only source and target, whereas the other entries have more tabs.
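
If that holds, no padding with blank columns is needed, and the conversion boils down to cleaning the two-column list and appending it. A rough Python sketch, assuming the list has already been saved as tab-separated text (e.g. via table-to-text); the function names are made up for illustration, and the encoding and line endings your glossary expects are assumptions to check against your own files:

```python
def two_column_rows(text):
    """Return clean 'source<TAB>target' lines from a two-column list."""
    rows = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("\t")]
        # Keep only rows whose first two fields are both non-empty;
        # blank lines and half-filled rows are skipped, and any
        # fields beyond the second are dropped.
        if len(fields) >= 2 and fields[0] and fields[1]:
            rows.append(f"{fields[0]}\t{fields[1]}")
    return rows


def append_to_glossary(glossary_path, rows, encoding="utf-8"):
    """Append rows to an existing tab-separated glossary file.

    The utf-8 default and the "\r\n" line ending are assumptions;
    match whatever the existing glossary uses.
    """
    with open(glossary_path, "a", encoding=encoding, newline="") as fh:
        for row in rows:
            fh.write(row + "\r\n")
```

Typical use would be to read the client list into a string, pass it through `two_column_rows`, and hand the result to `append_to_glossary` together with the path of the glossary. Work on a backup copy, and make sure the existing glossary ends with a line break, otherwise the first new term is glued onto the last old one.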

I currently have a particular glossary with a slightly different format — it has 2 lines for each entry, the first being the acronym and its translation, and then the second being the expanded form of the acronym and its translation. What I need to do is get the two acronyms in to the Source and Target fields of my glossary, and then the 2 expanded forms together into the 'Notes' field...


I see a long road of manual copying ahead.

Samuel


 
esperantisto
Local time: 14:51
Member (2006)
English to Russian
SITE LOCALIZER
No need for any tool Mar 11, 2016

You don’t need any tool. As mentioned, some CAT programs can use tab-delimited text files as glossaries directly (just to add: OmegaT, Anaphraseus), others can import them by an established procedure using a built-in tool or feature. Just read the respective manual.

 

