Software that lets you join TUs at a given character?
Thread poster: Hans Lenting
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Feb 14, 2021

Is there a tool (TMX editor, CAT tool) that lets you join segments at a given character (e.g. $), so that I can fix a TMX that has been split incorrectly at abbreviations?

This is an ex$
of a feature.

Dies ist ein Beispiel
einer Funktion.

Becomes:

This is an ex$ of a feature.

Dies ist ein Beispiel einer Funktion.
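A minimal sketch of the join asked for above, assuming the TM has already been exported as an ordered list of (source, target) pairs. The function name `join_at_marker` and the pair layout are illustrative assumptions, not features of any particular CAT tool or TMX editor.

```python
def join_at_marker(pairs, marker="$"):
    """Merge each (source, target) pair whose source ends with `marker`
    into the pair that follows it. The marker itself is kept, so it can
    be swapped back to a full stop afterwards."""
    merged = []
    pending = None  # pair still waiting for its continuation
    for source, target in pairs:
        if pending is not None:
            # Glue the held-back pair in front of the current one.
            source = pending[0] + " " + source
            target = pending[1] + " " + target
        if source.rstrip().endswith(marker):
            pending = (source.rstrip(), target.rstrip())
        else:
            merged.append((source, target))
            pending = None
    if pending is not None:
        # A marker on the very last pair has nothing to join with.
        merged.append(pending)
    return merged


pairs = [
    ("This is an ex$", "Dies ist ein Beispiel"),
    ("of a feature.", "einer Funktion."),
]
print(join_at_marker(pairs))
```

The sketch depends on the original segment order, so it only works if the TM has not been sorted in the meantime.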


 
Samuel Murray  Identity Verified
Netherlands
Local time: 19:52
Member (2006)
English to Afrikaans
@Hans Feb 14, 2021

Hans Lenting wrote:
This is an ex$
of a feature.


Just how certain are you that all instances that end in $ must be joined with the next segment? And how sure are you that the TM hasn't been sorted (e.g. alphabetically) in the meantime? And... what OS are you on?


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Flexible Feb 14, 2021

Samuel Murray wrote:

Hans Lenting wrote:
This is an ex$
of a feature.


Just how certain are you that all instances that end in $ must be joined with the next segment? And how sure are you that the TM hasn't been sorted (e.g. alphabetically) in the meantime? And... what OS are you on?


I replace the full stop after abbreviations with a dollar sign. So I specify exactly where the joining has to take place.

No sorting.

Mac and Win

I feel an AutoIt macro coming...


 
Dan Lucas  Identity Verified
United Kingdom
Local time: 18:52
Member (2014)
Japanese to English
Search and replace Feb 14, 2021

Hans Lenting wrote:
I feel an AutoIt macro coming...

If you're dealing with what are basically XML files, wouldn't some kind of grep utility be both quicker and more reliable? I guess the issue is ensuring that you only replace text in segments, so you'd need to be quite confident that you understood the structure of the file... Perhaps an XPath editor would be useful.

Dan
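A structure-aware alternative to grep could look like the sketch below, which uses Python's standard `xml.etree.ElementTree`. It rests on several assumptions not confirmed in this thread: every `<tu>` holds its `<tuv>` elements in the same order with the source language first, the `<seg>` elements contain plain text without inline tags, and the TUs are still in document order. The function name `join_tus` is hypothetical.

```python
import xml.etree.ElementTree as ET


def join_tus(tmx_text, marker="$"):
    """Merge every <tu> whose first <seg> ends with `marker` into the
    <tu> that follows it, and return the edited TMX as a string."""
    root = ET.fromstring(tmx_text)
    body = root.find("body")
    tus = list(body.findall("tu"))
    i = 0
    while i < len(tus) - 1:
        segs = tus[i].findall("tuv/seg")
        # Assumption: the first <seg> of a <tu> is the source segment.
        if segs and (segs[0].text or "").rstrip().endswith(marker):
            next_tu = tus[i + 1]
            for seg, nxt in zip(segs, next_tu.findall("tuv/seg")):
                seg.text = (seg.text or "").rstrip() + " " + (nxt.text or "").lstrip()
            body.remove(next_tu)
            del tus[i + 1]
            # Do not advance: the merged segment may end with the marker again.
        else:
            i += 1
    return ET.tostring(root, encoding="unicode")
```

Because only `<seg>` text is touched, attributes and header data stay as they are. Segments with inline tags would need extra handling that this sketch does not attempt.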


 
Stepan Konev  Identity Verified
Russian Federation
Local time: 21:52
English to Russian
With Heartsome TMX Editor Feb 14, 2021

you can convert your TMX file to MS Word as a simple table, then join cells where necessary, and convert the edited file back to TMX.
I also have an AHK script that finds $ and merges the relevant source and target cells. However, you mentioned that you use AutoIt.

[Edited at 2021-02-15 00:43 GMT]


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Via Excel Feb 15, 2021

Here is how I am planning to solve this task:

  • On the source side, I append a unique marker (e.g. ¶) to all abbreviations that caused incorrect segmentation.
  • Then I export the project to a table for external review.
  • I paste the content of the table into a spreadsheet in Excel.
  • I insert a simple formula in every cell of columns B and E, which pulls in the second part of each incorrectly split segment.
  • I copy everything to a text editor and filter on all lines containing a ¶.
  • I make the necessary replacements to get the correctly segmented, 2-column version.




 
Dan Lucas  Identity Verified
United Kingdom
Local time: 18:52
Member (2014)
Japanese to English
Sounds like a plan Feb 15, 2021

Hans Lenting wrote:
Here is how I am planning to solve this task:

Looks sort of reasonable. Won't that leave you with "extra" rows (like row 2 in the above) to deal with? I guess you just filter in Excel to show rows equal to "0" in column B, and delete them.

Also, you want something absolutely unique for your symbol/s - maybe something like "^^^" - to be sure. I doubt you'd find a pilcrow symbol in the text (as opposed to an actual new line) but I can't help feeling you'd be tempting fate by using something that could possibly be misinterpreted!

Regards,
Dan
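One way to avoid tempting fate is to check the candidate markers against the actual text before choosing one. The sketch below assumes the segments are available as a list of strings; `unused_markers` and the candidate list are illustrative names only.

```python
def unused_markers(segments, candidates=("¶", "$", "^^^")):
    """Return the candidate markers that occur in none of the segments."""
    return [m for m in candidates if not any(m in seg for seg in segments)]


segments = ["Costs $5, e.g. here.", "Plain text."]
print(unused_markers(segments))
```

Any marker returned here is safe to use for this particular set of segments, which is a narrower claim than being unique in general.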


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
BBEdit Feb 15, 2021

Dan Lucas wrote:

Hans Lenting wrote:
Here is how I am planning to solve this task:

Looks sort of reasonable. Won't that leave you with "extra" rows (like row 2 in the above) to deal with? I guess you just filter in Excel to show rows equal to "0" in column B, and delete them.


I'll copy everything to BBEdit and use its Process Lines Containing... feature there.



 
Stepan Konev  Identity Verified
Russian Federation
Local time: 21:52
English to Russian
Heartsome again Feb 15, 2021

I forgot to mention that Heartsome can also convert TMX files to Excel spreadsheets and back.
Just in case...

[Edited at 2021-02-15 11:47 GMT]


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
TOPIC STARTER
Another approach in Excel Feb 15, 2021

With these two formulas, you can merge cells that contain a pilcrow:

Source segment:

=IF(ISNUMBER(SEARCH("¶";A1));(LEFT(A1;LEN(A1)-1))&" "&A2;"")

Target segment:

=IF(ISNUMBER(SEARCH("¶";A1));B1&" "&B2;"")
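To see what the two formulas produce row by row, here is a rough Python mirror of them. It assumes the source segments sit in column A and the targets in column B, and that the pilcrow is the last character of the source cell, since LEFT(...;LEN(...)-1) simply drops the final character. The function name `merged_columns` is hypothetical. Note also that the semicolon argument separator in the formulas depends on the Excel locale; other locales use commas.

```python
def merged_columns(rows, mark="¶"):
    """For each (source, target) row, return what the two formulas would
    show: the merged pair if the source contains `mark`, else empty strings."""
    out = []
    for i, (src, tgt) in enumerate(rows):
        # An empty cell below the last row concatenates as an empty string.
        nxt_src, nxt_tgt = rows[i + 1] if i + 1 < len(rows) else ("", "")
        if mark in src:
            # src[:-1] mirrors LEFT(A1;LEN(A1)-1): drop the trailing marker.
            out.append((src[:-1] + " " + nxt_src, tgt + " " + nxt_tgt))
        else:
            out.append(("", ""))
    return out


rows = [
    ("See e.g.¶", "Siehe z. B."),
    ("the manual.", "das Handbuch."),
    ("Next segment.", "Nächstes Segment."),
]
for merged in merged_columns(rows):
    print(merged)
```

The formula columns are empty for every row without a pilcrow, and the original continuation rows are still present, so those rows have to be filtered out or deleted afterwards.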


Note that in order to make this work, you'll have to replace TAB characters inside segments with placeholders (e.g. ¬). (You can put the TAB characters back afterwards.)
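The placeholder step can be sketched as a pair of helper functions, assuming the segments are handled as plain strings before they are pasted into the spreadsheet. The names `protect_tabs` and `restore_tabs` are illustrative, and the check for an existing placeholder is an added safeguard, not part of the original procedure.

```python
def protect_tabs(segment, placeholder="¬"):
    """Replace TAB characters with a placeholder before the spreadsheet step."""
    if placeholder in segment:
        # Restoring would otherwise turn a genuine ¬ into a TAB.
        raise ValueError(f"placeholder {placeholder!r} already occurs in segment")
    return segment.replace("\t", placeholder)


def restore_tabs(segment, placeholder="¬"):
    """Put the TAB characters back after the merge."""
    return segment.replace(placeholder, "\t")


original = "Name\tValue"
protected = protect_tabs(original)
print(protected)
print(restore_tabs(protected) == original)
```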

Propagating the formulas downwards in Excel:

If the formula is in the first cell of a column:

  • Select the entire column by clicking the column header or selecting any cell in the column and pressing CTRL+SPACE.
  • Fill down by pressing CTRL+D.



[Edited at 2021-02-16 13:05 GMT]


 

