New free & open source aligner (for Windows, OS X and linux)
Thread poster: FarkasAndras

FarkasAndras
English to Hungarian
TOPIC STARTER
What do you mean? Dec 9, 2010

Marta Hasyuk wrote:

Is there any possibility to reiterate alignment for the same texts?

Run the aligner with the same input files again...?

Marta Hasyuk wrote:
How to set realign option in the algorithm?

Do you mean the -realign switch in hunalign, which improves the quality of the alignment by doing two passes? You can easily add that to the hunalign command in the .pl if you're on Linux or OS X. On Windows, see the previous post on how to run the .pl instead of the .exe.
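
For illustration only, here is a minimal Python sketch of what adding the switch amounts to. It assumes hunalign's command line takes its options first, followed by the dictionary, source and target files. The helper name build_hunalign_command and the file names are hypothetical, and the actual command in the aligner's .pl script may well be put together differently.

```python
def build_hunalign_command(dictionary, source, target, realign=False):
    """Return a hunalign argument list (hypothetical helper, not LF Aligner code)."""
    command = ["hunalign", "-text"]  # -text: output the aligned text itself
    if realign:
        command.append("-realign")  # second pass, as discussed in the post
    command += [dictionary, source, target]
    return command


# File names below are placeholders for illustration.
print(" ".join(build_hunalign_command("en-hu.dic", "source.txt", "target.txt", realign=True)))
```

The only difference between a normal run and a two-pass run is the extra -realign item in the argument list.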


 

FarkasAndras
English to Hungarian
TOPIC STARTER
3- and 4-language versions added Dec 22, 2010

I thought I'd let you all know that the 3-language and 4-language versions of the aligner are also available now.
If you're an interpreter with more than one passive language or a PM who needs to align files for multilingual projects, this should come in pretty handy. It generates xls files and multilingual TMXes.
You can autoalign texts in 4 languages in one go, correct the alignment also in one go, generate a TMX and send it to everyone involved in the project. Then everyone can import the same TMX into their own TMs, and their CAT should know which languages to import and which to ignore.
Feedback and bug reports are welcome as always; the download URL is the same as ever.

Windows only at the moment, Mac/Linux coming soonish.
Of course there were other updates in all versions since I last posted here, so if you're a user, get downloading. Mac and Linux versions are at 2.302 now.


 

Charles Ek
United States
Member (2009)
Norwegian to English
Thanks very much for this excellent software Jun 26, 2011

As I've told you privately, this is an excellent piece of software. I'm unschooled in non-GUI commands. However, with the aid of your clear documentation I was able to install this tool in minutes and then immediately align a reference translation and its source. The resulting TMX ran flawlessly in OmegaT. Thanks!

 

FarkasAndras
English to Hungarian
TOPIC STARTER
New version Jun 26, 2011

Charles Ek wrote:

As I've told you privately, this is an excellent piece of software. I'm unschooled in non-GUI commands. However, with the aid of your clear documentation I was able to install this tool in minutes and then immediately align a reference translation and its source. The resulting TMX ran flawlessly in OmegaT. Thanks!


Thanks, glad you like it. Incidentally, version 2.55 was released a few days ago. The main new feature is that the main script can now do multilingual alignments (up to 100 languages). This also means that the 3- and 4-language versions, which were always lagging behind in development/release, aren't needed anymore.


 

ni-cole
Switzerland
German to French
Filenames Feb 11, 2012

Dear FarkasAndras

Thank you very much for this wonderful tool! I think it is very useful.

But...

At first I had a problem trying LF Aligner: when I opened the source file (doc) and the target file, LF Aligner disappeared...! I read in this thread that the reason for closing was the name of the file. My filenames look like this: name_D for the source document and name_F for the target document (where name = the name given by my customer, D = German, F = French). As a test I changed them to "test" and "essai", and then it worked.

But I cannot change my whole system just because of one tool, even if this tool is great!

So my question is: is there any way to make LF Aligner understand that there is a difference between name_D and name_F? I mean, all other software handles it...

By the way: are letters like ä ö or ü in the filename also a problem?

Kind regards
Nicole.


 

FarkasAndras
English to Hungarian
TOPIC STARTER
Filenames Feb 12, 2012

Well, the file name issue is the following: due to some issues outside of my control, filenames with non-ASCII characters don't work on Windows. It's a character encoding problem in Windows and the programming language I used for the project. It may be possible for me to find a workaround to the problem, but it wouldn't be easy or simple. I decided not to bother and spend my time working on what I consider real features. But now that you mention it, maybe I'll try and add better error handling or possibly rename offending files automatically (I don't have high hopes of the latter working out).

So yes, the problem is "letters like ä ö or ü in the filename". Anything else is fine, pretty much.
You can call your files "File_12_of_Client_X version 12.3.4_new.doc", (i.e. underscores, spaces and full stops are not a problem).
You can't call the files "jökkmokk.doc" or "ű.txt". Obviously, you can't use Asian characters either, or a French c with a cedilla.

By the way, avoiding non-ASCII characters in filenames is good practice in general. They can cause problems left and right. For instance, if several files are attached to an email in Yahoo and you choose "download all", then Yahoo zips them for you. Non-ASCII characters in the filenames will be corrupted because of a similar character encoding problem to the one in LF Aligner. I'm sure you've seen character corruption in file names... that's always due to something like this.

[Edited at 2012-02-12 09:35 GMT]


 

ni-cole
Switzerland
German to French
Filenames Feb 12, 2012

Thank you very much for answering so fast!

The problem was not only the ü; that was just the first problem, and it is easy to resolve. And as you said, it may be better anyway not to use ü and similar characters in filenames.

But my main problem is that LF Aligner closes after I open the files because it considers that they have the same name; at least this is what I understood from this topic. The source file is called uebersetzung_D and the target file uebersetzung_F. I always keep the original filename, just adding a D for German and an F for French at the end. Apparently LF Aligner considers them identical, but they aren't.

Is there a solution for this (except renaming all the files)?

In the meantime I may have found a solution, but it is not ideal: I will create a special folder called "lf aligner-files", copy the two files into it, rename them to german.doc and french.doc and run LF Aligner. I think it will work, but it makes the tool less easy to use, and I am looking for something that I can do within my daily business, even when there is some work pressure.

Until now I have been using Plus Tools, and I am losing so much time because it makes a big mess, putting German segments in the French part and vice versa, etc. So I often do not align just because I am afraid of losing so much time on it. I also tried bitext2tmx but was not convinced.

Kind regards,
Nicole.


 

FarkasAndras
English to Hungarian
TOPIC STARTER
No Feb 12, 2012

ni-cole wrote:

my main problem is that LF Aligner closes after I open the files because it considers that they have the same name; at least this is what I understood from this topic. The source file is called uebersetzung_D and the target file uebersetzung_F. I always keep the original filename, just adding a D for German and an F for French at the end. Apparently LF Aligner considers them identical, but they aren't.

No, it doesn't consider them identical. The problem is something else.
The two input files have to be in the same folder - I can't think of any other limitation.
To find the root cause, read aligner/scripts/log.txt after a failed alignment and post it here if you can't figure out what went wrong.
Also, if the console window just closes on you before you have a chance to read the error message, open a persistent console window by typing cmd into the search window in the start menu (win7) or pressing "Run" in the Start menu and typing cmd there (XP).
Then just drag and drop the aligner exe into the console window and press enter to launch it. This way the window won't disappear.
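
If you'd rather not read the whole log by eye, a few lines of Python can pull out the error messages. This is only a rough sketch: it assumes that error lines start with "ERROR:", as they do in the log excerpts posted in this thread, and the helper name find_errors is made up for the example. The log location is the one mentioned above; adjust it to wherever your copy of the aligner lives.

```python
from pathlib import Path


def find_errors(log_text):
    """Return the lines of a log that start with 'ERROR:' (assumed error prefix)."""
    stripped = (line.strip() for line in log_text.splitlines())
    return [line for line in stripped if line.startswith("ERROR:")]


if __name__ == "__main__":
    # Path taken from the post; it is relative to the aligner's install folder.
    log_path = Path("aligner/scripts/log.txt")
    if log_path.exists():
        for line in find_errors(log_path.read_text(encoding="utf-8", errors="replace")):
            print(line)
```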

ni-cole wrote:
Until now I have been using Plus Tools, and I am losing so much time because it makes a big mess, putting German segments in the French part and vice versa, etc. So I often do not align just because I am afraid of losing so much time on it. I also tried bitext2tmx but was not convinced.

Neither tool has an autoaligner, so I don't consider them real full-featured aligners. However, they both have a better editing UI than LF Aligner (which just uses Excel/OOo Calc for this purpose). The readme tells you how to use the PlusTools UI for alignments done with LF Aligner.


 

ni-cole
Switzerland
German to French
Log.txt -> it seems to be a folder problem...?!? Feb 17, 2012

Hi!

Sorry, I had a busy few days at work, so I didn't have time to try it again.

I did as you said and this is the log:

Program: LF aligner, version: 2.56, OS: Windows, launched: 2012/02/17, 11:58:19

Setup: filetype_def: t; filetype_prompt: y; l1_def: en; l2_def: hu; l1_prompt: y; l2_prompt: y; segmenttext_def: y; segmenttext_prompt: y=; cleanup_def: y; cleanup_prompt: y; review_def: x; review_prompt: y; create_tmx_def: y; create_tmx_prompt: y; l1_code_def: EN-GB; l2_code_def: HU; l1_code_prompt: y; l2_code_prompt: y; creationdate_prompt: y; creationid_def: ; creationid_prompt: y; ask_master_TM: n; chopmode: 0; tmxnote_def: ; tmxnote_prompt: y; pdfmode: y

GUI on
filetype: t
Input file 1: Compendium_D.doc (C:/Users/[myname]/Documents/übersetzungen/[client]/hilfe/Compendium_D.doc)
Input file 2: Compendium_F.doc (C:/Users/[myname]/Documents/übersetzungen/[client]/hilfe/Compendium_F.doc)
ERROR: File 1 not found; folder: C:/Users/[myname]/Documents/übersetzungen/[client]/hilfe, file: Compendium_D.doc
ERROR: File 2 not found; folder:C:/Users/[myname]/Documents/übersetzungen/[client]/hilfe, file: Compendium_F.doc

((I just changed the name of the client in the log here))

Then I copied the two documents to the Desktop, did the same and... it works! So it seems to be a folder problem...?

Why do I have to put them on the Desktop? You said it is important that they are in the same folder, and they were.

Any idea?

By the way: I am again very impressed by the result, thank you very much!


 

FarkasAndras
English to Hungarian
TOPIC STARTER
Folder name Feb 17, 2012

ni-cole wrote:

ERROR: File 1 not found; folder: C:/Users/[myname]/Documents/übersetzungen/[client]/hilfe, file: Compendium_D.doc
ERROR: File 2 not found; folder:C:/Users/[myname]/Documents/übersetzungen/[client]/hilfe, file: Compendium_F.doc


There's your answer. The same limitations apply to folder names as file names (no non-ASCII characters allowed).
So, just rename "übersetzungen" to "ubersetzungen" and you should be good to go.
Or, of course, put the files in any other location that has no accented letters anywhere in the path name.
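
To spot the offending part of a long path quickly, something like the following Python sketch can help. It is not part of LF Aligner; the function name non_ascii_parts is made up for the example, and the sample path is modelled on the one in the log, with placeholder user and client names.

```python
from pathlib import PureWindowsPath


def non_ascii_parts(path):
    """Return the folder or file names in a path that contain non-ASCII characters."""
    return [part for part in PureWindowsPath(path).parts if not part.isascii()]


# Placeholder user and client names; only the accented folder name gets reported.
print(non_ascii_parts("C:/Users/me/Documents/übersetzungen/client/hilfe/Compendium_D.doc"))
# ['übersetzungen']
```

An empty list means that every folder and the file name itself use ASCII characters only.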


 

ni-cole
Switzerland
German to French
Thank you! Feb 19, 2012

Of course, you are right. I didn't realise that I also have to watch the names of the folders... Shame on me



FarkasAndras wrote:

So, just rename "übersetzungen" to "ubersetzungen" and you should be good to go.
Or, of course, put the files in any other location that has no accented letters anywhere in the path name.


I'll do that.

And thank you very much for your help!


 

FarkasAndras
English to Hungarian
TOPIC STARTER
Beta testers wanted Mar 30, 2012

A pretty major update to LF Aligner is almost ready for release, and I'd like to have a few people run it through its paces before it goes live.
The update adds a graphical user interface to the aligner, hopefully making the tool a lot more user friendly.

So, if you're interested and good with computers, let me know. You don't need to be a programmer to beta test, but you need to be able to give me detailed bug reports & feature requests, make sense of log files etc. I'll need you to run the program through a variety of realistic usage scenarios with your own texts, make sure everything works and make suggestions for reshuffling the GUI or adding stuff etc. In return, you get early access and... updates from sourceforge later like everyone else.
I'm mainly looking for WinXP, Vista and Win7 users. If you'd like to try out the GUI on Linux or OS X, you'll need to install a Perl module or two on your own, so more expertise is needed on these platforms.

Send an email to lfaligner (gmail) to get in on the action.


 

FarkasAndras
English to Hungarian
TOPIC STARTER
Still looking Apr 8, 2012

Still looking for beta testers.

 

KylaR
How does the batch aligner work ? Feb 10, 2013

Hi Farkas,

First off, thanks for building LF Aligner! I discovered it only a few weeks ago, but I love it!

I have questions about the batch mode, though.

Say, I want to create my own personal bilingual editions of Harry Potter (just an example).

I tried the following syntax:

LF_aligner_3.11.exe --filetype="t" --infiles="C:\HarryPotter01ENG.txt","C:\HarryPotter02ENG.txt","C:\HarryPotter03ENG.txt","C:\HarryPotter04ENG.txt","C:\HarryPotter05ENG.txt","C:\HarryPotter06ENG.txt","C:\HarryPotter07ENG.txt","C:\HarryPotter01FRE.txt","C:\HarryPotter02FRE.txt","C:\HarryPotter03FRE.txt","C:\HarryPotter04FRE.txt","C:\HarryPotter05FRE.txt","C:\HarryPotter06FRE.txt","C:\HarryPotter07FRE.txt" --languages="en","en","en","en","en","en","en","fr","fr","fr","fr","fr","fr","fr" --segment="y" --review="xn" --tmx="n"


But it created an Excel table with as many columns as there were files! That's not what I want. I just want it to align them two by two.

I tried alternating the English and the French files in the BAT, i.e. writing HarryPotter01ENG HarryPotter01FRE HarryPotter02ENG HarryPotter02FRE (and en fr en fr for the languages), but the result was the same.

What am I doing wrong?

***

EDIT: I have an additional question.

I just tried aligning a single pair. Here's the log:

Program: LF Aligner, version: 3.11, OS: Windows, launched: 2013.02.10_20.08.51

Setup: filetype_def: t; filetype_prompt: y; lang_1_iso_def: en; lang_2_iso_def: fr; l1_prompt: y; l2_prompt: y; segmenttext: y; confirm_segmenting: y; cleanup_def: y; cleanup_prompt: n; review_def: x; review_prompt: y; create_tmx_def: n; create_tmx_prompt: n; tmx_langcode_1_def: en; tmx_langcode_2_def: fr; tmx_langcode_1_prompt: y; tmx_langcode_2_prompt: y; creationdate_prompt: y; creationid_def: LF Aligner 3.11; creationid_prompt: y; ask_master_TM: n; chopmode: 15000; tmxnote_def: ; tmxnote_prompt: y; pdfmode: y

GUI on
filetype: t
Input file 1: HP01ENG.txt (D:/HP01ENG.txt)
Input file 2: HP01FRE.txt (D:/HP01FRE.txt)
Input file sizes: 440012 bytes 517197 bytes
File sizes after conversion to txt: 440012 bytes 517197 bytes
Initial stats:
- en: 2929 segments, 82249 words, 432359 chars
- fr: 3033 segments, 92561 words, 495020 chars
Segmentation: y, segment numbers:
- en: 2929 -> 6413
- fr: 3033 -> 6770
Reverted to unsegmented

Hunalign dictionary: en-fr.dic
Using Hunalign in normal mode, (2929 is less than 15000)
Aligned file: 2786 segments, 979241 bytes (D:/align_2013.02.10_20.08.51/aligned_en-fr.txt)
Cleanup: y
Review: x
Generated xls with 2786 lines
Converted xls to txt after review; 2786 lines
Create TMX: n
Terminated normally.


Why is there a discrepancy between the number of segments originally seen and the number of segments in the aligned file? It's the same when I don't revert to unsegmented:

Program: LF Aligner, version: 3.11, OS: Windows, launched: 2013.02.10_20.14.53

Setup: filetype_def: t; filetype_prompt: y; lang_1_iso_def: en; lang_2_iso_def: fr; l1_prompt: y; l2_prompt: y; segmenttext: y; confirm_segmenting: y; cleanup_def: y; cleanup_prompt: n; review_def: x; review_prompt: y; create_tmx_def: n; create_tmx_prompt: n; tmx_langcode_1_def: en; tmx_langcode_2_def: fr; tmx_langcode_1_prompt: y; tmx_langcode_2_prompt: y; creationdate_prompt: y; creationid_def: LF Aligner 3.11; creationid_prompt: y; ask_master_TM: n; chopmode: 15000; tmxnote_def: ; tmxnote_prompt: y; pdfmode: y

GUI on
filetype: t
Input file 1: HP01ENG.txt (D:/HP01ENG.txt)
Input file 2: HP01FRE.txt (D:/HP01FRE.txt)
Input file sizes: 440012 bytes 517197 bytes
File sizes after conversion to txt: 440012 bytes 517197 bytes
Initial stats:
- en: 2929 segments, 82249 words, 432359 chars
- fr: 3033 segments, 92561 words, 495020 chars
Segmentation: y, segment numbers:
- en: 2929 -> 6413
- fr: 3033 -> 6770
Using segmented file versions
Hunalign dictionary: en-fr.dic
Using Hunalign in normal mode, (6413 is less than 15000)
Aligned file: 6223 segments, 1012881 bytes (D:/align_2013.02.10_20.14.53/aligned_en-fr.txt)
Cleanup: y
Review: x
Generated xls with 6223 lines
Converted xls to txt after review; 6223 lines
Create TMX: n
Terminated normally.


Thanks.

[Edited at 2013-02-10 19:31 GMT]


 

FarkasAndras
English to Hungarian
TOPIC STARTER
separate commands Feb 10, 2013

Hi,
if you list all 14 files in one command, the aligner assumes that there are 14 different languages in this project and generates a 14-column table, as you found out.
What you need to do is issue a separate command for each file pair:
LF_aligner_3.11.exe --filetype="t" --infiles="C:\HarryPotter01ENG.txt","C:\HarryPotter01FRE.txt" --languages="en","fr" --segment="y" --review="xn" --tmx="n"
LF_aligner_3.11.exe --filetype="t" --infiles="C:\HarryPotter02ENG.txt","C:\HarryPotter02FRE.txt" --languages="en","fr" --segment="y" --review="xn" --tmx="n"
LF_aligner_3.11.exe --filetype="t" --infiles="C:\HarryPotter03ENG.txt","C:\HarryPotter03FRE.txt" --languages="en","fr" --segment="y" --review="xn" --tmx="n"
LF_aligner_3.11.exe --filetype="t" --infiles="C:\HarryPotter04ENG.txt","C:\HarryPotter04FRE.txt" --languages="en","fr" --segment="y" --review="xn" --tmx="n"
LF_aligner_3.11.exe --filetype="t" --infiles="C:\HarryPotter05ENG.txt","C:\HarryPotter05FRE.txt" --languages="en","fr" --segment="y" --review="xn" --tmx="n"
LF_aligner_3.11.exe --filetype="t" --infiles="C:\HarryPotter06ENG.txt","C:\HarryPotter06FRE.txt" --languages="en","fr" --segment="y" --review="xn" --tmx="n"
LF_aligner_3.11.exe --filetype="t" --infiles="C:\HarryPotter07ENG.txt","C:\HarryPotter07FRE.txt" --languages="en","fr" --segment="y" --review="xn" --tmx="n"


Just put these in a .bat file (one line per command) and you'll be set. If you add an --outfile to each command, then a single txt file will be generated, containing the text of all the file pairs.
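
If typing out the lines by hand gets tedious, a short script can generate them. The Python sketch below is just one way to do it. It reuses the file naming and options from the commands above; the function name batch_lines and its parameters are made up for the example.

```python
def batch_lines(volumes, exe="LF_aligner_3.11.exe", folder="C:\\"):
    """Build one aligner command per English/French file pair."""
    template = (
        '{exe} --filetype="t" --infiles="{folder}HarryPotter{n:02d}ENG.txt",'
        '"{folder}HarryPotter{n:02d}FRE.txt" --languages="en","fr" '
        '--segment="y" --review="xn" --tmx="n"'
    )
    return [template.format(exe=exe, folder=folder, n=n) for n in volumes]


# Print the seven commands; save the output as a .bat file to run them in sequence.
print("\n".join(batch_lines(range(1, 8))))
```

Each printed line is a complete command for one file pair, so the aligner never sees more than two input files at a time.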