New free & open source aligner (for Windows, OS X and linux)
Thread poster: FarkasAndras

Mette Melchior  Identity Verified
Sweden
Local time: 19:11
English to Danish
+ ...
Just beat me to it ;-) Nov 8, 2010

I only saw your post after I had submitted mine, Michael. Well, the OPUS site is a great resource so I just wanted to mention it.

 

Adam Bojan  Identity Verified
Poland
Local time: 19:11
Dutch to Polish
+ ...
SDLX align module vs LF aligner Nov 8, 2010

@FarkasAndras
The SDLX align module is even simpler than WinAlign, which has some options for tag, number and formatting significance. It simply segments the text just as the main SDLX program does, using the same rules. It has no autoaligner, but compared with WA it works much faster, has a more user-friendly interface and lets you do everything with simple and logical keyboard shortcuts. However, I don't want to review it here, all the more so since SDL has dropped its development. Let us concentrate on LF Aligner, which as far as I can see gives better results.
As for the SourceForge page, I have already given you a big thumbs-up, and I will certainly write a review once I have used the program more.
greetings,
Adam

[Edited at 2010-11-08 13:28 GMT]


 

FarkasAndras  Identity Verified
Local time: 19:11
English to Hungarian
+ ...
TOPIC STARTER
OPUS Nov 8, 2010

Thanks for suggesting OPUS, everyone. Some of the stuff on there might prove useful.

As to the encoding issue, it's no surprise... Encoding is the biggest mess in any sphere I have ever seen in my life. It's a horrid amalgam of half-baked ideas mostly left over from the time when data storage was so expensive they tried to store text in smaller spaces by using minuscule character sets, and when the idea that people would ever want to read documents from other cultures (and hence other encodings) didn't really come up.
It was and is the single biggest problem in the aligner project - for instance, the windows version can't open any files with a non-ASCII name because handling special characters in filenames is so difficult I just gave up on it.
To make sure things work smoothly most of the time, I enforced utf-8 for input files in the aligner, but in general, the two main UTF-16 encodings (BE and LE) are also pretty widespread, as are other encodings like latin-1 and latin-2.
As I was saying, it's a huge mess. Even though the aligner only accepts UTF-8, there are 6 subtle variations of UTF-8 and they all need to work on all 3 platforms (win/osx/linux) for a total of 18 slightly different scenarios...
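
To make the encoding point concrete, here is a minimal Python sketch of how input in the encodings mentioned above could be normalised to UTF-8 before alignment. It is not how the aligner itself does it (the aligner is a Perl script that simply expects UTF-8); the function names `decode_text` and `to_utf8` and the latin-1 fallback are assumptions made for illustration.

```python
import codecs

# Byte order marks for the encodings mentioned above, with the codec
# that both honours and strips each one.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_text(raw, fallback="latin-1"):
    """Decode bytes using the BOM if present, then strict UTF-8, then a fallback."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return raw.decode(encoding)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is a legacy 8-bit file.
        return raw.decode(fallback)


def to_utf8(raw, fallback="latin-1"):
    """Return the same text re-encoded as UTF-8 without a BOM."""
    return decode_text(raw, fallback).encode("utf-8")
```

The fallback is the weak spot: a latin-2 file decoded as latin-1 will not raise an error, it will just produce wrong characters, so the fallback has to be chosen per language pair.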

[Edited at 2010-11-08 22:51 GMT]


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 18:11
Member (2009)
Dutch to English
+ ...
@FarkasAndras Nov 10, 2010

I am having trouble with the batch aligner:

I first kept getting:

"File 1 doesn't exist (or its path or filename contains accented characters)"

Then

"The file extensions don't match. Skipping file pair 1."

And now it just crashes on the first file, leaving an empty aligned_all.txt file in a folder it makes with the same name as the batch file.

What am I doing wrong?

I made the batch file with TC, and saved it as utf-8, no bom.

Michael

p.s my 2 log files are here: http://beijer.mx/storage/

[Edited at 2010-11-10 00:48 GMT]


 

Mette Melchior  Identity Verified
Sweden
Local time: 19:11
English to Danish
+ ...
Maybe special characters in file path? Nov 10, 2010

Hi Michael

I also got the first error you mention yesterday when I tried to align some texts but that seemed to be due to the fact that the top folder contained one of the special Danish characters (æ). I copied the files to another location and then it worked fine.

It also says in the readme file that the aligner might not work with non-ASCII characters in the paths or filenames. Maybe that could also be the problem in your case?


 

FarkasAndras  Identity Verified
Local time: 19:11
English to Hungarian
+ ...
TOPIC STARTER
Crash Nov 10, 2010

Michael J.W. Beijer wrote:

I am having trouble with the batch aligner:

"File 1 doesn't exist (or its path or filename contains accented characters)"


That's probably a non-ASCII character in the filename or path, or some other error regarding the filepath (either your txt is wrong or the aligner parses it incorrectly for some reason).
Make sure you only use the characters of the English alphabet and spaces in the file and folder names. You could also post the batch file you're using, I'll see if I can see something wrong (e.g. it should have full filepaths, not just filenames).
If it's not that, I'm not sure.
The log files didn't help, unfortunately. I'll take a stab at improving the logging.
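
A quick way to rule out the non-ASCII explanation is to scan the paths before running the batch. The sketch below is a stand-alone Python helper, not part of the aligner; `non_ascii_chars` and `check_paths` are hypothetical names.

```python
def non_ascii_chars(path):
    """Return the non-ASCII characters found in a path, in order of appearance."""
    return [ch for ch in path if ord(ch) > 127]


def check_paths(paths):
    """Map each path containing non-ASCII characters to those characters."""
    problems = {}
    for path in paths:
        found = non_ascii_chars(path)
        if found:
            problems[path] = found
    return problems
```

An empty result means every path consists of plain ASCII characters, so the cause has to be something else.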


And now it just crashes on the first file, leaving an empty aligned_all.txt file in a folder it makes with the same name as the batch file.

Crashes? That'd be really odd. What happens exactly?


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 18:11
Member (2009)
Dutch to English
+ ...
"_"? Nov 10, 2010

Here's my batch file:

http://beijer.mx/storage/paths.txt

Is the underscore character permitted? "_"
That's the only one I can think of that might be causing the problem.

Michael

p.s. By the way, should the Line terminators of my text file be: CR/LF (DOS), LF (UNIX), or CR (MAC)? I am pasting the contents from Excel into UltraEdit, and making a utf-8 txt file without BOM.

[Edited at 2010-11-10 11:38 GMT]


 

FarkasAndras  Identity Verified
Local time: 19:11
English to Hungarian
+ ...
TOPIC STARTER
Filenames Nov 10, 2010

Michael J.W. Beijer wrote:

Here's my batch file:

http://beijer.mx/storage/paths.txt

Is the underscore character permitted? "_"
That's the only one I can think of that might be causing the problem.

Michael

p.s. By the way, should the Line terminators of my text file be: CR/LF (DOS), LF (UNIX), or CR (MAC)? I am pasting the contents from Excel into UltraEdit, and making a utf-8 txt file without BOM.

Underscores and hyphens are supported. Your file names and paths are fine, and your batch txt is fine.

Default line endings are OS-specific in Perl, so in Windows, CRLF is the safest bet, although I never tested unix endings and such. (This only applies to the batch file - I made sure the actual input files can have any line ending on any platform.)
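
For comparison, this is roughly what terminator-agnostic reading of a list file looks like. It is a Python sketch, not the aligner's Perl code, and `read_lines` is a hypothetical helper; it assumes the file is UTF-8, with or without a BOM.

```python
def read_lines(raw):
    """Split UTF-8 bytes into non-empty lines, whatever the line terminator."""
    text = raw.decode("utf-8-sig")  # also tolerates a leading BOM
    # str.splitlines() treats CRLF, LF and CR alike.
    return [line.strip() for line in text.splitlines() if line.strip()]
```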

The problem is that your English and Dutch files have the same filename. The aligner copies all the input files to the same folder, so it ends up overwriting your English file with your Dutch file and then it can't find the English file.
As a workaround, use batch renaming in Total Commander: select all English files, click Files/Multi-rename Tool and write en_ before the [N] in the "Rename mask: file name" field. Then click Start, click Close, update your batch file and it should work.

I'll probably fix this whenever I get around to it.
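
Until then, the clash described above (two input files with the same name ending up in one folder) can be spotted in advance. A minimal Python sketch, assuming you already have the full paths as a list of strings; `find_name_collisions` is a hypothetical helper and not part of the aligner.

```python
from collections import defaultdict
import ntpath


def find_name_collisions(paths):
    """Group full paths by file name and return only names used more than once."""
    by_name = defaultdict(list)
    for path in paths:
        # ntpath splits Windows-style paths regardless of the OS running this.
        # Windows file names are case-insensitive, so compare in lower case.
        by_name[ntpath.basename(path).lower()].append(path)
    return {name: group for name, group in by_name.items() if len(group) > 1}
```

Any name that shows up in the result needs a prefix such as en_ on one side before the batch is run.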


 

Michael Beijer  Identity Verified
United Kingdom
Local time: 18:11
Member (2009)
Dutch to English
+ ...
Thanks! Nov 10, 2010

I hadn't thought of that.

I'll change the file names. My present collection from the OPUS website consists of over 5,000 individual text files, so I hope TC can manage;)

Apart from that, your new LF Aligner is pretty darned cool!

Michael


 

FarkasAndras  Identity Verified
Local time: 19:11
English to Hungarian
+ ...
TOPIC STARTER
Don't you worry Nov 10, 2010

Michael J.W. Beijer wrote:

I hadn't thought of that.

I'll change the file names. My present collection from the OPUS website consists of over 5,000 individual text files, so I hope TC can manage;)

Apart from that, your new LF Aligner is pretty darned cool!

Michael

You could always just wait for the next batch aligner update, but in general, TCMD will handle just about anything you care to throw at it.

BTW if you need really advanced renaming for some purpose, you could do it through a windows .bat.
You just put one command per line in an ANSI txt saved as .bat. IIRC the rename command takes the old file with its path and the new name without a path:
ren "C:\folder\oldname.txt" "newname.txt"
It's pretty easy to produce such a file with Excel and Ultraedit, so you can essentially do very powerful regex-based batch renames (say, replace non-ASCII letters in every filename with ASCII characters without doing something blunt like renaming the files to 1.txt, 2.txt etc.)
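
The same list of commands can also be generated with a few lines of Python instead of Excel and UltraEdit. This is only a sketch under assumptions: `to_ascii` and `ren_commands` are hypothetical helpers, accents are dropped via Unicode decomposition, and letters that have no decomposition (æ or ø, for example) are replaced with an underscore.

```python
import ntpath
import unicodedata


def to_ascii(name):
    """Strip diacritics (é -> e); replace anything still non-ASCII with '_'."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch if ord(ch) < 128 else "_" for ch in no_marks)


def ren_commands(paths):
    """Build one 'ren' line per file whose name would change."""
    lines = []
    for path in paths:
        old_name = ntpath.basename(path)
        new_name = to_ascii(old_name)
        if new_name != old_name:
            # ren takes the source with its path but only a bare new name.
            lines.append(f'ren "{path}" "{new_name}"')
    return lines
```

Two caveats: different names can map to the same ASCII name, so check the output for duplicates, and the .bat has to be saved in an encoding that matches the console code page, otherwise the old names will not be found.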

If you have thousands of files you probably don't want to do anything manually, so make sure you go through the entire setup file and set it to run unattended. With the right setup, you can drag & drop the batch file, go have a coffee and have the output files ready when you come back.

The batch aligner is always behind the main script in development, so there's a couple of things I need to improve in it. E.g. the current version generates xls and tmx files out of each file separately, not the total project and the logging is not up to snuff. (To get a big daddy TMX, use the TMX maker on aligned_all.txt after the batch alignment.)
If you have feedback about what it should and shouldn't do, go ahead.

[Edited at 2010-11-10 15:51 GMT]


 

FarkasAndras  Identity Verified
Local time: 19:11
English to Hungarian
+ ...
TOPIC STARTER
Crashes Nov 10, 2010

Note: the missing file did indeed cause the aligner to quit (not crash, really). In my defense, it was a graceful exit: it printed an error message (file not found) and then the script ended.
The thing is, when the script ends, the OS closes the console window so you had no chance to read the error message. To read an error message after an error like this, you could open a console window (commands/run DOS in TCMD or run/cmd in the start menu) and drag-&-drop the aligner's exe in that instead of starting the aligner with a double click. Manually opened console windows don't close when the programmes running in them terminate.


 

FarkasAndras  Identity Verified
Local time: 19:11
English to Hungarian
+ ...
TOPIC STARTER
New version Nov 11, 2010

I fixed the bug in the batch aligner, and did a few more fixes/improvements. The windows package is available on sourceforge (2.01b). The "normal" aligner hasn't changed, though.

Bug reports and feedback still welcome, of course.
BTW the traffic is pretty impressive: in 6 days, it got about as many downloads as aligner.bat had in the last 3 months.


 

Nguyen Dieu  Identity Verified
Vietnam
Local time: 01:11
Member (2008)
English to Vietnamese
+ ...
win7 and vietnamese text Nov 12, 2010

Hi,

Could you please let me know if this soft works with win7 and Vietnamese text?

Thanks


 

FarkasAndras  Identity Verified
Local time: 19:11
English to Hungarian
+ ...
TOPIC STARTER
Why don't you tell me? Nov 12, 2010

Nguyen Cong Dieu wrote:

Hi,

Could you please let me know if this soft works with win7 and Vietnamese text?

Thanks


I won't know until somebody tries it, will I?
Win7 is supported, and UTF-8 is supported. If UTF-8 contains all the Vietnamese characters, it should work in principle, especially with txt input files. Try it and report back!


 

Piotr Bienkowski  Identity Verified
Poland
Local time: 19:11
English to Polish
+ ...
What is the format of the CELEX number? Nov 15, 2010

I want to make an alignment out of 2007/47/EC and I enter 32007L0047 as the number, but I can't seem to get it to work.

 