New free & open source aligner (for Windows, OS X and linux)
Thread poster: FarkasAndras
FarkasAndras Local time: 11:38 English to Hungarian + ... TOPIC STARTER
By popular demand and about 8 months late (maybe more), here is a new version.

From the changelog:
New in 4.1:
- Support for new URLs used by eur-lex (EU legislation downloads work again)
- French translation added (can be enabled in setup)
- TMX maker GUI fixes
- Fixed bug in the alignment editor's "Load additional column with autoalignment" feature
- Added copy/paste to alignment editor (Ctrl-n/Ctrl-m, can only copy whole cell and only within program)

I haven't done much testing, hence the limited release on dropbox. Please report bugs here, or by email. Minor feature requests are welcome too. I have no plans for any major changes or additions (apart from possibly reviving a dormant effort to get a mac gui version out).

The French translation isn't available yet, so don't try enabling it.
Trying to create multilingual TMs and so lost... | Mar 2, 2015 |
Hello András,

I'm not sure if this is an LF Aligner question or a TMLookup question. I am brand new to TMLookup (and love it!), and not new at all to LF Aligner, but it's the first time I'm aligning 3 languages instead of 2 and creating translation memories (I usually relied on tabbed TXTs).

I aligned some texts using the following syntax:

LF_aligner_4.04.exe -f=t -l=en,fr,pt -s=n -r=xn -t=y -o=C:\outfile.txt -i="C:\TextENG.txt","C:\TextFRE.txt","C:\TextPOR.txt"

When the aligner asked me questions about the order of languages in my TM tool (I think?), I said EN FR PT... kind of at random. Additionally, I aligned some texts that were only PT>FR, and others that were only PT>EN (and I have a number of pre-existing tabbed TXTs that are only EN>FR).

I translate from English and Portuguese into French. I currently see two query fields in TMLookup. Is it possible to use more? Say I spend all of March working on a Portuguese text: my first query field will always be PT, but I want to see the concordances in both FR and EN. If I set the second query field as FR, the texts I aligned as PT>FR will work, but not those I aligned as PT>EN. And the ones I aligned as EN FR PT will display, but as PT FR PT Source, and I'd like to see the English as well if that's possible. And then if I spend all of April working on an English text, my first query field will always be EN, but I run into the same kind of problem with the second query field and the order of the columns...

I'm sorry if I'm being very confusing, I'm pretty confused myself! But anyway, what should I do: declare a different order of languages when LF Aligner creates the TM, align the texts several times with different language orders, use several different databases in TMLookup (basing them on the source or the target language(s)?), import files differently?

Thanks!
FarkasAndras Local time: 11:38 English to Hungarian + ... TOPIC STARTER Multilingual DB | Mar 2, 2015 |
This is only related to LF Aligner to the extent that LF Aligner is probably the only aligner that allows you to generate 3-language alignments. The meat of the matter is how to handle multilingual files in TMLookup. You can do what you want: import all your bi- and trilingual alignments into the same trilingual TMLookup DB and search them in all directions.

Here's how you do it:
- Create a TMLookup DB with the three languages (and optional source column).
- Take an en-fr-pt trilingual tabbed file, open it and check the order of languages (en, pt, fr or pt, en, fr etc.). Import the file. In the import dialog, pick 3 as the 'Number of columns to read from file', and specify the languages in the order they are in the txt file.
- Repeat this with all the en-fr-pt tabbed files. Obviously, if the languages are in a different order in the files, you have to specify them in a different order in TMLookup. TMLookup tries to guess the order from the file name, but always check manually.
- Then add all your bilingual files. Leave the column number on 2 and make sure you have the correct two languages picked in the correct order. Do not select multiple files at the same time and use the 'Process all files with the same settings' checkbox unless you are 100% sure that the languages are in the same order in all files. (If you select multiple files and leave the checkbox unchecked, you can still specify individual settings, of course.)

When you have a correctly arranged DB, you can switch the query boxes around with the dropdown boxes. When you get the "PT FR PT Source" display thing, you can just click View/Display additional columns again, pick the missing columns and English will be added.

BTW if you happen to have, say, a fr-pt database and you want to import an en-pt-fr tabbed file into it, you can do that as well. You read three columns from the file and discard the first (en) column by choosing "skip" from the dropdown list.
It looks like clicking 'View/Display additional columns' when needed was enough to solve all my problems! Thanks a lot!
Michael Beijer United Kingdom Local time: 10:38 Member (2009) Dutch to English + ... @FarkasAndras: | Sep 13, 2015 |
Any news re the GUI for batch mode you mentioned a while back? I am getting errors aligning batches of 100 txt files each with AlignFactory, and wanted to try LF Aligner.

MB
2nl (X) Netherlands Local time: 11:38 Work-around? | Sep 14, 2015 |
Michael Beijer wrote:
Any news re the GUI for batch mode you mentioned a while back? I am getting errors aligning batches of 100 txt files each with AlignFactory, and wanted to try LF Aligner. MB

How about this work-around?
FarkasAndras Local time: 11:38 English to Hungarian + ... TOPIC STARTER
Merging files before aligning is an option, but it's not a very good one. If one file pair is badly mismatched (one file several dozen or several hundred segments longer than the other) then it can throw off the alignment throughout the rest of the project. It's best to keep files isolated to isolate problems.

Re: GUI batch mode, LF Aligner is on the back burner... I did write a simple GUI program that generates a .bat file for batch alignment, but it's primitive and ugly. I mostly wrote it for my own use. Maybe I will polish it up and publish it at some point, but the earliest time that could possibly happen is next week.

You can of course generate the .bat yourself, which is what I did up to a month or two ago. Copying file names to the clipboard from Total Commander, pasting them in Excel and then using either Excel or search and replace in a text editor to add the rest of the command makes it relatively painless... relatively being the keyword.

[Edited at 2015-09-14 08:35 GMT]
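For anyone who would rather script the .bat generation than go through Excel, here is a minimal sketch in Python. It is not the GUI tool mentioned above. The switches in the template are simply copied from the LF_aligner_4.04.exe command line quoted earlier in this thread, and whether they suit a bilingual batch run is an assumption to check against the LF Aligner readme. The file names, the C:\aligned output folder and the "_aligned" naming convention are hypothetical.

```python
from pathlib import PureWindowsPath

# Switches copied from the command line quoted earlier in this thread;
# treat them as a placeholder and check them against the LF Aligner readme.
TEMPLATE = "LF_aligner_4.04.exe -f=t -l={langs} -s=n -r=xn -t=y -o={out} -i={inputs}"


def make_bat_lines(file_sets, langs=("en", "fr"), out_dir=r"C:\aligned"):
    """Return one command line per set of input files (one file per language)."""
    lines = []
    for files in file_sets:
        # Name the output after the first input file (hypothetical convention).
        stem = PureWindowsPath(files[0]).stem
        out = str(PureWindowsPath(out_dir) / f"{stem}_aligned.txt")
        inputs = ",".join(f'"{name}"' for name in files)
        lines.append(TEMPLATE.format(langs=",".join(langs), out=out, inputs=inputs))
    return lines


if __name__ == "__main__":
    # Hypothetical file pair; in practice, build this list from a directory listing.
    file_sets = [(r"C:\in\doc1_en.txt", r"C:\in\doc1_fr.txt")]
    for line in make_bat_lines(file_sets):
        print(line)
```

Redirecting the printed lines into a file with a .bat extension gives one alignment command per file pair, so a badly mismatched pair only affects its own output.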
Michael Beijer United Kingdom Local time: 10:38 Member (2009) Dutch to English + ... There really should be an easier way to do this | Sep 14, 2015 |
FarkasAndras wrote: Merging files before aligning is an option, but it's not a very good one. If one file pair is badly mismatched (one file several dozen or several hundred segments longer than the other) then it can throw off the alignment throughout the rest of the project. It's best to keep files isolated to isolate problems. Re: GUI batch mode, LF Aligner is on the back burner... I did write a simple GUI program that generates a .bat file for batch alignment, but it's primitive and ugly. I mostly wrote it for my own use. Maybe I will polish it up and publish it at some point, but the earliest time that could possibly happen is next week. You can of course generate the .bat yourself, which is what I did up to a month or two ago. Copying file names to the clipboard from Total Commander, pasting them in Excel and then using either Excel or search and replace in a text editor to add the rest of the command makes it relatively painless... relatively being the keyword.
[Edited at 2015-09-14 08:35 GMT]

Thanks FarkasAndras,

But I solved it for now. I realised that in AlignFactory you can set the program to spit out separate TMXs, as well as one big one. So when you run a huge batch job, and the program chokes, the last TMX it spits out will be the one with the problem. The name of this TMX will correspond to the txt file (pair) with the problem. Just skipping this txt file usually allows the project to complete if rerun. I then just convert the single txt file (pair) with the problem into a separate TMX using Heartsome's TMX editor (Tools > Convert to TMX), and then merge it with the AlignFactory TMX.

Indeed: it's never a good idea to merge 100 txt files into a single big one for stuff like this. Way too much chance of something going wrong, not to mention merely merging 100 text files of this type is in itself quite a chore, and likely to choke most programs, even EmEditor.

No time for generating .bat files, etc., right now but I do look forward to your future GUI batch mode thingee, as I would love to test it against AlignFactory.

There really should be an easier way to do this though, seeing as how all of these files are effectively already aligned. All I need is for a program to: take the first line of text file de1.txt and match it to the first line of text file en1.txt, and turn this into a TU. Then, it needs to take the second line of text file de1.txt and match it with the second line of text file en1.txt, and turn it into a TU. Then repeat that a few times.

PS: this is what I'm currently working on: http://homepages.inf.ed.ac.uk/pkoehn/publications/de-news/speech
FarkasAndras Local time: 11:38 English to Hungarian + ... TOPIC STARTER
Michael Beijer wrote:
There really should be an easier way to do this though, seeing as how all of these files are effectively already aligned. All I need is for a program to: take the first line of text file de1.txt and match it to the first line of text file en1.txt, and turn this into a TU. Then, it needs to take the second line of text file de1.txt and match it with the second line of text file en1.txt, and turn it into a TU. Then repeat that a few times. PS: this is what I'm currently working on: http://homepages.inf.ed.ac.uk/pkoehn/publications/de-news/speech

I thought you already did this with the public patent TM? In any case, some dumb aligners do this. I also have my own software for this because I store my large multilingual TMs in a similar format (separate txt files for each document in each language, one line per segment). Maybe one day I will add such a dumb pairing feature to lf aligner, or release a separate program that merges files into tabbed files.
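The "dumb pairing" described here (line N of one file goes with line N of the other) is simple enough to sketch in Python. This is an illustration only, not the software mentioned above and not an LF Aligner feature. It assumes UTF-8 files with one segment per line, and the function names are made up for the example.

```python
from itertools import zip_longest


def pair_lines(src_lines, tgt_lines):
    """Pair segment N of the source with segment N of the target, tab-separated."""
    rows = []
    for src, tgt in zip_longest(src_lines, tgt_lines, fillvalue=""):
        # Tabs inside a segment would shift the columns, so flatten them to spaces.
        src = src.rstrip("\r\n").replace("\t", " ")
        tgt = tgt.rstrip("\r\n").replace("\t", " ")
        rows.append(f"{src}\t{tgt}")
    return rows


def pair_files(src_path, tgt_path, out_path):
    """Write a tab-delimited file from two one-segment-per-line UTF-8 files."""
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        rows = pair_lines(src.readlines(), tgt.readlines())
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("\n".join(rows) + "\n")
    return len(rows)
```

If the two files differ in length, the shorter side is padded with empty cells, so half-empty rows at the end of the output are a quick sign that the pair was not line-aligned after all. A call such as pair_files("de1.txt", "en1.txt", "de1-en1.txt") would cover the example file names from the quoted post.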
Michael Beijer United Kingdom Local time: 10:38 Member (2009) Dutch to English + ... A "dumb aligner" would be great! | Sep 14, 2015 |
FarkasAndras wrote:
Michael Beijer wrote:
There really should be an easier way to do this though, seeing as how all of these files are effectively already aligned. All I need is for a program to: take the first line of text file de1.txt and match it to the first line of text file en1.txt, and turn this into a TU. Then, it needs to take the second line of text file de1.txt and match it with the second line of text file en1.txt, and turn it into a TU. Then repeat that a few times. PS: this is what I'm currently working on: http://homepages.inf.ed.ac.uk/pkoehn/publications/de-news/speech
I thought you already did this with the public patent TM? In any case, some dumb aligners do this. I also have my own software for this because I store my large multilingual TMs in a similar format (separate txt files for each document in each language, one line per segment). Maybe one day I will add such a dumb pairing feature to lf aligner, or release a separate program that merges files into tabbed files.

I did, but there were far fewer files, and they were much bigger. Now, I have hundreds of small txt files, so my approach is different. This is how I did the PatT data:

Original workflow:
1. Append ".txt" to file names
2. Open files in EmEditor (or a good text editor capable of opening large files; UltraEdit is also good)
3. In Ron's CSV Editor, create empty file and paste in contents of .txt files (of src + trgt language) to create a tab-delimited .csv
4. In Xbench, convert aforementioned .csv to .tmx
5. In Heartsome TMX editor, edit the TMX custom attributes and clean up the TMX (remove duplicates).

Improved workflow:
1. Append ".txt" to file names
2. Use "split" command in cmd.exe to split large text file into smaller files based on number of lines (1,000,000 lines): split -l 1000000 filename.txt
3. Use "generate_tabbed.exe" (in András Farkas's "Grab Bag", included in LF Aligner download) to convert src and trgt language .txt files into tab-delimited .txt containing both src + trgt
4. Use Heartsome TMX editor to convert bilingual tab-del .txt files into .tmx
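The last step of the improved workflow (tab-delimited txt to TMX) can also be sketched in Python with the standard library. This is not how Heartsome or TMX Maker do it. It is a bare-bones illustration that writes the attributes a TMX 1.4 header requires and one TU per row. The language codes, the tool name and the "o-tmf" value are placeholders chosen for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tabbed_to_tmx(rows, langs=("en", "de"), tool="tabbed2tmx-sketch"):
    """Build a TMX 1.4 tree from tab-delimited rows; return (tree, skipped count)."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": tool, "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "tabbed-txt",
        "adminlang": "en", "srclang": langs[0], "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    skipped = 0
    for row in rows:
        cells = row.rstrip("\r\n").split("\t")
        # Skip rows with too few columns or with an empty cell.
        if len(cells) < len(langs) or not all(c.strip() for c in cells[:len(langs)]):
            skipped += 1
            continue
        tu = ET.SubElement(body, "tu")
        for lang, text in zip(langs, cells):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx), skipped
```

The returned tree can be saved with tree.write("out.tmx", encoding="utf-8", xml_declaration=True). Duplicate removal and custom attributes, which the workflow above handles in the Heartsome TMX editor, are deliberately left out.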
Michael Beijer United Kingdom Local time: 10:38 Member (2009) Dutch to English + ... Found the problem! | Sep 14, 2015 |
Every so often, AlignFactory will choke. As I mentioned, the name of the last TMX it created allows me to see which text file is the problem. Looking in these text files reveals that the problem is always the presence of this (invisible) control character, aka \x1a. If I find and replace all of these with ', the batch alignment completes fine.
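The same find and replace can be done over a whole folder with a short Python sketch. It works on raw bytes, which is safe for ASCII and UTF-8 files because the byte 0x1A never occurs inside a multi-byte UTF-8 sequence. UTF-16 files would need different handling. The function names and the default "*.txt" pattern are assumptions for the example, and the default replacement is the apostrophe used in the post above.

```python
from pathlib import Path


def strip_sub(data: bytes, replacement: bytes = b"'") -> bytes:
    """Replace every SUB control character (byte 0x1A) in data."""
    return data.replace(b"\x1a", replacement)


def clean_folder(folder, pattern="*.txt", replacement=b"'"):
    """Rewrite in place each matching file that contains 0x1A; return the file names."""
    changed = []
    for path in sorted(Path(folder).glob(pattern)):
        data = path.read_bytes()
        if b"\x1a" in data:
            path.write_bytes(strip_sub(data, replacement))
            changed.append(path.name)
    return changed
```

Since the files are overwritten in place, it is worth running this on a copy of the folder first. The returned list shows which files actually contained the character.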
New Linux versions? | Oct 2, 2015 |
I've noticed that the Linux version is at 3.11 while the Windows one is at 4.1. Any chance of getting the Linux version updated?
FarkasAndras Local time: 11:38 English to Hungarian + ... TOPIC STARTER
Yes, the linux & mac versions got left behind. Maybe I will get around to releasing a new linux version, but I can't make any promises.

You can try to roll your own, though. Find the .pl in the linux version; there is a short howto at the top of the file (aligner/scripts/LF_aligner_XXX.pl). Download the Windows version, find the .pl and copy the relevant bits over into the linux .pl.

Most of the changes since 3.11 affected the GUI, which obviously makes no difference for linux users. There have been only a handful of other updates that could be useful to linux users (see changelog).
esperantisto Local time: 13:38 Member (2006) English to Russian + ... SITE LOCALIZER TMX Maker: errors processing a text file | Jan 6, 2016 |
I have a plain-text file in UTF-8 that I want to convert to a TMX file (using TMX Maker 3.0 from LF Aligner 4.1 on Windows 7) as follows. When I start TMX Maker, I see:

Code:
    Drag and drop the input file (tab delimited txt in UTF-8 encoding, or xls) here
    and press enter.

My file seems to fit, so I drag and drop it and go through the following steps: I choose the output file name, the number of languages (2, the default), the language codes (I specify EN-US and BE-BY), the date/time (I confirm the default), the creator name (I confirm the default) and the note (I leave none), and hit Enter. Then I get just a bunch of:

Code:
    LINE XXX OF THE FILE DOESN'T HAVE ENOUGH COLUMNS, SO IT HAS BEEN SKIPPED.
    CHECK THE SOURCE FILE AND RUN THE TMX MAKER AGAIN IF NEEDED

and then:

Code:
    0 TUs have been written to the TMX. XXX segments were skipped (0 of them due to
    being half-empty).

And, obviously, the resulting TMX file contains only a conventional TMX header, but no TUs. I tried to search for the above error message on the Internet, but I could only find "reading a CSV files columns directly into variables names with python". However, this does not help me. Is there anything else that I should check/look into?

Michael Beijer United Kingdom Local time: 10:38 Member (2009) Dutch to English + ... Try Heartsome TMX editor | Jan 6, 2016 |
esperantisto wrote:
I have a plain-text file UTF-8 that I want to convert to a TMX file (using TMX Maker 3.0 from LF Aligner 4.1 on Windows 7) as follows: When I start TMX Maker, I see: [snip!] Is there anything else that I should check/look into?

Slightly off-topic, but the Heartsome TMX editor has a great little tool for converting tab-delimited files (and Excel files) to TMXs. I wonder if/when anyone will take over the (now open source) project.
[Edited at 2016-01-06 09:15 GMT]
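On the "DOESN'T HAVE ENOUGH COLUMNS" message reported above, one thing worth ruling out is that the file has fewer tab-separated columns per line than the number of languages, for example because the columns are separated by spaces or commas rather than tabs. That is a guess, not a diagnosis of TMX Maker's behaviour. A minimal Python sketch for checking a file independently of TMX Maker follows, with a hypothetical function name and a default of two columns.

```python
def short_lines(lines, min_columns=2):
    """Return (line number, column count) for lines with too few tab-separated columns."""
    problems = []
    for number, line in enumerate(lines, start=1):
        columns = line.rstrip("\r\n").split("\t")
        if len(columns) < min_columns:
            problems.append((number, len(columns)))
    return problems


if __name__ == "__main__":
    # Three sample lines; only the first and third use a tab as the separator.
    sample = ["Hello\tHallo\n", "Hello  Hallo\n", "one\ttwo\tthree\n"]
    for number, count in short_lines(sample):
        print(f"line {number}: {count} column(s)")
```

To check a real file, pass the result of open(path, encoding="utf-8-sig").readlines() instead of the sample list. The "utf-8-sig" codec also tolerates a byte order mark at the start of the file. If every line is reported, the separator is probably not a tab.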