What’s the problem with sentence matching? Thread poster: Els Eerdekens
Dear all,

In her article from 2008, Carme Colominas writes the following (http://benjamins.com/#catalog/journals/babel.54.4.03col/details):

“Most of the current Translation Memory systems are based on segments determined by marks that in most cases correspond to a complete sentence. The problem of complete sentence matching is that examples are often excluded from the matching candidates even though they probably contain one or more useful sub-segments that could be helpful to the translation.”

I don’t completely understand what the problem with sentence matching is. I suppose that the concordance search resolves the problem, but that noun phrases or pre- and postmodified noun phrases cannot be found. How is it possible that some matching candidates are excluded?

“In view of these limitations, some proposals have been made in the literature regarding the possibility of building Translation Memory systems that operate “below” the sentence level, that is to say, at a sub-sentential level. Existing work demonstrates that sub-sentential segmentation of Translation Memories clearly shows a significantly best recall with respect to sentential segmentation.”

Are there already systems that work “below” the sentence level?

Thanks!
Els
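To make the "excluded candidates" point concrete, here is a minimal sketch in Python. It is not how any particular CAT tool scores matches: the two example segments, the 0.70 threshold and the function names are all assumptions made up for illustration. It only shows how a whole-sentence score can fall below the match threshold even though the stored segment contains a reusable phrase.

```python
from difflib import SequenceMatcher

def sentence_score(a: str, b: str) -> float:
    """Word-level similarity of two whole segments, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()

def longest_shared_phrase(a: str, b: str) -> str:
    """Longest run of consecutive words that occurs in both segments."""
    wa, wb = a.lower().split(), b.lower().split()
    m = SequenceMatcher(None, wa, wb).find_longest_match(0, len(wa), 0, len(wb))
    return " ".join(wa[m.a:m.a + m.size])

# Hypothetical fuzzy-match threshold; real tools let you configure this.
THRESHOLD = 0.70

# Hypothetical segments: one to translate, one already stored in the TM.
new_segment = "Press the red button to open the safety valve"
tm_segment = "Before you open the safety valve switch off the pump and wait"

score = sentence_score(new_segment, tm_segment)
phrase = longest_shared_phrase(new_segment, tm_segment)

print(f"sentence-level score: {score:.2f}")
print("offered as a match:", score >= THRESHOLD)
print("shared sub-segment:", phrase)
```

With these two segments the sentence-level score is about 0.38, so the TM entry would not be offered as a match, yet the phrase "open the safety valve" is sitting in it. That is the kind of candidate the article says is excluded.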
Simply add a custom end-of-segment separator to break long sentences into smaller pieces. E.g. if you set comma as a custom separator, it will break at each comma.

IrimiConsulting, Sweden, Member (2010), English to Swedish + ...
Below sentence level -> phrase level | May 2, 2013
Matching on the phrase level would definitely be possible, but would require a lot more intelligence from the software, since it needs to analyse word classes and grammar rather than just text strings, which in turn requires the use of dictionaries. There will always be problems with words not found in the dictionary and with discontinuous phrases, and some languages will be less suitable for phrase-level matching.

For "my" languages (English, Swedish, German and French), phrase-level matching would be fairly easy in English, Swedish and French. The German word order would complicate matters a bit, but it would still be quite doable.

In the end, the result would depend to a large extent on the quality of the source text. The GIGO principle (garbage in, garbage out) is very valid in all sorts of language automation.

“In view of these limitations, some proposals have been made in the literature regarding the possibility of building Translation Memory systems that operate “below” the sentence level, that is to say, at a sub-sentential level. Existing work demonstrates that sub-sentential segmentation of Translation Memories clearly shows a significantly best recall with respect to sentential segmentation.” Are there yet systems that work “below” the sentence level?

Heinrich Pesch, Finland, Member (2003), Finnish to German + ...
In SDL Studio it is called AutoSuggest, in DVX Deep Mining. I haven't used those features yet, but they search for phrases within the text and in the TM and would speed up the translation process.
I find AutoSuggest very useful | May 2, 2013
Because it works purely on statistical analysis and is not 'intelligent', it does occasionally come up with a few ridiculous suggestions, but on the whole the benefit far outweighs these, and they are easily ignored.

It might be possible to avoid some of them by filtering or editing the TM before using it to create the AutoSuggest dictionary, if one was aware of what to avoid. I did not do that, but still only get a few 'impossible' suggestions.

It would be ideal if they could be edited out afterwards, but they are not a serious problem.
Sergei Leshchinsky wrote: Simply add a custom end-of-segment separator to break long sentences into smaller pieces. E.g. if you set comma as a custom separator, it will break at each comma.

Dear Sergei,

Where can I do this? In WinAlign from Trados (segmentation rules) or in the source document? If I have to change the segmentation rules, what do I have to do exactly?

Kind regards,
Els

All CAT tools should be able to segment at a comma | May 4, 2013
In TWB, for example, go to File/Setup/Segmentation rules, click "Add", add "Comma" and then in "Rule"/"Stop character", enter a comma (",").

TWB will then segment:

"Because it works purely on statistical analysis and is not 'intelligent', it does occasionally come up with a few ridiculous suggestions, but on the whole the benefit far outweighs these, and they are easily ignored"

as four segments:

Because it works purely on statistical analysis and is not 'intelligent',
it does occasionally come up with a few ridiculous suggestions,
but on the whole the benefit far outweighs these,
and they are easily ignored
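For anyone who wants to preview what a comma rule would do to a text before changing the settings in a CAT tool, the effect can be roughly imitated in a few lines of Python. This is only a sketch of the idea, not how TWB or any other tool implements its segmentation rules; the function name `segment_at` and its `stop_chars` parameter are made up for this example.

```python
import re
from typing import List

def segment_at(text: str, stop_chars: str = ",") -> List[str]:
    """Split text after each stop character that is followed by whitespace.

    The stop character stays at the end of its segment. Requiring
    whitespace after it means a number such as 1,000 is not split.
    """
    pattern = "(?<=[" + re.escape(stop_chars) + r"])\s+"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]

sentence = (
    "Because it works purely on statistical analysis and is not 'intelligent', "
    "it does occasionally come up with a few ridiculous suggestions, "
    "but on the whole the benefit far outweighs these, "
    "and they are easily ignored"
)

segments = segment_at(sentence)
for number, segment in enumerate(segments, start=1):
    print(number, segment)
```

Run on the sentence quoted above, this yields the same four pieces. Passing `stop_chars=",;:"` would also break at semicolons and colons, which shows how quickly such a rule fragments a text, so it is worth trying on a sample first.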