TM anonymization (GDPR)
Thread poster: Production SA
Belgium
Local time: 05:59
Apr 12, 2018

Hi all,

With GDPR enforcement in Europe starting on May 25th, does anyone have any insights into how to anonymize client TM content, essentially removing personal data?

Thx!


 
Michael Beijer
United Kingdom
Local time: 04:59
Member (2009)
Dutch to English
We are living in interesting times. Apr 12, 2018

Production SA wrote:

Hi all,

With GDPR enforcement in Europe starting on May 25th, does anyone have any insights into how to anonymize client TM content, essentially removing personal data?

Thx!


Hmm, I'm very curious as to whether this will actually become necessary, and also, whether anyone will actually do it if it is. However, assuming you do need to do it, there are a few options. I would first of all recommend contacting Kevin Dias (the guy behind TM-Town), since he's quite knowledgeable re automated data anonymisation stuff. I think he has some kind of automatic system in one of TM-Town's tools. Who knows, he may even be working on something relating to the upcoming GDPR changes. Actually, I suspect other people are already working on solutions for translation data anonymisation.

Another avenue to investigate is CAT tools, and specifically "untranslatables", "non-translatables", "placeholders", "tokens" (or whatever term each particular tool has chosen). Some CAT tools (such as CafeTran) already have ways of automatically filtering out certain terms from data sent to online machine translation systems, with or without regular expressions. I can imagine someone clever coming up with something that would work without too much trouble, preferably automated of course: a system that scans your document for potential candidates, using regular expressions and/or customer-defined lists, and then replaces them with codes.
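
A minimal sketch of that scan-and-replace idea, in Python. The two patterns, the customer term list and the `[[PII_n]]` code format are illustrative assumptions, not features of any particular CAT tool, and real personal data (names especially) will need far more than two regular expressions.

```python
import re

# Hypothetical patterns for two common kinds of personal data.
PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),  # e-mail addresses
    re.compile(r"\+?\d[\d /-]{7,}\d"),            # phone-like digit runs
]

def build_matcher(client_terms):
    """Combine the patterns and a customer-defined term list into one regex."""
    parts = [p.pattern for p in PATTERNS]
    # Longest terms first, so a full name wins over a shorter term it contains.
    parts += [re.escape(t) for t in sorted(client_terms, key=len, reverse=True)]
    return re.compile("|".join(f"(?:{p})" for p in parts))

def anonymize(text, matcher, codes=None):
    """Replace every match with a code; the same string always gets the same code."""
    codes = {} if codes is None else codes

    def repl(match):
        return codes.setdefault(match.group(0), f"[[PII_{len(codes) + 1}]]")

    return matcher.sub(repl, text), codes
```

Passing the same `codes` dictionary to successive calls keeps the codes consistent across segments, which matters if the same name appears in many translation units.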

All very interesting stuff, and I am extremely curious to see how it all pans out! If I hear anything interesting, I will report back here in this thread.

Michael


 
Michael Beijer
United Kingdom
Local time: 04:59
Member (2009)
Dutch to English
AI-powered GDPR data anonymisation (‘pseudonymisation’) tool? Apr 12, 2018

I also know that Memsource recently released an artificial intelligence-powered non-translatables feature. See e.g. https://www.memsource.com/blog/2018/01/09/memsource-releasing-first-feature-powered-by-artificial-intelligence/


Given the fact that AI is popping up literally everywhere these days (Mark Zuckerberg hopes to use it to tackle the massive problem Facebook is currently having with its huge amount of questionable content), I wouldn't be surprised if someone was already working on something AI-powered in this area.

See also:

Pseudonymisation
The GDPR refers to pseudonymisation as a process that transforms personal data in such a way that the resulting data cannot be attributed to a specific data subject without the use of additional information. An example is ENCRYPTION, which renders the original data unintelligible and the process cannot be reversed without access to the correct decryption key. The GDPR requires for the additional information (such as the decryption key) to be kept separately from the pseudonymised data.

Another example of pseudonymisation is TOKENIZATION, which is a non-mathematical approach to protecting data at rest that replaces sensitive data with non-sensitive substitutes, referred to as tokens. The tokens have no extrinsic or exploitable meaning or value. Tokenization does not alter the type or length of data, which means it can be processed by legacy systems such as databases that may be sensitive to data length and type.

That requires much fewer computational resources to process and less storage space in databases than traditionally-encrypted data. That is achieved by keeping specific data fully or partially visible for processing and analytics while sensitive information is kept hidden.

Pseudonymisation is recommended to reduce the risks to the concerned data subjects and also to help controllers and processors to meet their data protection obligations (Recital 28).

Although the GDPR encourages the use of pseudonymisation to "reduce risks to the data subjects" (Recital 28), pseudonymised data is still considered personal data (Recital 26) and so remains covered by the GDPR.


(https://en.wikipedia.org/wiki/General_Data_Protection_Regulation#Pseudonymisation )
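
To make the tokenization idea concrete, here is a small Python sketch. The class name `TokenVault`, the `TOK_` token format and the JSON file are assumptions for illustration only. Unlike the description quoted above, it does not try to preserve the length or type of the original data; it only shows the core point that the mapping is the "additional information" and has to be kept apart from the pseudonymised text.

```python
import json
import secrets

class TokenVault:
    """Maps sensitive values to random tokens and back."""

    def __init__(self):
        self._by_value = {}
        self._by_token = {}

    def tokenize(self, value):
        """Return the existing token for value, or mint a new random one."""
        if value not in self._by_value:
            token = f"TOK_{secrets.token_hex(4)}"
            while token in self._by_token:  # retry on the rare collision
                token = f"TOK_{secrets.token_hex(4)}"
            self._by_value[value] = token
            self._by_token[token] = value
        return self._by_value[value]

    def detokenize(self, token):
        """Reverse lookup; only possible for whoever holds the vault."""
        return self._by_token[token]

    def save(self, path):
        """Write the mapping to its own file, away from the pseudonymised data."""
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(self._by_token, fh, ensure_ascii=False, indent=2)
```

Because the tokens are random, they reveal nothing by themselves, but anyone holding the saved mapping can reverse them, which is why the result still counts as pseudonymised rather than anonymised data.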


 
Michael Beijer
United Kingdom
Local time: 04:59
Member (2009)
Dutch to English
encryption sufficient to ensure compliance? Apr 12, 2018

Having just read the Wikipedia article, and specifically the bit about encryption:

"The GDPR requires for the additional information (such as the decryption key) to be kept separately from the pseudonymised data."

I suppose one way to comply might be to encrypt everything, and store the decryption key somewhere else. Will have to look into this.


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Regular expressions and macros Apr 13, 2018

Michael Beijer wrote:

Some CAT tools (such as CafeTran) already have ways of automatically filtering out certain terms from data sent to online machine translation systems, with or without regular expressions. I can imagine someone clever coming up with something that would work without too much trouble, preferably automated of course: a system that scans your document for potential candidates, using regular expressions and/or customer-defined lists, and then replaces them with codes.


I'm using CafeTran Espresso 2018 as my CAT tool and per client I maintain a dedicated glossary for non-translatables. (I only translate machine manuals.)

When I start working on a new job, I have a look at the first pages and the list of spare parts and technical specifications at the end of the PDF that is part of the job. Here I find most (not all) brand names, product names, street names etc., data that is sensitive and that I don't want to send out to any MT system.

Then I quickly add these data to my dedicated glossary for non-translatables for the particular client (of course I can attach several glossaries for non-translatables, created for different clients, to one job). By means of a simple macro I tag the lines in the glossary for non-translatables to become regular expressions.
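
For readers who want to try something similar outside the CAT tool, this is roughly what such a conversion could look like in Python. It is not CafeTran's actual macro or glossary format; the function name and the `#` comment convention are assumptions made for the sketch.

```python
import re

def glossary_to_patterns(lines):
    """Turn plain glossary lines into whole-word regular expressions."""
    patterns = []
    for line in lines:
        term = line.strip()
        # Skip blank lines and comment lines (assumed convention).
        if not term or term.startswith("#"):
            continue
        # (?<!\w) and (?!\w) act as word boundaries that also work for
        # terms starting or ending with punctuation, e.g. "X-200+".
        patterns.append(re.compile(rf"(?<!\w){re.escape(term)}(?!\w)"))
    return patterns
```

Escaping each term with `re.escape` keeps characters such as `+` or `.` in product names from being read as regex operators.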

When I meet new non-translatables during the actual translation stage, I quickly add them to my glossary for non-translatables. Since I've linked the four NMT systems that I'm currently using via their APIs, these missed non-translatables will be sent to the NMT system once.

I see three ways to prevent this:

Besides the improved data security, I also benefit from better legibility of the suggested translations, where (long) company and product names are masked and replaced with a short token. I can concentrate better on grammar and style.
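
The masking round trip can be sketched in a few lines of Python. This is an illustration only: the `{1}`-style tokens are an assumed format, and `fake_mt` is a stand-in for a real MT call, not any actual API.

```python
import re

def mask(text, terms):
    """Replace each listed term with a short numbered token in a single pass."""
    if not terms:
        return text, {}
    pattern = re.compile(
        "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    )
    tokens = {}  # term -> token, numbered in order of first appearance

    def repl(match):
        return tokens.setdefault(match.group(0), f"{{{len(tokens) + 1}}}")

    masked = pattern.sub(repl, text)
    return masked, {token: term for term, token in tokens.items()}

def unmask(text, table):
    """Put the original terms back into the (translated) text."""
    return re.sub(r"\{\d+\}", lambda m: table.get(m.group(0), m.group(0)), text)

def fake_mt(text):
    """Stand-in for a real MT call (hypothetical); 'translates' one phrase only."""
    return text.replace("Bestellen bij", "Order from")

masked, table = mask(
    "Bestellen bij Acme Pumpen GmbH, Hauptstrasse 12.",
    ["Acme Pumpen GmbH", "Hauptstrasse 12"],
)
print(masked)                           # only this string leaves the machine
print(unmask(fake_mt(masked), table))   # terms restored locally
```

The lookup table never leaves the local machine, so the MT system only ever sees the short tokens.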

[Edited at 2018-04-13 19:52 GMT]


 
Igor Kmitowski
Poland
Local time: 05:59
Member (2016)
English to Polish
Masking non-translatables Apr 13, 2018

> Then I quickly add these data to my dedicated glossary for non-translatables

You can let the program mask them automatically by turning on the Edit > Preferences > Mask non-translatable fragments option.


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Correct Apr 13, 2018

Igor Kmitowski wrote:

> Then I quickly add these data to my dedicated glossary for non-translatables

You can let the program mask them automatically by turning on the Edit > Preferences > Mask non-translatable fragments option.


Yes. That is how I do it. Actually, this setting is always on, on my system.


 

