Glossary entry (derived from question below)
German term or phrase:
Datenbereinigung
English translation:
data cleaning
Added to glossary by
Rebecca Holmes
Oct 8, 2002 06:48
German term
Datenbereinigung
German to English
Tech/Engineering
computer database system
From a PPT slide listing the advantages of a customer database system:
Datenvalidierung und -bereinigung schon beim Import
Proposed translations
(English)
4 +3 | data cleaning | Endre Both |
4 +2 | data clean-up | gangels (X) |
5 | data cleansing | Joanne Parker |
5 | Data validation and clean-up on import | Martin Schneekloth (X) |
3 +1 | data filtering / cleansing | Klaus Dorn (X) |
1 | Oooops ... | Chris Rowson (X) |
Proposed translations
+3
9 mins
Selected
data cleaning
That's the standard term in English as well.
4 KudoZ points awarded for this answer.
Comment: "Thank you very much Klaus D., Endre, Chris, Joanne, Klaus B. and Martin. I have waited a couple days to pick an answer because the well-informed choices you provided made it very difficult to select just one. In the end the only fair way to call it seems to be number of Google hits:
13,800 for data cleaning, 11,500 for data cleansing, 10,900 for data filtering and 2,450 for data clean-up. It thus seems only fair to pick Endre's answer of "data cleaning." I really appreciate the amount of research you put into the question, however, Klaus D., and would like to extend my special thanks to all of you for your time and effort."
+1
3 mins
data filtering / cleansing
I favour "filtering" here, because it happens at import, while "cleansing" is something that is traditionally done afterwards...
--------------------------------------------------
Note added at 2002-10-08 06:53:39 (GMT)
--------------------------------------------------
"data validation and filtering already at import"
--------------------------------------------------
Note added at 2002-10-08 07:04:20 (GMT)
--------------------------------------------------
This project focuses on data cleansing, i.e., to detect and remove errors and inconsistencies in data from different sources to improve the data quality.
http://www.ics.uci.edu/~chenli/cleansing.html
--------------------------------------------------
Note added at 2002-10-08 07:05:01 (GMT)
--------------------------------------------------
Change the way you maintain your customer database. Experian Intact is the UK's leading Internet based data cleansing application.
http://www.experianintact.com/
--------------------------------------------------
Note added at 2002-10-08 07:05:51 (GMT)
--------------------------------------------------
Although commercial data cleansing and standardization software tools have been around for years, until fairly recently they weren't suitable for Web applications.
http://www.eweek.com/article2/0,,220591,00.asp?kc=EWAV10209K...
--------------------------------------------------
Note added at 2002-10-08 07:08:52 (GMT)
--------------------------------------------------
Data cleansing takes precedence
Friday 9th August 2002
http://www.it-director.com/article.php?id=3090
--------------------------------------------------
Note added at 2002-10-08 07:09:53 (GMT)
--------------------------------------------------
Data Cleansing Research Project*
--------------------------------------------------------------------------------
Summary:
This research is aimed at defining a framework for automated data cleansing. That is, given a large data set, automatically find and correct errors (semantic and syntactic) within the set. The underlying theoretical aspects of data quality research are being combined with problem solving methods from software testing, data mining, statistics, knowledge based systems, clustering, and machine learning to address this framework. The framework will define an underlying theory to support an accurate set of data quality metrics. A basic understanding of the inherent problems faced by automated data cleansing are being uncovered and investigated.
Technical Reports:
TR-CS-99-02 Progress Report on Data Cleansing 10-18-1999
TR-CS-00-02 Automated Identification of Errors in Data Sets 2-2-2000
TR-CS-00-03 Utilizing Association Rules for Identification of Possible Errors in Data Sets 2-28-2000
TR-CS-00-04 Utilizing Association Rules for the Data Cleansing 5-8-2000
http://www.msci.memphis.edu/~maleticj/dataclean.html
--------------------------------------------------
Note added at 2002-10-08 07:13:03 (GMT)
--------------------------------------------------
Abstract: The paper analyzes the problem of data cleansing and automatically identifying potential errors in data sets. An overview of the diminutive amount of existing literature concerning data cleansing is given. Methods for error detection that go beyond integrity analysis are reviewed and presented. The applicable methods include: statistical outlier detection, pattern matching, clustering, and data mining techniques. Some brief results supporting the use of such methods are given.
http://citeseer.nj.nec.com/maletic00data.html
41 mins
Oooops ...
... that was an unfortunate typo in my agree to EB: I meant to say "data cleaning" is normal.
--------------------------------------------------
Note added at 2002-10-08 07:36:11 (GMT)
--------------------------------------------------
P.S. I have specified and implemented data cleaning modules (in an American bank).
3 hrs
+2
7 hrs
data clean-up
Not cleansing. It's not unlike 'cleaning up your files': you don't cleanse them. 'Filtering' is not a general "Bereinigung", but more of a 'sorting' action.
Peer comment(s):
agree |
foehnerk (X)
: data clean-up is the usual term when transferring data (e.g. customer records) to a new system, e.g. from AS400 to SAP; you delete the records that are no longer valid, or are duplicated with different abbreviations, etc.
35 mins
|
agree |
Johanna Timm, PhD
4 hrs
|
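For readers unfamiliar with what "validation and clean-up on import" involves in practice, here is a minimal Python sketch of the kind of step described in the comment above: rejecting records that are not valid and dropping duplicates that differ only in abbreviation or spelling. It is not taken from the customer database system in the question. The field names (name, email), the abbreviation map and the function names are all assumptions made up for illustration.

```python
import re

# Hypothetical abbreviation map; a real system would define its own rules.
ABBREVIATIONS = {"inc.": "incorporated", "st.": "street", "str.": "strasse"}

# Deliberately loose plausibility check, not a full e-mail validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def normalize(value):
    """Lower-case, collapse whitespace and expand known abbreviations."""
    words = value.lower().split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def is_valid(record):
    """Validation: a record needs a non-empty name and a plausible e-mail."""
    name = record.get("name", "").strip()
    email = record.get("email", "").strip()
    return bool(name) and bool(EMAIL_RE.match(email))


def import_records(raw_records):
    """Validate and clean records at import time; return (kept, rejected)."""
    kept, rejected, seen = [], [], set()
    for record in raw_records:
        if not is_valid(record):
            rejected.append(record)
            continue
        key = (normalize(record["name"]), record["email"].strip().lower())
        if key in seen:
            # Same customer written with a different abbreviation or case.
            rejected.append(record)
            continue
        seen.add(key)
        kept.append(record)
    return kept, rejected
```

With this sketch, "Acme Inc." and "ACME  Incorporated" sharing one e-mail address would be treated as the same customer, so only the first is kept. Whether such checks run at import or afterwards is exactly the distinction the "filtering" versus "cleansing" discussion in this thread turns on.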
13 hrs
Data validation and clean-up on import
That's what I frequently use in DB-related translations.