Data quality

Related: experimental design, matched group, measurement, research design, scoring method, testing

WordNet

the act or process of assigning numbers to phenomena according to a rule; "the measurements were carefully done"; "his mental measurings proved remarkably accurate" (syn.) measuring, measure, mensuration
an examination of the characteristics of something; "there are laboratories for commercial testing"; "it involved testing thousands of children for smallpox"
the act of subjecting to experimental test in order to determine how well something works; "they agreed to end the testing of atomic weapons"
of high social status; "people of quality"; "a quality family"
high social status; "a man of quality"
a degree or grade of excellence or worth; "the quality of students has risen"; "an executive of low caliber" (syn.) caliber, calibre
an essential and distinguishing attribute of something or someone; "the quality of mercy is not strained"--Shakespeare
a characteristic property that defines the apparent individual nature of something; "each town has a quality all its own"; "the radical character of our demands" (syn.) character, lineament
a collection of facts from which conclusions may be drawn; "statistical data" (syn.) information
use of chemical analysis to estimate the age of geological specimens (syn.) geological dating

PrepTutorEJDIC

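〈U〉measurement, surveying / 〈C〉(in the plural) dimensions / 〈U〉method of measurement, surveying method
〈C〉(of a person or thing) characteristic, attribute《+of+noun》 / 〈U〉(of something) essence, nature《+of+noun》 / 〈U〉quality, grade / 〈U〉high quality, excellence, superiority / 〈U〉high social rank
materials, facts; information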

Wikipedia preview

Source (authority): Wikipedia, the free encyclopedia, 2013/09/18 11:09:38 (JST)

wiki en

Data are of high quality "if they are fit for their intended uses in operations, decision making and planning" (J. M. Juran). Alternatively, data are deemed of high quality if they correctly represent the real-world construct to which they refer. Apart from these definitions, as data volume increases, the internal consistency of the data becomes paramount, regardless of its fitness for any external purpose; for example, a person's age and birth date may conflict in different parts of a database. The first two views can often disagree, even about the same set of data used for the same purpose. This article discusses the concept as it relates to business data processing, although other kinds of data have quality issues as well.
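The internal-consistency problem mentioned above (an age field that disagrees with a birth date) can be checked mechanically. The following Python sketch is illustrative only. The record layout with `id`, `age` and `birth_date` fields, the `as_of` reference date and the helper names are all hypothetical. They are not taken from any particular database or tool.

```python
from datetime import date

def age_on(birth_date: date, as_of: date) -> int:
    """Return the number of completed years between birth_date and as_of."""
    years = as_of.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

def inconsistent_ages(records, as_of):
    """Return the ids of records whose stored age disagrees with their birth date."""
    return [r["id"] for r in records if r["age"] != age_on(r["birth_date"], as_of)]

# Hypothetical records: record 2 stores an age that contradicts its birth date.
records = [
    {"id": 1, "age": 40, "birth_date": date(1983, 5, 1)},
    {"id": 2, "age": 35, "birth_date": date(1983, 5, 1)},
]
print(inconsistent_ages(records, as_of=date(2023, 6, 1)))
```

Such a check does not say which of the two fields is wrong. It only flags the record for correction, for example by recomputing the age from the birth date.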

Definitions

This list is taken from the online book "Data Quality: High-impact Strategies".[1] See also the Glossary of data quality terms.[2]

Degree of excellence exhibited by the data in relation to the portrayal of the actual scenario.
The state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use.[3]
The totality of features and characteristics of data that bear on their ability to satisfy a given purpose; the sum of the degrees of excellence for factors related to data.[4]
The processes and technologies involved in ensuring the conformance of data values to business requirements and acceptance criteria.[5]
Complete, standards-based, consistent, accurate and time-stamped.[6]
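Several of these definitions name measurable dimensions, such as completeness and validity. The following is a minimal Python sketch that expresses two of them as simple per-field ratios. It assumes hypothetical customer records and made-up validity rules: a rough e-mail pattern and a five-digit postal code. Neither rule is a published standard.

```python
import re

# Hypothetical customer records; None marks a missing value.
rows = [
    {"email": "a@example.com", "zip": "12345"},
    {"email": None, "zip": "1234"},
    {"email": "not-an-email", "zip": "54321"},
    {"email": "b@example.org", "zip": None},
]

# Illustrative validity rules (assumptions, not a standard).
RULES = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "zip": re.compile(r"\d{5}"),
}

def completeness(rows, field):
    """Share of rows in which the field is present (not None)."""
    return sum(r[field] is not None for r in rows) / len(rows)

def validity(rows, field):
    """Share of present values that fully match the field's validity rule."""
    present = [r[field] for r in rows if r[field] is not None]
    if not present:
        return 0.0
    return sum(bool(RULES[field].fullmatch(v)) for v in present) / len(present)

for field in RULES:
    print(field, completeness(rows, field), validity(rows, field))
```

Here completeness counts only missing (`None`) values, and validity is computed over present values only. Other conventions, such as counting missing values as invalid, are equally defensible.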

History

Before the rise of the inexpensive server, massive mainframe computers were used to maintain name and address data so that the mail could be properly routed to its destination. The mainframes used business rules to correct common misspellings and typographical errors in name and address data, as well as to track customers who had moved, died, gone to prison, married, divorced, or experienced other life-changing events. Government agencies began to make postal data available to a few service companies to cross-reference customer data with the National Change of Address registry (NCOA). This technology saved large companies millions of dollars compared to manually correcting customer data. Large companies saved on postage, as bills and direct marketing materials made their way to the intended customer more accurately. Initially sold as a service, data quality moved inside the walls of corporations, as low-cost and powerful server technology became available.

Companies with an emphasis on marketing often focus their quality efforts on name and address information, but data quality is recognized as an important property of all types of data. Principles of data quality can be applied to supply chain data, transactional data, and nearly every other category of data found in the enterprise. For example, making supply chain data conform to a certain standard has value to an organization by: 1) avoiding overstocking of similar but slightly different stock; 2) improving the understanding of vendor purchases to negotiate volume discounts; and 3) avoiding logistics costs in stocking and shipping parts across a large organization.

While name and address data has a clear standard as defined by local postal authorities, other types of data have few recognized standards. There is a movement in the industry today to standardize certain non-address data. The non-profit group GS1 is among the groups spearheading this movement.

For companies with significant research efforts, data quality can include developing protocols for research methods, reducing measurement error, bounds checking of the data, cross tabulation, modeling and outlier detection, verifying data integrity, etc.
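Bounds checking and outlier detection can be sketched in a few lines of Python. The plausible range and the readings below are hypothetical. Tukey's interquartile-range (IQR) rule is used here as one common outlier test; the text does not prescribe it.

```python
import statistics

def out_of_bounds(values, lo, hi):
    """Bounds check: indices of values outside the plausible range [lo, hi]."""
    return [i for i, v in enumerate(values) if not lo <= v <= hi]

def iqr_outliers(values, k=1.5):
    """Tukey's rule: indices of values more than k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

# Hypothetical body-temperature readings in degrees Celsius.
readings = [36.6, 36.8, 37.0, 36.9, 36.7, 39.5, 3.7, 36.5]
print(out_of_bounds(readings, 30.0, 45.0))
print(iqr_outliers(readings))
```

The bounds check catches only the impossible value (3.7, which in this made-up data looks like a misplaced decimal point). The IQR rule also flags 39.5, which is unusual but possible, so it may call for review rather than automatic correction.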

Overview

There are a number of theoretical frameworks for understanding data quality. A systems-theoretical approach influenced by American pragmatism expands the definition of data quality to include information quality, and emphasizes the inclusiveness of the fundamental dimensions of accuracy and precision on the basis of the theory of science (Ivanov, 1972). One framework, dubbed "Zero Defect Data" (Hansen, 1991), adapts the principles of statistical process control to data quality. Another framework seeks to integrate the product perspective (conformance to specifications) and the service perspective (meeting consumers' expectations) (Kahn et al., 2002). Another framework is based on semiotics and evaluates the quality of the form, meaning and use of the data (Price and Shanks, 2004). One highly theoretical approach analyzes the ontological nature of information systems to define data quality rigorously (Wand and Wang, 1996).
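The following sketch illustrates what adapting statistical process control to data quality can look like. It applies a standard p-chart, which places three-sigma limits on the proportion of defective records per batch, to hypothetical daily audit counts. This generic SPC technique was chosen for illustration and does not reproduce Hansen's framework. The batch size and the counts are invented.

```python
import math

def p_chart(defects, sample_size):
    """Return (center, lcl, ucl) for a p-chart with a constant sample size."""
    p_bar = sum(defects) / (len(defects) * sample_size)
    sigma = math.sqrt(p_bar * (1 - p_bar) / sample_size)
    # Proportions cannot leave [0, 1], so clip the three-sigma limits.
    lcl = max(0.0, p_bar - 3 * sigma)
    ucl = min(1.0, p_bar + 3 * sigma)
    return p_bar, lcl, ucl

def out_of_control(defects, sample_size):
    """Indices of batches whose defect proportion lies outside the control limits."""
    _, lcl, ucl = p_chart(defects, sample_size)
    return [i for i, d in enumerate(defects) if not lcl <= d / sample_size <= ucl]

# Hypothetical: defective records found in daily audits of 500 records each.
daily_defects = [10, 12, 9, 11, 8, 30, 10]
print(out_of_control(daily_defects, 500))
```

For simplicity, this sketch estimates the limits from the same batches it tests. In practice the limits are usually set from a stable baseline period, because an extreme batch inflates the centre line.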

A considerable amount of data quality research involves investigating and describing various categories of desirable attributes (or dimensions) of data. These lists commonly include accuracy, correctness, currency, completeness and relevance. Nearly 200 such terms have been identified, and there is little agreement on their nature (are they concepts, goals or criteria?), their definitions or their measures (Wang et al., 1993). Software engineers may recognize this as similar to the problem of the "ilities" (quality attributes such as reliability and maintainability).

MIT has a Total Data Quality Management program, led by Professor Richard Wang, which produces a large number of publications and hosts a significant international conference in this field (International Conference on Information Quality, ICIQ). This program grew out of the work done by Hansen on the "Zero Defect Data" framework (Hansen, 1991).

In practice, data quality is a concern for professionals involved with a wide range of information systems, from data warehousing and business intelligence to customer relationship management and supply chain management. One industry study estimated the total cost of data quality problems to the US economy at over US$600 billion per year (Eckerson, 2002). Incorrect data, including invalid and outdated information, can originate from many sources, such as data entry or data migration and conversion projects.[7]

In 2002, the USPS and PricewaterhouseCoopers released a report stating that 23.6 percent of all U.S. mail sent is incorrectly addressed.[8]

Contact data becomes stale very quickly in the average database; one reason is that more than 45 million Americans change their address every year.[9]

The problem is serious enough that companies are beginning to set up data governance teams whose sole role in the corporation is to be responsible for data quality. In some organizations, this data governance function has been established as part of a larger regulatory compliance function, a recognition of the importance of data and information quality to organizations.

Problems with data quality don't only arise from incorrect data. Inconsistent data is a problem as well. Eliminating data shadow systems and centralizing data in a warehouse is one of the initiatives a company can take to ensure data consistency.

Enterprises, scientists and researchers are starting to participate in data curation communities to improve the quality of their shared data.[10]

The market goes some way toward providing data quality assurance. A number of vendors make tools for analysing and repairing poor-quality data in situ, service providers can clean data on a contract basis, and consultants can advise on fixing processes or systems to avoid data quality problems in the first place. Most data quality tools offer a series of functions for improving data, which may include some or all of the following:

Data profiling - initially assessing the data to understand its quality challenges
Data standardization - a business rules engine that ensures that data conforms to quality rules
Geocoding - for name and address data; corrects data to US and worldwide postal standards
Matching or Linking - a way to compare data so that similar, but slightly different records can be aligned. Matching may use "fuzzy logic" to find duplicates in the data. It often recognizes that 'Bob' and 'Robert' may be the same individual. It might be able to manage 'householding', or finding links between husband and wife at the same address, for example. Finally, it often can build a 'best of breed' record, taking the best components from multiple data sources and building a single super-record.
Monitoring - keeping track of data quality over time and reporting variations in the quality of data. Software can also auto-correct the variations based on pre-defined business rules.
Batch and Real time - Once the data is initially cleansed (batch), companies often want to build the processes into enterprise applications to keep it clean.
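The matching step described above can be sketched with Python's standard-library `difflib.SequenceMatcher` plus a small nickname table, so that "Bob" and "Robert" normalize to the same first name. The nickname table, the 0.85 threshold and the function names are hypothetical choices for illustration. Commercial matching engines use far richer rules, such as phonetic keys, address parsing and householding.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; real tools use much larger reference lists.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william", "liz": "elizabeth"}

def normalize(name: str) -> str:
    """Lower-case, split on whitespace, and map a known nickname to a canonical first name."""
    parts = name.lower().split()
    if parts:
        parts[0] = NICKNAMES.get(parts[0], parts[0])
    return " ".join(parts)

def similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def likely_duplicates(names, threshold=0.85):
    """All pairs of names whose similarity meets the (assumed) threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if similarity(names[i], names[j]) >= threshold:
                pairs.append((names[i], names[j]))
    return pairs

names = ["Bob Smith", "Robert Smith", "Robert Smyth", "Alice Jones"]
print(likely_duplicates(names))
```

Comparing every pair is quadratic in the number of records. Larger implementations commonly group records first by a cheap "blocking" key, such as postal code, and compare only within each group.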

There are several well-known authors and self-styled experts, with Larry English perhaps the most popular guru. In addition, the International Association for Information and Data Quality (IAIDQ) was established in 2004 to provide a focal point for professionals and researchers in this field.

ISO 8000 is the international standard for data quality.

Criticism of existing tools and processes

The main criticisms cited are:

Project costs: typically in the hundreds of thousands of dollars
Time: not enough time to deal with large-scale data-cleansing software
Security: concerns over sharing information, giving an application access across systems, and effects on legacy systems

Professional associations

International Association for Information and Data Quality (IAIDQ)

References

[1] "Data Quality: High-impact Strategies - What You Need to Know: Definitions, Adoptions, Impact, Benefits, Maturity, Vendors". Retrieved 5 February 2013.
[2] Glossary of data quality terms, published by IAIDQ.
[3] Government of British Columbia.
[4] "Reference-Quality Water Sample Data: Notes on Acquisition, Record Keeping, and Evaluation".
[5] istabg.org, "Data QualYtI – Do You Trust Your Data?"
[6] GS1.ORG dqf.
[7] http://www.information-management.com/issues/20060801/1060128-1.html
[8] http://www.directionsmag.com/article.php?article_id=509
[9] http://ribbs.usps.gov/move_update/documents/tech_guides/PUB363.pdf
[10] E. Curry, A. Freitas, and S. O’Riáin, “The Role of Community-Driven Data Curation for Enterprises,” in Linking Enterprise Data, D. Wood, Ed. Boston, MA: Springer US, 2010, pp. 25-47.

UpToDate Contents

To read the full text you will need to subscribe.

1. Maintaining water quality for hemodialysis
2. Measuring quality in hospitals in the United States
3. Quality of life in head and neck cancer
4. Assuring quality of care for cancer survivors: the survivorship care plan
5. Guideline adherence and outcomes in coronary heart disease and heart failure

English Journal

Sound level intensity severely disrupts sleep in ventilated ICU patients throughout a 24-h period: a preliminary 24-h study of sleep stages and associated sound levels.

Elbaz M, Léger D, Sauvet F, Champigneulle B, Rio S, Strauss M, Chennaoui M, Guilleminault C, Mira JP.
Annals of intensive care. Ann Intensive Care. 2017 Dec;7(1):25. doi: 10.1186/s13613-017-0248-7. Epub 2017 Mar 3.
PMID 28255956

The clinical benefits of denosumab for prophylaxis of steroid-induced osteoporosis in patients with pulmonary disease.

Ishiguro S, Ito K, Nakagawa S, Hataji O, Sudo A.
Archives of osteoporosis. Arch Osteoporos. 2017 Dec;12(1):44. doi: 10.1007/s11657-017-0336-1. Epub 2017 Apr 19.
PMID 28425086

Impact of air pollution on vitamin D deficiency and bone health in adolescents.

Feizabad E, Hossein-Nezhad A, Maghbooli Z, Ramezani M, Hashemian R, Moattari S.
Archives of osteoporosis. Arch Osteoporos. 2017 Dec;12(1):34. doi: 10.1007/s11657-017-0323-6. Epub 2017 Apr 5.
PMID 28378273

Japanese Journal

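Relationship between the similarity of speech features between interlocutors and the effectiveness of information transfer in dialogue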

陳伯翰,北岡教英,武田一哉
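IPSJ SIG Technical Report. SLP, Spoken Language Processing 2014-SLP-104(27), 1-6, 2014-12-08
This paper investigates the relationship between the similarity of speech features between interlocutors and the effectiveness of information transfer in dialogue, using data from the HCRC Map Task. For prosodic features, each speaker's prosody is modeled with a GMM and bigrams, and the similarity between the models is measured. For linguistic features, the linguistic similarity between interlocutors is measured using a bigram model and the relative frequencies of keywords. The results show a positive correlation between the prosodic similarity of interlocutors and the effectiveness of information transfer in the dialogue. …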
NAID 110009850971

Many-to-one voice conversion using Multiple Non-negative Matrix Factorization

相原龍,滝口哲也,有木康雄
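IPSJ SIG Technical Report. SLP, Spoken Language Processing 2014-SLP-104(15), 1-6, 2014-12-08
This report proposes Multiple Non-negative Matrix Factorization (Multi-NMF), an extension of non-negative matrix factorization (NMF), and applies it to many-to-one voice conversion, which converts the utterances of an arbitrary speaker into those of a specific target speaker. Voice conversion has conventionally been studied widely for speaker conversion, which converts the voice quality of an input speaker into that of an output speaker. The most common method in voice conversion is the Gaussian mixture model (GMM) …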
NAID 110009850921

A method for exhaustive automatic generation of test data based on domain testing techniques

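丹野治門, 張暁晶
IPSJ SIG Technical Report. Software Engineering (SE) 2014-SE-186(6), 1-8, 2014-11-06
For testing screen input validation and business logic in business systems, this study aims to reduce the cost of creating test data and to ensure test quality by automatically generating exhaustive test data based on equivalence partitioning and boundary value analysis. Existing techniques cannot automatically generate appropriate test data when there are dependencies among the variables entered on a screen or among the variables handled by the business logic, and in realistic applications, particularly test …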
NAID 110009840464

"testing"

　　[★]

n.

test, examination

Related: assessment, data quality, exam, examination, examine, experimental design, matched group, measurement, research design, scoring method, test, trial

"measurement"

　　[★]

measurement, measuring, measured value

Related: data quality, determine, estimate, experimental design, fathom, matched group, measure, research design, scoring method, testing

"scoring method"

　　[★]

scoring method, scoring procedure, point-scoring method

Related: data quality, experimental design, matched group, measurement, research design, testing

"experimental design"

　　[★]

experimental design, design of experiments

Related: data quality, matched group, measurement, research design, scoring method, testing

"research design"

　　[★]

research design

Related: data quality, experimental design, matched group, measurement, scoring method, testing

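"dating"

　　[★]

n.

going on dates; writing in a date; dating, age determination; (commerce) number of days a price discount applies, number of days' grace for payment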

"quality"

　　[★]

n.

quality

Related: substantia

"data"

　　[★]

data, materials

Related: datum

Linked from: "testing", "measurement", "scoring method", "experimental design", "research design"
Related articles: "dating", "quality", "data"
