{"id":1972,"date":"2020-05-20T08:05:59","date_gmt":"2020-05-20T15:05:59","guid":{"rendered":"https:\/\/www.lightsondata.com\/?p=1972"},"modified":"2020-05-19T19:22:16","modified_gmt":"2020-05-20T02:22:16","slug":"data-quality-considerations-during-the-dw-bi-design-phase","status":"publish","type":"post","link":"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/","title":{"rendered":"Data quality considerations during the DW\/ BI design phase"},"content":{"rendered":"<p>Decisions in today\u2019s organizations have become increasingly data-driven and real-time. Therefore, the business intelligence databases that support decision makers must be of exceptional quality.<br \/>\nWe sometimes confuse testing a data warehouse that produces business intelligence (BI) reports with backend or database testing, or with testing the BI reports themselves. Data warehouse testing is much more complex and diverse. Nearly everything in BI applications involves the data that \u201cdrives\u201d intelligent decision making.<br \/>\nData integrity can be compromised during each DW\/BI phase: when data is created, integrated, moved, or transformed.<br \/>\nThis article highlights strategies and best practices for catching data integrity issues during the project design phase.<\/p>\n<p>&nbsp;<\/p>\n<h2><strong>Common data quality issues to be discovered during DW\/BI design<\/strong><\/h2>\n<p>The first level of testing and validation begins with the formal acceptance of the logical data model and \u201clow-level design\u201d (LLD). 
All further testing and validation will be based on a shared understanding of each data element in the model.<\/p>\n<p>Data elements that are created through a transformation or aggregation process must be clearly identified, and the calculation for each of these data elements must be clearly documented and easy to interpret.<\/p>\n<p>During LLD reviews and updates, special consideration should be given to typical data modeling scenarios that occur in the project. For example:<\/p>\n<ul>\n<li>Verify that many-to-many attribute relationships are clarified and resolved<\/li>\n<li>Verify the types of keys that are used: surrogate keys, natural keys, ETL-generated keys<\/li>\n<li>Verify that business analysts and DBAs review the lineage and business rules for extracting, transforming, and loading the data warehouse with ETL architects and application developers<\/li>\n<li>Verify that all transformation rules, summarization rules, and matching and consolidation rules have clear specifications<\/li>\n<li>Confirm that the transformations, business rules, and cleansing described in the low-level design (LLD) and application logic specifications meet business requirements and that they have been coded correctly in the ETL, Java, or SQL used for data loads<\/li>\n<li>Verify that ETL procedures are documented to monitor and control data extraction, transformation, and loading. 
The procedures should describe how to handle exceptions and program failures<\/li>\n<li>Verify that consolidation of duplicate or merged data is properly handled<\/li>\n<li>Verify that domain transformations will be sampled to confirm that values are converted correctly<\/li>\n<li>Ensure that primary key values are unique and that foreign key values resolve to an existing key, both in the source data and in the data loaded to the warehouse<\/li>\n<li>Validate that target data types are as specified in the design and\/or the data model<\/li>\n<li>Verify that data field types and formats are specified and implemented<\/li>\n<li>Verify that default values are specified for fields where needed<\/li>\n<li>Verify that processing for invalid field values in the source is defined<\/li>\n<li>Verify that expected ranges of field values are specified<\/li>\n<li>Verify that all keys generated by the ETL \u201csequence generator\u201d are identified<\/li>\n<li>Verify that slowly changing dimensions (SCDs) are described<\/li>\n<\/ul>\n<p><strong>\u00a0<\/strong><\/p>\n<h2><strong>Conclusion<\/strong><\/h2>\n<p>Data warehouse testing is frequently deferred until late in the project life cycle. If testing is shortchanged (e.g., due to schedule overruns or limited resource availability), there\u2019s a high risk that critical data integrity issues will slip through the verification efforts. Even if thorough testing is performed, it\u2019s difficult and costly to address most data integrity issues exposed by this late-cycle testing.<\/p>\n<p>When testing takes place in a late DW\/BI life-cycle phase, the cause of an error can be anything from a data quality issue introduced when the data enters the data warehouse to a data processing issue caused by failures of the business logic along the layers of data warehouse loading and its BI reporting components.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Decisions in today\u2019s organizations have become increasingly data-driven and real-time. 
Therefore, the business intelligence databases that support decision makers must be of exceptional quality. We sometimes confuse testing a data warehouse that produces business intelligence (BI) reports with backend or database testing or with testing the BI reports themselves. Data warehouse testing is much more [&hellip;]<\/p>\n","protected":false},"author":12,"featured_media":1973,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[4],"tags":[54,53],"class_list":["post-1972","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-quality","tag-bi","tag-dw","post-wrapper","thrv_wrapper"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Data quality considerations during the DW\/ BI design phase | LightsOnData<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data quality considerations during the DW\/ BI design phase | LightsOnData\" \/>\n<meta property=\"og:description\" content=\"Decisions in today\u2019s organizations have become increasingly data-driven and real-time. Therefore, the business intelligence databases that support decision makers must be of exceptional quality. 
We sometimes confuse testing a data warehouse that produces business intelligence (BI) reports with backend or database testing or with testing the BI reports themselves. Data warehouse testing is much more [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/\" \/>\n<meta property=\"og:site_name\" content=\"LightsOnData\" \/>\n<meta property=\"article:published_time\" content=\"2020-05-20T15:05:59+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i1.wp.com\/www.lightsondata.com\/wp-content\/uploads\/2020\/05\/Data-quality-considerations-during-the-DW-BI-design-phase.jpg?fit=800%2C450&ssl=1\" \/>\n\t<meta property=\"og:image:width\" content=\"800\" \/>\n\t<meta property=\"og:image:height\" content=\"450\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Wayne Yaddow\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@georgefirican\" \/>\n<meta name=\"twitter:site\" content=\"@georgefirican\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Wayne Yaddow\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/\",\"url\":\"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/\",\"name\":\"Data quality considerations during the DW\/ BI design phase | LightsOnData\",\"isPartOf\":{\"@id\":\"https:\/\/www.lightsondata.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/i0.wp.com\/www.lightsondata.com\/wp-content\/uploads\/2020\/05\/Data-quality-considerations-during-the-DW-BI-design-phase.jpg?fit=800%2C450&ssl=1\",\"datePublished\":\"2020-05-20T15:05:59+00:00\",\"author\":{\"@id\":\"https:\/\/www.lightsondata.com\/#\/schema\/person\/4503a79021fcf6acf4850c36356b6ffe\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/#primaryimage\",\"url\":\"https:\/\/i0.wp.com\/www.lightsondata.com\/wp-content\/uploads\/2020\/05\/Data-quality-considerations-during-the-DW-BI-design-phase.jpg?fit=800%2C450&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/www.lightsondata.com\/wp-content\/uploads\/2020\/05\/Data-quality-considerations-during-the-DW-BI-design-phase.jpg?fit=80
0%2C450&ssl=1\",\"width\":800,\"height\":450,\"caption\":\"Data quality considerations during the DW BI design phase\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.lightsondata.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data quality considerations during the DW\/ BI design phase\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.lightsondata.com\/#website\",\"url\":\"https:\/\/www.lightsondata.com\/\",\"name\":\"LightsOnData\",\"description\":\"Practical resources, online courses, free articles and videos for data management, data governance, data quality, and business intelligence\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.lightsondata.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.lightsondata.com\/#\/schema\/person\/4503a79021fcf6acf4850c36356b6ffe\",\"name\":\"Wayne Yaddow\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.lightsondata.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/0c5eaab6104044bec265d829746d99a451bd1acb77e74af42b55c757e191fe76?s=96&d=retro&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/0c5eaab6104044bec265d829746d99a451bd1acb77e74af42b55c757e191fe76?s=96&d=retro&r=g\",\"caption\":\"Wayne Yaddow\"},\"description\":\"Wayne Yaddow is an independent consultant with more than 20 years\u2019 experience leading data integration, data warehouse, and ETL testing projects with J.P. Morgan Chase, Credit Suisse, Standard and Poor\u2019s, AIG, Oppenheimer Funds, and IBM. 
He taught IIST (International Institute of Software Testing) courses on data warehouse and ETL testing and wrote DW\/BI articles for Better Software, The Data Warehouse Institute (TDWI), Tricentis, and others. Wayne continues to lead numerous ETL testing and coaching projects on a consulting basis. You can contact him at wyaddow@gmail.com.\",\"url\":\"https:\/\/www.lightsondata.com\/author\/wyaddowgmail-com\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Data quality considerations during the DW\/ BI design phase | LightsOnData","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/","og_locale":"en_US","og_type":"article","og_title":"Data quality considerations during the DW\/ BI design phase | LightsOnData","og_description":"Decisions in today\u2019s organizations have become increasingly data-driven and real-time. Therefore, the business intelligence databases that support decision makers must be of exceptional quality. We sometimes confuse testing a data warehouse that produces business intelligence (BI) reports with backend or database testing or with testing the BI reports themselves. Data warehouse testing is much more [&hellip;]","og_url":"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/","og_site_name":"LightsOnData","article_published_time":"2020-05-20T15:05:59+00:00","og_image":[{"width":800,"height":450,"url":"https:\/\/i1.wp.com\/www.lightsondata.com\/wp-content\/uploads\/2020\/05\/Data-quality-considerations-during-the-DW-BI-design-phase.jpg?fit=800%2C450&ssl=1","type":"image\/jpeg"}],"author":"Wayne Yaddow","twitter_card":"summary_large_image","twitter_creator":"@georgefirican","twitter_site":"@georgefirican","twitter_misc":{"Written by":"Wayne Yaddow","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/","url":"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/","name":"Data quality considerations during the DW\/ BI design phase | LightsOnData","isPartOf":{"@id":"https:\/\/www.lightsondata.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/#primaryimage"},"image":{"@id":"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/www.lightsondata.com\/wp-content\/uploads\/2020\/05\/Data-quality-considerations-during-the-DW-BI-design-phase.jpg?fit=800%2C450&ssl=1","datePublished":"2020-05-20T15:05:59+00:00","author":{"@id":"https:\/\/www.lightsondata.com\/#\/schema\/person\/4503a79021fcf6acf4850c36356b6ffe"},"breadcrumb":{"@id":"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/#primaryimage","url":"https:\/\/i0.wp.com\/www.lightsondata.com\/wp-content\/uploads\/2020\/05\/Data-quality-considerations-during-the-DW-BI-design-phase.jpg?fit=800%2C450&ssl=1","contentUrl":"https:\/\/i0.wp.com\/www.lightsondata.com\/wp-content\/uploads\/2020\/05\/Data-quality-considerations-during-the-DW-BI-design-phase.jpg?fit=800%2C450&ssl=1","width":800,"height":450,"caption":"Data quality considerations during the DW BI design 
phase"},{"@type":"BreadcrumbList","@id":"https:\/\/www.lightsondata.com\/data-quality-considerations-during-the-dw-bi-design-phase\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.lightsondata.com\/"},{"@type":"ListItem","position":2,"name":"Data quality considerations during the DW\/ BI design phase"}]},{"@type":"WebSite","@id":"https:\/\/www.lightsondata.com\/#website","url":"https:\/\/www.lightsondata.com\/","name":"LightsOnData","description":"Practical resources, online courses, free articles and videos for data management, data governance, data quality, and business intelligence","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.lightsondata.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.lightsondata.com\/#\/schema\/person\/4503a79021fcf6acf4850c36356b6ffe","name":"Wayne Yaddow","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.lightsondata.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/0c5eaab6104044bec265d829746d99a451bd1acb77e74af42b55c757e191fe76?s=96&d=retro&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/0c5eaab6104044bec265d829746d99a451bd1acb77e74af42b55c757e191fe76?s=96&d=retro&r=g","caption":"Wayne Yaddow"},"description":"Wayne Yaddow is an independent consultant with more than 20 years\u2019 experience leading data integration, data warehouse, and ETL testing projects with J.P. Morgan Chase, Credit Suisse, Standard and Poor\u2019s, AIG, Oppenheimer Funds, and IBM. He taught IIST (International Institute of Software Testing) courses on data warehouse and ETL testing and wrote DW\/BI articles for Better Software, The Data Warehouse Institute (TDWI), Tricentis, and others. 
Wayne continues to lead numerous ETL testing and coaching projects on a consulting basis. You can contact him at wyaddow@gmail.com.","url":"https:\/\/www.lightsondata.com\/author\/wyaddowgmail-com\/"}]}},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/www.lightsondata.com\/wp-content\/uploads\/2020\/05\/Data-quality-considerations-during-the-DW-BI-design-phase.jpg?fit=800%2C450&ssl=1","jetpack_shortlink":"https:\/\/wp.me\/p9BPV6-vO","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.lightsondata.com\/wp-json\/wp\/v2\/posts\/1972","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.lightsondata.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.lightsondata.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.lightsondata.com\/wp-json\/wp\/v2\/users\/12"}],"replies":[{"embeddable":true,"href":"https:\/\/www.lightsondata.com\/wp-json\/wp\/v2\/comments?post=1972"}],"version-history":[{"count":1,"href":"https:\/\/www.lightsondata.com\/wp-json\/wp\/v2\/posts\/1972\/revisions"}],"predecessor-version":[{"id":1974,"href":"https:\/\/www.lightsondata.com\/wp-json\/wp\/v2\/posts\/1972\/revisions\/1974"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.lightsondata.com\/wp-json\/wp\/v2\/media\/1973"}],"wp:attachment":[{"href":"https:\/\/www.lightsondata.com\/wp-json\/wp\/v2\/media?parent=1972"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.lightsondata.com\/wp-json\/wp\/v2\/categories?post=1972"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.lightsondata.com\/wp-json\/wp\/v2\/tags?post=1972"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}