A Case Study on Data Quality: Find & Clean Skunk Data

Too often I see spreadsheets that have models built, complex formulas, etc. The user will come to me and ask for help with automation. But I often discover data quality issues that cause us to back up and assess everything.

I was helping someone with his project. Lots of data in multiple workbooks, summary pages, hidden sheets everywhere, fine details … just wild!

I slowed things down so that we could check the data quality. WOAH! That’s when my client observed how people were making additional rows of data to make amplifying details. Example:

Row1: Food/Catering
Row2: Coffee never arrived

Our goal was to count the Food/Catering problem. But, we have TWO rows … DOUBLING the number of problems that we really have. We don’t want that detail in Row2.

This video summarizes the situation. We look at the data using pivot tables. We then use tables and a Power Query inner join to merge the tables and isolate the data that we want to keep.

We don’t spend a lot of time on the Excel stuff. The focus here is on the importance of knowing your data and keeping data quality on your mind at all times.

For an intro to Get & Transform (Power Query) try my Lynda/LinkedIn course:
https://www.linkedin.com/learning/instructors/oz-du-soleil?trk=insiders_28299411_learning
Website: https://ozdusoleil.com

My book: Guerrilla Data Analysis 2nd Edition
http://www.amazon.com/Guerrilla-Analysis-Using-Microsoft-Excel/dp/1615470336
My old blog: http://datascopic.net/blog-2-2