Automated Data Warehouse Testing

March 4th, 2010 | Author: Ken Collier

Remember that our goal in agile BI development is the frequent release of production quality working software for user feedback and acceptance. At the end of each iteration or sprint, and each release our working product is expected to be of shippable quality, even in its most embryonic stages. This objective requires an entirely different approach to our quality assurance methods. Foremost it means integrating QA efforts right into our iterations.

Traditional BI development methods push system and acceptance testing to the end of the project cycle. This backend testing is typically manually intensive, possibly supplemented by the use of semi-automated tools. We need an entirely different testing discipline for Agile BI development.

First and foremost testing is integrated into the development process. Each development iteration must include plans for QA activities. One of the great things about this is that bugs don’t get a chance to accumulate over the project lifecycle. Quality feedback is immediate and frequent, and bugs are handled as they arise. QA specialists become integral members of the development team rather than gatekeepers at the end of the development cycle. Developers become integral to the QA process and learn sound testing practices as an extension of their technical skills. When I first introduce this notion to new agile teams I often get pushback from developers who say things like, “I don’t have time to test.” Or, “Testing is not my job.” I generally quell the urge to say something like, “If building a high quality BI system is not your job, then what exactly is your job?” Once developers establish the rhythm of making testing an integral part of development they usually love it. I’ve had a number of BI developers wonder why they didn’t learn to integrate testing and development long ago.

Essential to integrated testing is test automation. Manual testing is just not practical in a highly iterative and adaptive development environment. There are two key problems with manual testing. First, it takes too damn long and is an inhibitor to the delivery of frequent working software. Teams that rely on manual testing ultimately end up deferring testing until dedicated testing periods, which allows bugs to accumulate. Second, it is not sufficiently repeatable for regression testing. While we seek to embrace and adapt to change, we must always be confident that features that were “Done, done!” in a previous iteration retain their high quality in light of the changing system around them. Test automation requires some initial effort and ongoing diligence, but once technical teams get the hang of it, they can’t live without it.

Teams that do not practice integrated, automated testing are not really agile. It just isn’t feasible to create production quality, working features for user acceptance every 1-3 weeks without integrated and automated testing.

And by the way, quality assurance must be more comprehensive than system level testing. I’m always surprised at BI teams who treat final system testing as the only testing that is required in BI development. Agile BI developers test every unit of code, every integration point, every data structure, every user feature, and ultimately the entire working system not matter how embryonic. Unit testing involves testing the lowest level components that make up the BI system such as SQL scripts, ETL modules, stored procedures, etc. Integration testing involves testing all of the data transition points and wherever commercial tools are receiving or returning data. As data is pumped from source systems into staging databases; or from staging into multidimensional databases or OLAP engines, each data structure along the dataflow path must be tested to ensure that data integrity is preserved or enhanced. Simple mistakes like copying a VARCHAR(50) value into a VARCHAR(30) field in the staging database can wreak havoc on data integrity. Finally, each newly developed feature must be tested for acceptance and accuracy. Does it do what the user wants, needs, and expects; and does it do it correctly? While this is the ultimate acid test, we need confidence that our system is behaving well throughout the process flow.

The benefits of integrated, automated testing are even further boosting by using test-driven BI development. TDD is as much an implementation practice as it is a testing practice. In this approach test cases are written first, and then the code (or script, or configuration, etc.) is written to pass those test cases. When the system passes all of the test cases, and the BI practitioners can’t think of any new test cases, then the implementation work is “done”. That is it works as the developers think it should, and is of production quality. It is now ready for user acceptance to consider it “done, done”. While test-driven development may not be as mandatory as test automation and test integration, this is a technical practice that yields tremendous benefits since testing and development are inextricably linked. The test suite grows alongside the system, and since testing is automated the suite can be rerun frequently to maintain a high level of confidence in BI product quality.

TDD is applied most prominently at the lowest component level, or unit level, which ensures that high quality exists in the building blocks that make up the system. Story test driven development will help ensure that the user acceptance criteria are clearly defined for each user story before development begins.

Integrated automated testing in database development presents a unique set of challenges. Current automated testing tools designed for software development are not easily adaptable to database development, and large data volumes can make automated testing a daunting task. Data warehouse architectures further complicate these challenges since they involve multiple databases (staging, presentation, and sometimes even pre-staging); special code for data extraction, transformation, and loading (ETL); data cleansing code; and reporting engines and applications.

Posted in Agile Business Intelligence | Tags: database testing, storytest driven data warehousing, test driven database development

10 Responses to “Automated Data Warehouse Testing”

Bill Hayduk:

September 21, 2011 at 2:35 am

I agree that the statements “integrated automated testing in database development presents a unique set of challenges. Current automated testing tools designed for software development are not easily adaptable to database development, and large data volumes can make automated testing a daunting task”.

We tried QTP from HP and RFT from IBM. Launching the SQL to compare data between each leg is straight-forward enough, but doing the compares is a daunting task. We now use QuerySurge, which we built specifically for this purpose.
HenryM:

January 22, 2012 at 8:24 am

thanks for that good article. Am I there for right to assume that the bulk of automated testing in BI/Data warehouse/ETL development, is at the development side – ie. automated unit tests as described above..?

I’m going for an interview tomorrow for a BI/ETL test manager role and the client wants someone with knowledge of automated testing around BI/Datawarehouses, ETL etc.. The only software I’ve been able to find is QuerySurge – is this indeed the only tool for automating this kind of testing?
Ken Collier:

January 22, 2012 at 12:22 pm

Thanks for the feedback. Automated database & warehouse testing is not exclusively unit testing, but it begins there. Two things seem to be true. First, if you keep your “code” units (e.g., ETL modules, SQL scripts, stored procedures, etc.) very small, independent, and single purposed (high cohesion/low coupling) then it becomes very easy to write all the tests you need for each unit. Second, if you write tests as you develop (favoring test-driven development), then before you know it you have a very nice suite of unit tests that you can also use for regression testing. It seems that when both of these practices are followed, then your test suite catches a vast majority of defects – I would speculate that about 80% of your defects are eliminated in this way.

But that’s not all – Agile DW/BI testing must be driven upstream to the requirements, or user story, level. As a user story gets scheduled for an iteration it’s time to fully specify it as a functional requirement. If you express that specification as a collection of acceptance test scenarios, then you get a very precise specification as well as an executable suite of test cases for that particular story. Acceptance scenarios can follow the form: Given [some preconditions] When [some user action is taken] Then [some expected results or postconditions].

Regarding tools – Database testing tools are lagging far behind software testing tools, but this seems to be changing. I’m hopeful that in a few years we’ll have a lot of powerful tools to choose from. For the past couple of years I’ve been using Fitnesse with the DbFit add-on developed by Gojko Adzic. DbFit enables you to express your test cases as SQL queries rather than XML result sets using tools like SQLUnit, OUnit, DbUnit, and others. All of these are open-source testing tools, which makes it very nice to easily download them and begin figuring out how to use them. QuerySurge looks interesting, but I haven’t had the opportunity to use it on a project yet.

I hope this helps you, and best wishes in your interview.
Chris Mills:

March 30, 2012 at 4:25 pm

Great article. I certainly agree that building out automated testing in a warehouse environment presents a unique set of challenges. All the hard work sure pays off, though.

At my company we’ve been pursuing a warehouse test automation strategy for a little over two years now and are seeing tremendous gains in both quality and productivity. We’ve got a very robust continuous integration environment for humming away now, and we’re taking the steps described above to get acceptance criteria for new stories described as a set of test scenarios.

I recently did a little write up on our story, which is available at the link below.

http://collectivegenius.wordpress.com/2012/03/28/continuous-integration-in-a-data-warehouse/
HenryM:

April 2, 2012 at 1:08 am

Ken I’ve just noticed that I never thanked you for your reply!

So, a bit late, but thank you! Much appreciated.

I did end up landing the job. Its a pretty pure BI/DW world. Automation seems like a nice to have, but trying to change the habits and practices of an established team to start developing in a TDD manner seems a way off yet.. Currently no formal unit testing at all..

I’m system testing and still find Excel my key tool (potentially for lack of specialist alternative), setting up a series of spreadsheets (test conditions/test data/expected results) which results in only needing to run a few queries to perform a full regression test of medium sized ETL.

I will keep watching this space with interest.

Thanks again,
Henry.
Bill C.:

July 6, 2012 at 1:34 pm

Great post. Agile really seems like an asset to the warehouse automation process and testing. If it can eliminate errors and increase efficiency, I’m all for it. I just worry about the integration of software and human intelligence.
database testing tools:

September 6, 2012 at 5:38 am

HenryM – have you tried the latest web-based database testing tools from Gedis Studio?
sluksic:

September 21, 2012 at 12:17 am

Ken,
can we use, for example, some Data Quality tool for testing data warehouse?
In those tools, there are samo pre-defined qualitiy checks, like is it a valid e-mail address. But, usualy you can write your own check code, sql…
What is your opinion?
Thanks
Ken Collier:

September 24, 2012 at 10:59 am

I think there is a difference between data quality evaluation and testing to ensure that your DW processes ensure that data is integrated and transformed in complete, consistent, and correct ways. So, I’m having a bit of difficulty imagining how a data quality tool will provide sufficient testing of your ETL processes and other code. For example your data quality tool may tell you that the data contains valid e-mail addresses, but can it tell you that these are the exact same set of e-mail addresses that are present in the source data? However, I would encourage you to experiment further with this idea and see if there is a sensible way to accomplish the goal of test automation using a data quality tool. Best wishes and let us all know how it goes.
Andrew Murphy:

December 15, 2014 at 4:53 pm

Hi Ken

Was wondering if you could share a bit more about your experiences in DW testing with DBFit. I have got your book and have managed to modify your code to run 2012 packages ok, but am wondering if it would make sense to modify the package using a parameter to run in “test case” mode. This mode would point data sources to predefined test tables that can be setup/populated through dbfit.

My only concern is that for this kind of operation, you would need to commit the data to the tables for the SSIS package to pickup the data on execution.

Would this go against your principles for testing?

Furthermore, do you know how the dbfit tests can be automated?

Your thoughts/comments would be appreciated.

Automated Data Warehouse Testing

10 Responses to “Automated Data Warehouse Testing”

Leave a Reply