Web services performance testing: a pilgrim’s progress. Part 3

This particular blog series will probably take me as long to finish as it took the medieval Muslim residents of al-Andalus to complete the Hajj. So the “pilgrim” in this blog series title is apt.

It’s easy enough to set up a “load test” in a testing tool. It’s a little more challenging to frame the questions you want to ask about performance.

The services whose performance I was testing request data on policies that vary by insured risk state. The policy data resides in an IBM z/OS mainframe system and Datacom databases. The architecture works something like this:

  • .NET Web service request for policy data is made
  • Request is routed through middleware Web services that scrape mainframe screens or query Datacom
  • Middleware Web service returns policy data, or a SOAP fault, to .NET
  • .NET passes back the data to the requestor as XML

The team was especially concerned about the performance of the components that scraped the screens. Screen scraping can be slow, and our code would be sharing the subsystem that scrapes the data with a finicky Java messaging framework. Also, the middleware in question is very much due for an upgrade.

After thinking through these issues and consulting with the project team, I designed my performance tests to record:

  • Response time in milliseconds per request
  • The risk state of the policy data being requested
  • The number of policy terms in the response: the usual number is two, but the minimum for a successful request is one. If two terms are present, the number of screens that need to be scraped doubles.
  • The size in bytes of the response. A single-term response could be as large as or larger than a two-term response in some cases, depending on the amount of data per mainframe screen and whether certain screens were used for that policy.
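To make later analysis easier, each request can be logged as one flat record holding the four fields above. Here is a minimal Java sketch; the class and field names are my own invention, not the project's actual code (the real tests recorded this data from SoapUI/Groovy scripts).

```java
import java.util.Locale;

// One row of performance-test output: the four measurements listed above.
// Names are illustrative, not the author's actual code.
public class RequestRecord {
    final long responseTimeMillis;
    final String riskState;
    final int policyTermCount;   // usually 2; minimum 1 on a successful request
    final int responseSizeBytes;

    RequestRecord(long responseTimeMillis, String riskState,
                  int policyTermCount, int responseSizeBytes) {
        this.responseTimeMillis = responseTimeMillis;
        this.riskState = riskState;
        this.policyTermCount = policyTermCount;
        this.responseSizeBytes = responseSizeBytes;
    }

    // One CSV line per request keeps the results easy to load into R later.
    String toCsvLine() {
        return String.format(Locale.US, "%d,%s,%d,%d",
                responseTimeMillis, riskState, policyTermCount, responseSizeBytes);
    }

    public static void main(String[] args) {
        System.out.println(new RequestRecord(742, "MA", 2, 18432).toCsvLine());
    }
}
```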

I made a couple more decisions based on the fact that I was testing in production (see part 1 of this series).

  • Requests would be made every 30 to 60 seconds over a period of a couple of hours for a total of 200 or so. Earlier tests at a higher frequency did not always go well (the aforementioned Java messaging framework was a rather feeble canary in the coal mine).
  • The team felt that the volume of requests (one or two a minute) was a realistic prediction of actual production load. The static frequency is not realistic, but my concern was again to avoid interfering with other production usage.
  • I felt that a sample size of 200 was “decent.” Since prod support staff had to monitor the production systems as I ran the test, a run time of anything over a couple of hours would not have been reasonable.
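The pacing described above can be sketched roughly as follows. This is an illustration under my stated assumptions (a uniform random 30-to-60-second gap, about 200 requests), not the actual SoapUI configuration; sendRequest is a stand-in for the real test step.

```java
import java.util.Random;
import java.util.concurrent.TimeUnit;

// Sketch of a low-and-slow production load test: ~200 requests,
// one every 30-60 seconds. Not the author's actual SoapUI setup.
public class PacedLoad {
    static final Random RANDOM = new Random();

    // Next inter-request delay, uniform between 30 and 60 seconds inclusive.
    static long nextDelaySeconds() {
        return 30 + RANDOM.nextInt(31);
    }

    // Fires totalRequests requests with a random pause between each pair.
    static void run(int totalRequests, Runnable sendRequest) throws InterruptedException {
        for (int i = 0; i < totalRequests; i++) {
            sendRequest.run();
            TimeUnit.SECONDS.sleep(nextDelaySeconds());
        }
    }

    public static void main(String[] args) {
        // Just show a few sampled delays; the real run would be something like
        // run(200, () -> { /* fire the Web service request, record the result */ });
        for (int i = 0; i < 5; i++) {
            System.out.println(nextDelaySeconds() + "s");
        }
    }
}
```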

In the next posts I’ll review how I recorded and reported on data using SoapUI, Groovy, and the R statistics language.


Easy peasy emails from SoapUI test cases to you

As a sophomore-level programming autodidact, I’m on an ongoing quest to bootstrap my test automation with SmartBear’s venerable SoapUI. You can script SoapUI as heavily as you want with the Groovy programming language, or you can use SoapUI’s built-in GUI elements to reduce your programming work. For example, it is a simple matter to let SoapUI consume your project WSDL, build out a request for you, and then display the incoming response in an easy-to-read form. You could use groovy-wslite to start down the same road, but it would take longer to write the code yourself and might not yield any richer results.

I really wanted SoapUI to email me a simple text message when a test ended. I’d already written and tested a Groovy class that used Apache Commons’ multipart email capabilities. However, I wasn’t sure how to use that class in SoapUI. After some Googling and experimenting, here’s how I got the whole thing working today.

  1. I pointed SoapUI’s script library to the folder that contained my .groovy file with the class definition.* The Script Library setting in SoapUI’s global preferences takes a folder location.
  2. I dropped the Apache Commons Email jar into $SoapUI_home/bin/ext, otherwise known as the bin/ext directory in your SoapUI installation folder. (Hat tip to Saurabh Gupta for this pointer.) I would imagine that putting it into $SoapUI_home/lib would work just as well, since that’s where the SoapUI installer puts a lot of the other Apache libraries.
  3. For this test case, I wanted to record all my pass/fails and email myself at the end. So I put the following code into the setup script for the test case. The setup script window is visible at the bottom of the test case GUI in SoapUI.
    context.scriptResultsList = [] // list to hold the pass/fail results
    context.email = null // to be initialized in the teardown script
  4. Later in the test case, a Groovy script checks one XML file against another, records “PASS” if they’re identical and “FAIL” if not, and appends that string to the scriptResultsList context variable defined in the test case setup script.
    def scriptResult
    if (xmlDiff.identical()) {
        scriptResult = 'PASS'
    } else {
        scriptResult = 'FAIL'
    }
    context.scriptResultsList << scriptResult
  5. In the teardown script for the test case, I call my email class by attaching it to the context.email variable I created in the setup script. I send an email whose text depends on whether any of my test cases failed. I could attach a results file with a little more work.
    if (context.scriptResultsList.find { it == 'FAIL' }) {
        context.email = new ApacheMultiPartEmail("", "", "",
            "Failure: SoapUI Regression Test",
            "At least one of your test runs failed. Check detailed results.",
            "your.address@yourdomain.org") // recipient address is a placeholder
    } else {
        context.email = new ApacheMultiPartEmail("", "", "",
            "Pass: SoapUI Regression Test",
            "None of your test runs failed.",
            "your.address@yourdomain.org")
    }

* Here’s my email class definition, which closely resembles the example in the Apache Commons online docs. It was written to send emails via an Exchange SMTP server. Note that Apache Commons also offers a SimpleEmail class that would have worked just as well for this limited purpose.

import org.apache.commons.mail.*

public class ApacheMultiPartEmail {
    public ApacheMultiPartEmail(attPath, attDescription, attName, msgSubject, msgMessage, msgRecipient) {
        Email email = new MultiPartEmail()
        email.setHostName("smtp.yourhostname.org") // I hardcode this
        email.setSmtpPort(25) // also hardcoded in the class definition; use your SMTP port
        email.setFrom("desiredSenderEmailAddress") // I hardcode this value so the message is always sent from me. You could pass it in as a parameter too.
        email.setSubject(msgSubject)
        email.setMsg(msgMessage)
        email.addTo(msgRecipient)
        if (attPath != '' && attDescription != '' && attName != '') {
            EmailAttachment attachment = new EmailAttachment()
            attachment.setPath(attPath); attachment.setDescription(attDescription); attachment.setName(attName)
            email.attach(attachment)
        }
        email.send()
    }
}


Big data technology for absolute beginners: a meetup report

After years of reading-intensive formal education, I’ve come to the conclusion that I’m actually best at hands-on learning. You can talk to me till next year about technical concepts but until I can see them in action, they often don’t make much sense to me. That’s why this morning’s Boston Data Mining meetup was so valuable: we worked directly with an Amazon Web Services Elastic MapReduce cluster and an AWS Redshift database. Once you can make those connections, the learning opportunities are probably boundless.

The good people of Data Kitchen (Gil Benghiat, Eric Estabrooks, and Chris Bergh) first gave us high-level overviews of Hadoop, AWS EMR (Amazon’s branding of Hadoop), Redshift, and associated frameworks like Impala. Also covered at a high level were MapReduce, Hive, and Pig, which you can use to retrieve data from a Hadoop/AWS EMR cluster. Each technology has its strengths and weaknesses, and the DK guys gave some expert advice in those areas too. Questions from the 50+ people in the room were of high quality and brought up some good discussion points. Also on hand with deep subject matter expertise and critical helpful hints for newbs like me was William Lee of Imagios.

Before long it was time to try connecting to an AWS EMR cluster. Setup for these connections is not a trivial matter, but fortunately there were good instructions posted on the DK blog before the meetup convened. Amazingly, even though many people arrived at the meetup without having completed the setup prerequisites, and all three major desktop OSes were well represented in the crowd, most people were able to run a SQL query against an AWS EMR cluster by the end of the morning. Yes, even me. (My biggest challenge was trying to get SQL Workbench to run on Debian without GNOME or KDE installed. FOSS and I have a Stockholm syndrome type of relationship. Long story short, make sure you have one of those desktop environments installed before you try to use SQL Workbench on Linux.)

The Data Kitchen guys and William Lee also put in a few extra hours to make sure we all could put together a Redshift database to which we could connect from our desktops. I was flabbergasted that I was able to get up and running with the AWS technologies for a couple of cents on the dollar. Last week I enrolled in an online Hadoop course that promised I could run labs in the cloud, only to find out after the first couple of lectures that there was no cloud and that the desktop software I would need required a six-core processor at a minimum. Needless to say, I quickly unenrolled from the course.

You can create an AWS EMR cluster that costs a few cents an hour to run. The configuration options you choose during cluster creation apparently can affect the price greatly, so be careful. (The Data Kitchen slides provided specific information on this point.) Also critical: if you’re not going to keep using the EMR cluster or Redshift database you create, remember to terminate it (EMR) or shut it down (Redshift), or face a big credit card bill later. Another great thing for us cheapskates: public big data is only a Google away.

Slides from the workshop will be posted to Slideshare – I would imagine that Data Kitchen will announce the postings on their blog. All told, this was a morning well spent. My lunchtime visit to Tatte Bakery on 3rd Street didn’t hurt.

Web services performance testing: a pilgrim’s progress, part 2

Here is where I vent some frustration. I’m self-taught on all the tools I use and I am the only person in my shop who knows how to use them in this manner. What’s more, I am the only Java/Groovy person right now at my shop. Being an autodidact can make one feel nice and self-sufficient, but the (large) holes in my knowledge are starting to cause me trouble. Why does the tool I use appear to do something very different from what I expect after reading the documentation? How do I instrument the tool to put load on a Web site through the UI, or should I be using a different tool?

Fortunately there’s been a shift in leadership at my workplace so that if I need to make the case for training, I’m more likely to be heard. So I’ll probably be doing that shortly. That being said, if the training doesn’t exist, I’m not sure what I will do. Is there training for JMeter or do developers just pick it up on their own? And if you don’t work in a Java shop, what do you do? Should I start looking into LoadRunner? I suppose I could put this question out to sqa.stackexchange or Software Testing Club and see what I come back with.

The testing community needs good TESTING tools – which are not necessarily automation tools – and reliable documentation, training, and support on those tools. There is DEFINITELY a market for this.

Also: I need the support of prod support/sysadmin types who, frankly, have better things to do than monitor a service in test for hours a day.  After several stress/performance tests have yielded conflicting results, my best bet would probably be to learn how to use the monitors they are using and go conduct some tests on my own time.

Finally: it’s an education to watch how quickly other people and I can fall into the confirmation bias trap. Example: I ran a low volume performance test against the production region on Friday, one I’ve run several times before apparently without incident. Sure enough, the system that is the most likely to have trouble with the load I’m creating started going haywire during the test. It continued to do so after I shut down the script, which makes me start wondering if my tool’s UI is telling me what I need to know. (A tool that monitors incoming requests against the system under load would be helpful here.)

Right away, people (including me) wanted to ascribe the problems with the system in production to the load against the system under test.  However, there’s really no proof to support that unless it happens repeatedly. And repeated production failures are not something I would wish on anyone. Sadly, our test system is not an exact replica of our production system – shared resources need to be allocated first to prod – so I’m not sure what performance testing on the test system will tell me.

So I have some new goals today:

  • Get a monitor that gives a different view of the system under test (incoming requests) and learn how to use that monitor
  • Seek out some REAL training on the tools I use, or different ones. First I have to find the right place to ask those questions, though.
  • See if performance testing under the test region will tell us anything useful.

Ten books that have stayed with me

Apparently this meme is making the rounds again. I’m happy to bite.

  • Susan Cain, “Quiet: The Power of Introverts”
  • Charlotte Bronte, “Jane Eyre”/”Villette” (I’m cheating a little)
  • James Baldwin, “Go Tell It On The Mountain”
  • Isaac Bashevis Singer, “Enemies: A Love Story”
  • Cem Kaner, James Marcus Bach, and Bret Pettichord, “Lessons Learned in Software Testing”
  • Roxane Gay, “An Untamed State”
  • Gail Tsukiyama, “The Samurai’s Garden”
  • Mikhail Bulgakov, “The Master and Margarita”
  • Lois P. Frankel, “Nice Girls Don’t Get The Corner Office”
  • Katherine Dunn, “Geek Love”

A fine balance: introversion, leadership, and being a good team member

I recently finished Susan Cain’s excellent book Quiet: The Power of Introverts. For me, an introvert’s introvert, the book confirmed some things I already knew and gave me some new things to think about. It’s dispiriting that American business culture still gives so much credence to the fast talker: in fact, according to studies Cain writes about, big talkers are commonly perceived as being smarter than those who have less to say. (This appears to be a cultural limitation: Cain writes about how some Asian cultures still prize the quiet, studious, family-oriented person.)

I often sit back and observe software team meetings: on my current team, the lead developers appear to be more introverted than the business SMEs and analysts, and unless they are given an opportunity to speak they typically won’t do so. I am the lead tester on the team, and a technically inclined woman as well, and I generally act contrary to my introverted nature in meetings. I’ll speak up unsolicited and offer my opinion if I think it’s warranted. I’ve learned the hard way that unless you speak up in such a meeting, you’re likely not to have an opportunity to be heard.

It’s even more interesting when you often have something to say because you see something that is off or wrong, or that could simply be improved. I found out recently that the itch for change is part of my nature too. My company recently offered an “influencing skills” workshop, in preparation for which we each took a DiSC personality assessment. I tend to be more than a little skeptical of personality assessment tests — confirmation bias ahoy. But it was fascinating to watch as we broke down into our DiSC groups and each group tended to behave in accordance with DiSC expectations: the S’s took twice as long to arrive at a decision as anyone else, while the D’s (that was my group) had no trouble making quick decisions. According to the DiSC approach, D’s tend to want to change things and do so quickly. (As I said to one of my fellow D’s, “D is for diva.”) I identified right down the line with the approach’s description of the D (again, note to myself: confirmation bias ahoy). So I appear to be a Dominant and also an introvert who is all too aware of the cultural bias against introverts. Try sitting still with that at a meeting.

Cain writes about Free Trait Theory as a possible explanation of seeming contradictions in our natures. Simply put, like Walt Whitman, we contain multitudes. There are introverts who are splendid and beloved lecturers, organizational leaders, and actors. But they can’t play those roles for too long: after a few hours in the public eye, they retreat to a private office, close the door, and stay there for quite a while, perhaps with a pet and a radio as their only company.

What does all of this mean for being a technical test lead for a project? It means that I have to reconcile my innate tendency NOT to speak up against my contrary tendency to want to announce Things That Are Broken That Need Fixing. For reasons I won’t go into here, I believe I went a little too far down the Jeremiah lane this week; however, I did act ethically on the best information available to me at the time. And my actions led to a group initiative to do further exploration of the issue: getting more information on a subject can never hurt. But there’s a time and place for everything. It’s trial and error for anyone who is in a leadership role to know when to sit back, shut up, and listen, and when to speak up. (Mentoring helps. A lot.)

Web services performance testing: a pilgrim’s progress. Part 1.

When I embarked on a high-level test strategy for my latest project, I knew that I wanted to learn how to conduct meaningful performance tests. I had never done any kind of performance testing before, but I knew it was time (probably past time) to cross the Rubicon.

I am lead on the project and my own testing effort involves Web services, so I had to figure out:

  • Which aspects of performance I wanted to look at – meaning which questions I wanted to ask about performance
  • How to use the tools at my disposal – or learn how to use FOSS tools – to answer those questions
  • How to report results so that my project team could use the information I found

Now, if I wanted to learn how to write a Python program, I would have numerous online and print resources at my disposal. If I wanted to learn how to test Web service performance, I would have to look elsewhere. I became aware that I didn’t even know the correct questions to ask. 

I knew that some of my company’s online applications had had performance issues in the past, so I consulted with the people who had looked at those issues in depth.

I also looked at the testing tools that we had in-house that could measure performance: their documentation yielded some information on possible questions I could ask.

It seemed that one critical (and obvious) question would be:  what is the response time for a request? Even that question poses several new ones, though. Among those subquestions are:

  • How many requests should I submit to get a decent sample size?
  • Should I space those requests out evenly over time? Or should I vary their frequency?
  • How do I make sure I get a realistic sample of the request data that could be submitted to the service?
  • Does response time vary predictably with any other parameter? How about the size in bytes of the response? What about other load on the system that the Web services under test share with other applications?
  • Should I use our production region to do performance testing? Or can I get away with using the test region? (It turned out that for various reasons our test region would not give us a realistic idea of performance. My tests, run in parallel in both regions, established this beyond a shadow of a doubt. So I had to bargain with our prod support folks to continue to run my tests in the prod region.)
  • Should I look at average or median response time? (I had to refresh my decades-old introductory statistics course knowledge with online resources to answer that question.)
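The average-versus-median question above matters because response-time samples are usually skewed: a few slow outliers drag the mean well above the typical response time, while the median stays put. A small Java illustration with made-up numbers:

```java
import java.util.Arrays;

// Mean vs. median for a skewed response-time sample.
// The numbers are invented for illustration.
public class MeanVsMedian {
    static double mean(long[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    static double median(long[] xs) {
        long[] sorted = xs.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2]
                          : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        // Nine typical responses around 500 ms, plus one 30-second outlier.
        long[] millis = {480, 490, 495, 500, 500, 505, 510, 515, 520, 30000};
        System.out.println("mean   = " + mean(millis));   // 3451.5 ms
        System.out.println("median = " + median(millis)); // 502.5 ms
    }
}
```

One bad request here pushes the mean to nearly seven times the typical response time, which is why the median is often the better headline number (and why reporting both, plus percentiles, tells a fuller story).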

And I also had to look at the performance requirement my project team had stipulated. I learned early on from an old hand that without a specific performance requirement, your performance data will not be terribly useful to the team. Note that your information might well help establish a realistic performance requirement where there is none.

More detail to come.