Web services performance testing: a pilgrim’s progress. Part 4.

I have SoapUI Pro from SmartBear at my disposal at work, and I’m quite comfortable with it. So it was my tool of choice for creating load and recording details about the results.

SoapUI’s first cousin LoadUI, a free version of which comes with the SoapUI Pro install, was also an option. However, I chose not to explore it for this testing mission. (A road not travelled, admittedly.)

My first load tests sent requests to the middleware Web services described in Part 3 of this series. Because of our NTLM proxy at work I had to use Fiddler to do the handshake between my desktop and the Web services.

Fiddler records a lot of interesting information about Web service requests, including response times and byte sizes. So I copied that data from Fiddler, pasted it into Excel, and created time-series graphs from it. I was able to create some pretty graphs, but copying and pasting the data over and over became a real time sink. I am not and will never be a VBA whiz, so I knew I had to find a better way.

I was forced into finding that better way when it came time to test the performance of the .NET services that USED the middleware Web services. Because of the .NET Web server’s configuration, I could no longer route requests through Fiddler and see the data. What seemed to be a hindrance turned out to be a blessing in disguise.

The solution I arrived at was to use SoapUI to record several aspects of the request and response transactions to a flat file. I could then bring that flat file into R for graphing and analysis.

The SoapUI test case for the .NET services is set up as follows. Apologies for the blurriness of some images below: I haven’t done HTML-based documentation in quite some time.

  1. Initial request to get the service “warmed up.” I do not count those results.
  2. Multipurpose Groovy script step:

    // Register the jTDS JDBC driver for use later in the test case.
    // See below for notes on using third-party libraries.

    import groovy.sql.Sql
    com.eviware.soapui.support.GroovyUtils.registerJdbcDriver( "net.sourceforge.jtds.jdbc.Driver" )

    // Get the current time via Joda-Time to put a timestamp on the data sink filename.

    import org.joda.time.DateTime

    def activeEnvironment = context.expand( '${#Project#activeEnvironment}' )

    def now = new DateTime().toString().replaceAll("[\\W]", "_")

    // Construct the filename for the response data and set it on a DataSink step further along in the test case.

    testRunner.testCase.getTestStepByName("Response Size and Time Log").getDataSink().setFileName('''Q:/basedirectory/''' + activeEnvironment + '''_''' + now + '''_responseTimes.csv''')

    If your Groovy script uses third-party libraries like jTDS and Joda-Time, you have to put the jar files into $soapui_home/bin/ext.

    (Screenshot: Putting jars where SoapUI can see them.)

    Note that jTDS has an accompanying DLL for SQL Server Windows-based authentication. DLLs like this go in $soapui_home/bin.

    (Screenshot: Putting DLLs where SoapUI can see them.)

    Here's how you set the activeEnvironment variable; setup happens at the project level:

    (Screenshot: Creating environments at the project level.)

    Then you choose your environment at the test suite level.

    (Screenshot: Choosing the environment at the test suite level.)

  3. Run a SQL query in a JDBC DataSource step to get a random policy number on the fly from the database of your choice. You can build the query for the DataSource step dynamically in a Groovy script:

    // State Codes is a grid DataSource step whose contents aren't shown here.

    def stateCode = context.expand( '${State Codes#StateCode}' )

    testRunner.testCase.getTestStepByName( "Get A Policy Number" ).getDataSource().
        setQuery( "SELECT TOP 1 number AS PolicyNumber FROM tablename p " +
                  "WHERE date BETWEEN '1/1/2012' AND getdate() " +
                  "AND state = '" + stateCode + "' ORDER BY newid()" )

  4. Here’s the JDBC DataSource step for the policy numbers. The query (not shown in the screenshot) is fed over from the preceding Groovy script step.

    (Screenshot: JDBC DataSource step.)

    You will have to set your connection properties. A jTDS connection string might look something like this (see the jTDS online docs for more information):

    jdbc:jtds:sqlserver://server:port/initialDatabase;domain=domainName;
    trusted_connection=yes
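
    Incidentally, with the jTDS driver registered in step 2, you can also hit the database directly from a Groovy script step via groovy.sql.Sql. A minimal sketch, using the placeholder server, database, and domain names from the connection string above:

    import groovy.sql.Sql

    // Hypothetical connection: server, port, database, and domain are placeholders.
    def sql = Sql.newInstance(
        'jdbc:jtds:sqlserver://server:port/initialDatabase;domain=domainName;trusted_connection=yes',
        'net.sourceforge.jtds.jdbc.Driver' )

    // Quick sanity check that the connection works.
    def row = sql.firstRow( 'SELECT getdate() AS now' )
    log.info "Database time: ${row.now}"
    sql.close()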

  5. I feed the value of PolicyNumber returned by the SQL query to my SOAP request via a property transfer. (A rough Groovy equivalent is sketched below.)
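
    If you'd rather do it in Groovy, something like this sketch would be roughly equivalent to that property transfer (step and property names are the ones used elsewhere in this test case):

    // Read the PolicyNumber column from the JDBC DataSource step and expose it
    // as a test case property. The SOAP request could then expand it
    // with ${#TestCase#PolicyNumber}.
    def policyNumber = context.expand( '${Get A Policy Number#PolicyNumber}' )
    testRunner.testCase.setPropertyValue( 'PolicyNumber', policyNumber )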
  6. I have a few assertions in the SOAP request test step. The first two are “canned” assertions that require no scripting.

    (Screenshot: Assertions in the SOAP request.)

    The third assertion, which is more of a functional script than an assertion, captures the timestamp on the response as well as the time taken to respond; both come from built-in SoapUI API calls. The properties TimeStamp and ResponseTime are created and initialized in this Groovy script; I didn’t have to create them outside the script (for example, at the test step level).

    import org.joda.time.DateTime

    targetStep = messageExchange.modelItem.testStep.testCase.getTestStepByName('Response Size and Time Log')
    targetStep.setPropertyValue( 'TimeStamp', new DateTime(messageExchange.timestamp).toString())
    targetStep.setPropertyValue( 'ResponseTime', messageExchange.timeTaken.toString())
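
    Since this script runs as an assertion anyway, you could also make it fail on slow responses. A minimal sketch, with an arbitrary five-second threshold that isn't part of my actual test:

    // Fail the assertion if the response took longer than 5000 ms.
    assert messageExchange.timeTaken < 5000 : "Response took ${messageExchange.timeTaken} ms"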

  7. Another Groovy script step to get the size of the response:

    // Expand the whole SOAP envelope to a string.
    def response = context.expand( '${Service#Response#declare namespace s=\'http://schemas.xmlsoap.org/soap/envelope/\'; //s:Envelope[1]}' ).toString()

    // The last expression is the script's return value, which later steps
    // can read via the property expansion ${<this step's name>#result}.
    response.length()
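
    If you want the size to land in the CSV data sink, one way (assuming a ResponseSize property analogous to TimeStamp and ResponseTime) is to set it on the logging step from the same script:

    // Hypothetical: store the size where the CSV data sink can pick it up.
    def logStep = testRunner.testCase.getTestStepByName( 'Response Size and Time Log' )
    logStep.setPropertyValue( 'ResponseSize', response.length().toString() )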

  8. Yet another Groovy script step to save responses as XML in a data sink, with the policy number, state code, and active environment in the filename:

    def policyNumber = context.expand( '${Service#Request#declare namespace urn=\'urn:dev.null.com\'; //urn:Service[1]/urn:request[1]/urn:PolicyNumber[1]}' )

    def stateAbbrev = context.expand( '${Service#Response#declare namespace ns1=\'urn:dev.null.com\'; //ns1:Service[1]/ns1:Result[1]/ns1:State[1]/ns1:Code[1]}' )

    def activeEnvironment = context.expand( '${#Project#activeEnvironment}' )

    testRunner.testCase.getTestStepByName("DataSink").getDataSink().
    setFileName('''B:/basedirectory/''' + policyNumber + '''_''' + stateAbbrev
    + '''_''' + activeEnvironment + '''.xml''')

  9. Some property transfers, including a count of the policy terms in the response (see Part 3 of this series):

    (Screenshot: Property transfers.)

  10. DataSink step to write the response XML to a file. The filename is set by the Groovy script in step 8 above.

    (Screenshot: File DataSink step.)

  11. DataSink step to write performance-relevant information to a CSV whose name is set on the fly. The filename and property values come from the steps above.

    (Screenshot: Response properties DataSink step.)

  12. Don’t forget a delay step if you’re concerned about overloading the system. (Note that you can configure delays and threading more precisely in LoadUI. This test case, like all SoapUI test cases that aren’t also LoadTests, is single-threaded by default.) A scripted alternative with some jitter is sketched below.

    (Screenshot: Delay step.)
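
    If a fixed pause feels too regular, a Groovy script step can sleep for a randomized interval instead. A sketch, with made-up bounds:

    // Sleep between 500 and 1500 ms so requests don't arrive in lockstep.
    def jitter = 500 + new Random().nextInt( 1000 )
    sleep( jitter )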

  13. And of course a DataSource Loop.

    (Screenshot: DataSource Loop step.)

  14. The whole test case looks like this:

    (Screenshot: The whole test case.)

  15. And now I have a CSV with response times, byte sizes, policy risk states, and term counts to parse in the tool of my choice. I chose R. More about that later.

Big data technology for absolute beginners: a meetup report

After years of reading-intensive formal education, I’ve come to the conclusion that I’m actually best at hands-on learning. You can talk to me till next year about technical concepts, but until I can see them in action, they often don’t make much sense to me. That’s why this morning’s Boston Data Mining meetup was so valuable: we worked directly with an Amazon Web Services Elastic MapReduce cluster and an AWS Redshift database. Once you can make those connections, the learning opportunities are practically boundless.

The good people of Data Kitchen (Gil Benghiat, Eric Estabrooks, and Chris Bergh) first gave us high-level overviews of Hadoop, AWS EMR (Amazon’s branding of Hadoop), Redshift, and associated frameworks like Impala. Also covered at a high level were MapReduce, Hive, and Pig, which you can use to retrieve data from a Hadoop/AWS EMR cluster. Each technology has its strengths and weaknesses, and the DK guys gave some expert advice in those areas too. Questions from the 50+ people in the room were of high quality and brought up some good discussion points. Also on hand with deep subject matter expertise and critical helpful hints for newbs like me was William Lee of Imagios.

Before long it was time to try connecting to an AWS EMR cluster. Setup for these connections is not a trivial matter, but fortunately good instructions were posted on the DK blog before the meetup convened. Amazingly, even though many people arrived without having completed the setup prerequisites, and all three major desktop OSes were well represented in the crowd, most people were able to run a SQL query against an AWS EMR cluster by the end of the morning. Yes, even me. (My biggest challenge was trying to get SQL Workbench to run on Debian without GNOME or KDE installed. FOSS and I have a Stockholm-syndrome type of relationship. Long story short, make sure you have one of those desktop environments installed before you try to use SQL Workbench on Linux.)

The Data Kitchen guys and William Lee also put in a few extra hours to make sure we all could put together a Redshift database to which we could connect from our desktops. I was flabbergasted that I was able to get up and running with the AWS technologies for a couple of cents on the dollar. Last week I enrolled in an online Hadoop course that promised I could run labs in the cloud, only to find out after the first couple of lectures that there was no cloud and that the desktop software I would need required a six-core processor at a minimum. Needless to say, I quickly unenrolled from the course.

You can create an AWS EMR cluster that costs a few cents an hour to run. The configuration options you choose during cluster creation apparently can affect the price greatly, so be careful. (The Data Kitchen slides provided specific information on this point.) Also critical: if you’re not going to keep using the EMR cluster or Redshift database you create, remember to terminate it (EMR) or shut it down (Redshift), or face a big credit card bill later. Another great thing for us cheapskates: public big data is only a Google away.

Slides from the workshop will be posted to SlideShare; I would imagine Data Kitchen will announce the posting on their blog. All told, this was a morning well spent. My lunchtime visit to Tatte Bakery on 3rd Street didn’t hurt.