“Our API Gateway is falling over, and we expect a 6-fold increase in our client base and a 10-fold increase in requests; our backend service is scaling and performing well. Please help us understand and fix the API Gateway.”
It was pretty clear we had to run a series of performance tests simulating the current and projected user load, apply fixes to the product (the API Gateway), and rinse and repeat until we met the client’s objectives.
So the key tasks were:
- Run baseline tests to compare the API Gateway proxy against the backend API
- Use tools to monitor the API Gateway under load
- Tune the API Gateway to scale under increasing load
- Repeat until finely tuned
My experience with API performance testing was limited (Java clients), so I reached out to my colleagues and got a leg up from our co-founder (a smart techie), who created a simple “Maven-Gatling-Scala Project” in Eclipse to enable me to run these tests.
Gatling + Eclipse + Maven
- What is Gatling? Read more here: http://gatling.io/docs/2.1.7/quickstart.html
- Creating an Eclipse Project Structure
- Create a Maven Project
- Create a Scala Test under /src/test/scala
- Create gatling.conf, recorder.conf and a logback.xml under /src/test/resources
- Begin writing your Performance Test Scenario
- How do we create Performance Test Scenarios?
- Engine.scala – must extend scala.App
- Recorder.scala – must extend scala.App
- Test – must extend Gatling’s Simulation; we created an abstract class and extended it in our test cases so we could reuse some common variables
- gatling.conf – this file contains important configuration that determines where the test results go, how the HTTP calls are made, etc. See more details here: http://gatling.io/docs/2.0.0-RC2/general/configuration.html
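To make the abstract-class idea concrete, here is a rough sketch of what such a reusable base simulation could look like with the Gatling 2.x DSL. It is not the original project’s code, and the names (`AbstractSimulation`, `baseUrl`, `userCount`, the example host) are illustrative assumptions; it also needs the Gatling library on the classpath to compile.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Hypothetical base class holding variables shared by all test cases.
// Values can be overridden per run via system properties.
abstract class AbstractSimulation extends Simulation {
  val userCount: Int  = Integer.getInteger("users", 10)
  val baseUrl: String = System.getProperty("baseUrl", "http://gateway.example.com")

  // Common HTTP protocol configuration reused by every concrete test
  val httpConf = http
    .baseURL(baseUrl)
    .acceptHeader("application/json")
}
```

Concrete tests then only declare their scenarios and injection profiles, which keeps each test file focused on what varies.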
Here is a screenshot of my workspace
Executing the Gatling Scala Test
- Use Maven Gatling Plugin See https://github.com/gatling/gatling-maven-plugin-demo
- Make sure your pom file has the gatling-maven-plugin artifact as shown in the screenshot below
- Use the option “-DSimulationClass” to specify the Test you want to run (For Example: -DSimulationClass=MyAPIOneTest)
- Use the option “-Dgatling.core.outputDirectoryBaseName” to specify the folder your reports are written to
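For reference, a minimal pom.xml fragment declaring the plugin might look like the following (the version number is an assumption matching the Gatling 2.1.7 docs linked above; check the gatling-maven-plugin-demo repo for the version that matches your Gatling release):

```xml
<!-- Illustrative fragment; version is an assumption, adjust to your Gatling release -->
<plugin>
  <groupId>io.gatling</groupId>
  <artifactId>gatling-maven-plugin</artifactId>
  <version>2.1.7</version>
</plugin>
```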
Creating A Performance Test Scenario
Our starting point for any technical analysis of the API Gateway was to study what it did under varying load and build a baseline. This had to be done by invoking a few of the APIs during the load to simulate varying requests per second (for example, one API is invoked every 5 seconds while another every 10 seconds).
After reading the Gatling documentation and trying out a couple of simple tests, we were able to replicate our requirement of ‘n’ users with two scenarios, each executing HTTP invocations against a different API call, with ‘pause’ times calculated from the ‘once every x seconds’ requirement. Each scenario took ‘n/2’ users to simulate the ‘agents’ / ‘connections’ making the different calls, and the two scenarios ran in parallel.
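The arithmetic behind this design can be sketched in a few lines of plain Scala (the user count, pauses and duration below are illustrative values, not our actual test parameters):

```scala
// Expected request count for one scenario: each user fires one call per pause
// interval for the whole test duration (ignoring request latency).
def expectedRequests(users: Int, pauseSeconds: Double, durationSeconds: Double): Long =
  math.round(users * (durationSeconds / pauseSeconds))

val n = 20               // total simulated users
val perScenario = n / 2  // 'n/2' users per scenario

// Scenario A: one call every 5 seconds; Scenario B: one call every 10 seconds
val reqsApiOne = expectedRequests(perScenario, 5.0, 60.0)
val reqsApiTwo = expectedRequests(perScenario, 10.0, 60.0)
```

In practice Gatling’s pauses start after the previous response returns, so the real request count comes in slightly below this ceiling; the spreadsheet comparison described below is what catches any larger mismatch.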
- We used a simple spreadsheet to lay out our test ranges: what we would vary, what we would keep constant, the total number of requests expected over the test duration, and the requests per second we expected Gatling to achieve per test
- We used the first few rows of the spreadsheet for a dry run, then matched the Gatling report results against our expectations in the spreadsheet to confirm we were on the right track – that is, as we increased the user base, the RPS numbers and the way the APIs were invoked matched our expectations
- We then coded the “mvn gatling:execute” calls into a run script that took user count, test duration and hold times as arguments, and wrote a test-suite run script that ran the first script with values from the spreadsheet rows
- We collected the Gatling reports into a reports folder and served it via a simple HTTP server
- We did extensive analysis of the baseline results, monitored the target product using tools (VisualVM, Oracle JRockit), applied various tunings (that is another blog post) and re-ran the tests until we saw the product scale better under increasing load
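The planning spreadsheet can be pictured as code: hold the two pause times constant, vary the user count, and record the requests per second we expect Gatling to achieve for each run. This is a hypothetical sketch (the row values and names are illustrative):

```scala
// One row of the planning spreadsheet
case class TestRow(users: Int, durationSec: Int, expectedRps: Double)

// Build the expected-results column for a set of user counts: n/2 users pause
// pauseA seconds between calls to API one, the other n/2 pause pauseB seconds
// between calls to API two.
def plan(userCounts: Seq[Int], durationSec: Int,
         pauseA: Double, pauseB: Double): Seq[TestRow] =
  userCounts.map { n =>
    val rps = (n / 2) / pauseA + (n / 2) / pauseB
    TestRow(n, durationSec, rps)
  }

val rows = plan(Seq(10, 20, 40), durationSec = 300, pauseA = 5.0, pauseB = 10.0)
```

Comparing each row’s expected RPS to the number in the Gatling report is the dry-run check described above: if they diverge, the scenario is not simulating the load you think it is.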
Performance Test Setup
The setup has 3 AWS instances, hosting the backend API, the API Gateway and the performance tester (the Gatling code). We also use Node.js to serve the Gatling reports through a static folder mapping: during each run, the Gatling reports are generated into a ‘reports’ folder mapped under the ‘public’ folder of a simple HTTP server written with the Node/Express framework.
API Performance Testing Toolkit
The Scala code for the Gatling tests is as follows – notice that “testScenario” and “testScenario2” pause for 3.40 seconds and 6.80 seconds respectively, and the Gatling setup runs these two scenarios in parallel. See http://gatling.io/docs/2.0.0/general/simulation_setup.html
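Since the code itself is shown as a screenshot, here is a rough Gatling 2.x sketch of what a simulation with two parallel scenarios and those pause times looks like. The class name, scenario names, paths and host are assumptions for illustration, not the original project’s values, and the Gatling library is needed to compile it.

```scala
import scala.concurrent.duration._
import io.gatling.core.Predef._
import io.gatling.http.Predef._

// Hypothetical simulation: two scenarios with different pause times, injected
// in parallel with n/2 users each.
class GatewayBaselineTest extends Simulation {
  val httpConf = http.baseURL("http://gateway.example.com")

  val testScenario = scenario("API one every 3.40s")
    .during(300 seconds) {
      exec(http("api-one").get("/api/one")).pause(3400 milliseconds)
    }

  val testScenario2 = scenario("API two every 6.80s")
    .during(300 seconds) {
      exec(http("api-two").get("/api/two")).pause(6800 milliseconds)
    }

  // Listing both scenarios in setUp runs them in parallel
  setUp(
    testScenario.inject(atOnceUsers(10)),
    testScenario2.inject(atOnceUsers(10))
  ).protocols(httpConf)
}
```

The key point is the `setUp` call: every scenario passed to it starts concurrently, which is what lets the two different call rates hit the gateway at the same time.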
- 2-user tests reveal we coded the scenario correctly, and calls are made to the different APIs at the expected intervals, as shown below
- As we build up our scenarios, we watch the users ramp up as expected, and the performance of the backend and the gateway degrades, as expected, with more and more users
- Baseline testing
- Before tuning
- Finally, we use this information to tune the components (change the garbage collector, add a load balancer, etc.) and retest until we meet and exceed the desired results
- After tuning