A few times this year I have been fortunate enough to stumble across screen scrapers / bots continuously making calls to the same URL over and over, and they have always caused issues for the customer. I found the cause right away, but it also led me to write this post, as it’s well worth sharing, because I asked myself "why didn't the customer find this?".
When you first open up AppInternals, you are handed all the information on a plate...thing is, if you do not consider all the possibilities (http://www.riverbednews.com/legacy/2014/04/hidden-in-plain-site-part-1/), then it’s really easy to miss something if you have never experienced it before or never suspected dodgy goings on within your website. So, I ran through the workflow with the users...it kinda went like this:
Transaction Types - ALL
Audience Role or Type - ALL
Country - ALL
Frontend Servers - ALL
Backend Server - None
Instance - All Front End, No Backend
Database - None
WebService - None
Normally, the ALLs above would be different and the workflow works 99% of the time...when ALL turns out to be a specific instance or server, transaction type or database, you hit the nail on the head and find the pot of gold, as you have some evidence to suggest a guilty component. This is the advantage of capturing all transactions all of the time, right?
In this case...the workflow did not go far enough...if we had added one more question in the "Audience" section, browser or non-browser activity, this would have highlighted something. The customer at first was like "non-browser??", I said yeah..."back in the 90s we all used a command line, or we scripted load tests this way"...customer..."wtf, why....are you ***** *****g me?".
We ran a command in AppInternals to show a simple "transactioncount -group_by user.ip". It showed the most active IPs, which at first looked normal. Cherry-picking the top few IPs, I said "let’s do something with these IPs".
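For anyone without the tool to hand, the idea behind that grouping can be sketched in a few lines of Python. This is not AppInternals syntax; the record layout (a list of dicts with a `user_ip` key) and the sample addresses are made up purely for illustration.

```python
from collections import Counter

# Hypothetical transaction records; the field names are assumptions,
# not the AppInternals data model.
transactions = [
    {"user_ip": "203.0.113.7", "url": "/search"},
    {"user_ip": "203.0.113.7", "url": "/search"},
    {"user_ip": "203.0.113.7", "url": "/search"},
    {"user_ip": "198.51.100.4", "url": "/home"},
    {"user_ip": "198.51.100.9", "url": "/basket"},
]


def count_by_ip(records):
    """Count transactions per client IP, busiest first."""
    counts = Counter(r["user_ip"] for r in records)
    return counts.most_common()


top_ips = count_by_ip(transactions)
for ip, count in top_ips:
    print(ip, count)
```

The busiest addresses float to the top, which is all we needed to pick a few to dig into.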
Me "Let's see what URLs the users (we'll call them that for now) were hitting...is nearly 6000 requests per hour about right??"
Customer "This might be a group of users in the same office"
Me "Cool, this customer must be pretty popular with you then. Thing is, these are the same times you reported issues. Do they just hammer the same URL over and over?"
The next query we ran was the same IP by username, then we ran the same by "browser".
Both queries showed ZERO results. Why would your users not be using browsers?
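The drill-down we did on a suspect IP looks roughly like this if you write it out by hand. Again, this is only a sketch under assumptions: the field names (`username`, `browser`) and the sample records are hypothetical, with `None` standing in for "nothing captured".

```python
from collections import Counter

# Hypothetical records: username and browser are None when the request
# carried no logged-in user and no recognisable browser.
transactions = [
    {"user_ip": "203.0.113.7", "url": "/search", "username": None, "browser": None},
    {"user_ip": "203.0.113.7", "url": "/search", "username": None, "browser": None},
    {"user_ip": "198.51.100.4", "url": "/home", "username": "alice", "browser": "Firefox"},
]


def group_for_ip(records, ip, field):
    """Count one IP's transactions by `field`, skipping missing values."""
    return Counter(
        r[field] for r in records
        if r["user_ip"] == ip and r.get(field) is not None
    )


suspect = "203.0.113.7"
by_url = group_for_ip(transactions, suspect, "url")
by_user = group_for_ip(transactions, suspect, "username")
by_browser = group_for_ip(transactions, suspect, "browser")

print(dict(by_url))                   # which URLs the suspect IP requested
print(len(by_user), len(by_browser))  # how many distinct users / browsers were seen
```

An IP that produces plenty of rows by URL but nothing at all by username or browser is the pattern that gave the game away here.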
In the next 30 minutes or so, we trawled through all the data and transactions in AppInternals...we didn't capture a single user or browser transaction on that IP. The customer checked the Apache logs, which showed exactly the same.
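If you want to do the same cross-check against the Apache logs yourself, something like the following would do it, assuming the server writes the standard "combined" log format. The sample lines are invented, and the regex is a simplification that will not cope with escaped quotes inside a field.

```python
import re
from collections import Counter

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Invented sample lines, for illustration only.
sample_log = [
    '203.0.113.7 - - [10/Oct/2014:13:55:36 +0000] "GET /search HTTP/1.1" 200 512 "-" "-"',
    '203.0.113.7 - - [10/Oct/2014:13:55:37 +0000] "GET /search HTTP/1.1" 200 512 "-" "-"',
    '198.51.100.4 - - [10/Oct/2014:13:55:38 +0000] "GET /home HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]


def agents_by_ip(lines):
    """Map each client IP to a Counter of the User-Agent strings it sent."""
    result = {}
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that are not in combined format
        result.setdefault(m.group("ip"), Counter())[m.group("agent")] += 1
    return result


agents = agents_by_ip(sample_log)
for ip, counter in agents.items():
    print(ip, dict(counter))
```

An address whose requests never carry a browser User-Agent is worth a second look, although a missing or odd User-Agent on its own is a hint and not proof.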
Stopping these bots / non-browsers can be tackled in a number of ways...an ACL on Apache, IP throttling on the web server, load balancer or firewall, or using a cloud-based bot management tool.
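To give a feel for what IP throttling actually does, here is a minimal sliding-window sketch in Python. In real life you would let the web server, load balancer or firewall do this for you; the class name, the limit and the window below are placeholders I picked for the example, not settings from the customer's environment.

```python
from collections import defaultdict, deque


class IpThrottle:
    """Allow at most `limit` requests per IP within `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip, now):
        q = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the limit: reject this request
        q.append(now)
        return True


throttle = IpThrottle(limit=3, window=60.0)
decisions = [throttle.allow("203.0.113.7", t) for t in (0.0, 1.0, 2.0, 3.0, 61.0)]
print(decisions)  # [True, True, True, False, True]
```

The fourth request is refused because three have already been allowed inside the window; by the time the fifth arrives the oldest ones have aged out, so it goes through.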