Browsers are not the only audience!

A few times this year I have been fortunate enough to stumble across screen scrapers and bots calling the same URL over and over, and they have always caused issues for the customer. I found the cause right away, but it also led me to write this post, because I asked myself "why didn't the customer find this?".

 

When you first open up AppInternals, you are handed all the information on a plate. The thing is, if you do not consider all the possibilities (http://www.riverbednews.com/legacy/2014/04/hidden-in-plain-site-part-1/), it’s really easy to miss something, especially if you have never experienced it before or never suspect dodgy goings-on within your website. So, I ran through the workflow with the users. It kinda went like this:

 

Transaction Types - ALL

Audience Role or Type - ALL

Country - ALL

Frontend Servers - ALL

Backend Server - None

Instance - All Front End, No Backend

Database - None

WebService - None

 

Normally, the ALLs above would be different and the workflow works 99% of the time. When one of those answers is a specific instance, server, transaction type or database rather than ALL, you hit the nail on the head and find the pot of gold, as you have some evidence to suggest a guilty component. This is the advantage of capturing all transactions all of the time, right?

 

In this case the workflow did not go far enough. If we had added one more question to the "Audience" section, browser or non-browser activity, it would have highlighted something. At first the customer was like "non-browser??". I said "yeah, back in the 90's we all used a command line, or we scripted load tests this way". Customer: "wtf, why... are you ***** *****g me?".

 

We ran a command in AppInternals to show a simple "transactioncount -group_by user.ip". It showed the most active IPs, which at first looked normal. Cherry-picking the top few IPs, I said "let’s do something with these IPs".
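If you do not have AppInternals to hand, a rough equivalent of that group-by can be run over a web server access log. The sketch below is not the AppInternals command. It assumes Apache's "combined" log format with the client IP as the first field, and the names count_by_ip and top_talkers are made up for illustration.

```python
import re
from collections import Counter

# Assumption: Apache "combined" (or "common") log format, client IP first.
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+')


def count_by_ip(lines):
    """Count requests per client IP, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


def top_talkers(lines, n=5):
    """Return the n busiest IPs as (ip, request_count) pairs."""
    return count_by_ip(lines).most_common(n)
```

Calling top_talkers(open("access.log"), 5) (the file name is only an example) gives the same kind of "most active IPs" list to cherry-pick from.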

 

Me "Let's see what URLs the users (we'll call them that for now) were hitting... is nearly 6000 requests per hour about right??"

Customer "This might be a group of users in the same office"

Me "Cool, this customer must be pretty popular with you then. Thing is, these are the same times you reported issues. Do they just hammer the same URL over and over?"

BOT1.png
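The same "which URLs, and how many per hour" question can be asked of an access log. This is only a sketch under assumptions: Apache "combined" log format, timestamps like 10/Oct/2014:13:55:36 +0000, and a made-up helper name, hourly_hits. It buckets one IP's requests by hour and URL so the busy hours can be lined up against the times issues were reported.

```python
import re
from collections import Counter

# Assumption: Apache "combined"/"common" format, e.g.
# 203.0.113.9 - - [10/Oct/2014:13:55:36 +0000] "GET /search HTTP/1.1" 200 512 ...
ENTRY = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "\S+ (?P<url>\S+)[^"]*"'
)


def hourly_hits(lines, ip):
    """Count requests from one IP, keyed by (hour, url)."""
    buckets = Counter()
    for line in lines:
        match = ENTRY.match(line)
        if match and match.group("ip") == ip:
            # First 14 characters of the timestamp: "10/Oct/2014:13"
            hour = match.group("time")[:14]
            buckets[(hour, match.group("url"))] += 1
    return buckets
```

A single (hour, url) pair with thousands of hits from one address is the pattern described in the conversation above.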

 

 

 

The next query we ran grouped the same IP by username, then we ran the same again by "browser".

BOT2.png

 

 

Both queries showed ZERO results. Why would your users not be using browsers?


In the next 30 minutes or so, we trawled through all the data and transactions in AppInternals. We didn't capture a single user or browser transaction on that IP. The customer checked the Apache logs, which showed exactly the same.
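The Apache log check can be sketched along these lines. It is a guess at what such a check might look like, not what the customer actually ran. It assumes the "combined" log format, where the third field is the HTTP-authenticated user (often "-", and not necessarily the same thing as the username AppInternals reports) and the last quoted field is the User-Agent. The BROWSER_HINTS prefixes and the profile_ip name are illustrative only, and a User-Agent can be spoofed, so a "browser" hit here is a hint rather than proof.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format with referer and user agent.
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) \[[^\]]+\] "[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Illustrative prefixes only; mainstream browsers usually start with "Mozilla/".
BROWSER_HINTS = ("Mozilla/", "Opera/")


def profile_ip(lines, ip):
    """Summarise which users and user agents one IP presented."""
    agents, users = Counter(), Counter()
    for line in lines:
        match = COMBINED.match(line)
        if not match or match.group("ip") != ip:
            continue
        agents[match.group("agent")] += 1
        users[match.group("user")] += 1
    browser_hits = sum(
        count for agent, count in agents.items()
        if agent.startswith(BROWSER_HINTS)
    )
    return {"agents": agents, "users": users, "browser_hits": browser_hits}
```

An IP with plenty of requests, a browser_hits of zero and no user other than "-" is the log-file version of the zero results above.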

 

Stopping these bots/non-browsers can be tackled in a number of ways: an ACL on Apache, IP throttling on the web server, load balancer or firewall, or a cloud-based bot management tool.
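To give a feel for what IP throttling means in practice, here is a minimal sliding-window sketch. It is not how any particular web server, load balancer or firewall implements it. The IpThrottle class, its parameters and the limits in the usage note are all assumptions for illustration.

```python
import time
from collections import defaultdict, deque


class IpThrottle:
    """Allow at most max_requests per IP within a sliding time window."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self.hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip):
        """Return True if this request may proceed, False if throttled."""
        now = self.clock()
        recent = self.hits[ip]
        # Forget requests that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True
```

Something like IpThrottle(max_requests=600, window_seconds=3600) would let a busy office through while stopping an address making thousands of requests per hour. The right numbers depend entirely on your own traffic.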

 

Good luck!