AppInternals and Oracle Enterprise Manager on the same JVM/Domain

I have had the luxury of working in a customer environment where we instrument two APM tools side by side: Riverbed's AppInternals and Oracle's Enterprise Manager. Both are powerful, both are excellent in fact, but you need to allow them to do what they are good at!

Application owners, support staff or outsourced functions tend to use AppInternals. Middleware administrators tend to have sole use of OEM or the WebLogic Console to monitor the status of managed nodes (Running/Stopped/Paused and so on). 9 times out of 10, application owners are the ones telling middleware teams there is a problem. The reason why application owners or support staff do not have access to the WebLogic Console/Enterprise Manager Console is... well, ask your local middleware guy. It's an admin tool, so they don't want anyone poking around, right!


Knowing when an application or service is slower than the rest:

Riverbed captures every inbound servlet transaction (webservices), JMS message, remote connection and database query. It's very easy to tell in real time which server, instance or webservice is behaving differently from the rest. Metrics give Riverbed real-time average/max response times per instance or transaction. OEM can do this too, but the agent needs to be profiling to do so. Without AppInternals or OEM you'd typically need to log on to the Enterprise Manager console to work this out, but when an environment has 10 managed nodes and 100 webservices, not knowing where to start looking delays finding what the application team have been saying since the start: "this service, this server, this exception, started at this time..."
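To make the "which instance is different to the rest" idea concrete, here is a minimal sketch in Python. It is not how AppInternals or OEM work internally. It simply assumes you already have an average response time per instance (the node names, the numbers and the 2x-median threshold are all made up for illustration) and flags anything well above the median of the group.

```python
from statistics import median

def slow_instances(avg_ms_by_instance, factor=2.0):
    """Return the instances whose average response time is more than
    `factor` times the median of all instances (hypothetical rule)."""
    baseline = median(avg_ms_by_instance.values())
    return sorted(
        name
        for name, avg_ms in avg_ms_by_instance.items()
        if avg_ms > factor * baseline
    )

# Hypothetical sample: average response time (ms) per managed node.
samples = {
    "managed1": 120.0,
    "managed2": 115.0,
    "managed3": 480.0,
    "managed4": 130.0,
}

print(slow_instances(samples))
```

With these sample numbers the median is 125 ms, so only managed3 is flagged. The median is used rather than the mean so that one very slow node does not drag the baseline up with it.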


JMX type metrics, like pools or queues:

AppInternals 10 does not support JMX connections to monitor these components. OEM gives you 1 minute counters to monitor pool size, hogging thread counts etc. This is a clear advantage of Oracle Enterprise Manager: its JMX-heavy instrumentation can go very, very deep into the configuration layers of WebLogic, somewhere other products or vendors can only dream about. It's pretty much the WebLogic console embedded into the tool.


Business Service or Proxy Services

A big part of the Service Bus is orchestration: taking a request and doing something with it, owning a task and taking it through to completion. Only OEM can track the logic or perform diagnosis of this part of the JVM. It's very proprietary, powerful and not something AppInternals can report on. Use AppInternals for detection and precise identification, and allow OEM to assess the configuration and what can be improved or changed (the number of threads per work manager, for example).


Heavyweight or Lightweight:

AppInternals - very light, especially for collecting metrics and transactions 100% of the time. Overhead is around 0.5% CPU.

OEM - very light, unless profiling is enabled. With profiling on, CPU and memory can quickly get eaten up and the response time of the JVM suffers. This screenshot was taken recently when profiling was enabled at 3 hour intervals. The profiling instructs the domain to collect everything it can from each managed node....this is what it can do:


Reactive or Proactive
AppInternals - Proactive, early detection etc.

OEM - Reactive, but has some excellent forensics tools to explore the damage after an event.


Weight Distribution/Managing your FARM

AppInternals - this truly is where nothing else can compare, knowing which server or instance has more requests of a certain type than another, where those requests came from and where they went, which had an exception and getting a per transaction map....yeah!

OEM - uses static metrics in the UI; no view or insight can give you that total view of every instance/server/client/remoteaddr or URL called.


Root Cause Analysis

AppInternals - It can certainly tell you which service is disrupting performance, and it provides exception information and ORA-000x type codes if there is a database problem, to help identify the root cause. It can measure the JVM like any other tool on the market (threads, heap, JMS, JNDI, bla bla bla) and it can do RCA simply because it's running continuously and never misses a single transaction.

OEM - if profiling (and killing the JVM in the process), it can do this. But it's better used once an incident has been resolved or a JVM has been restarted. It has excellent tooling around forensics once a capture of the JVM has been taken (thread-dump-like, in Oracle land).


This image was taken when one of the administrators of OEM decided to enable collection of metrics/transactions every 3 hours via a clever WLST Hudson script. The longer the JVM runs, the bigger the collection gets and the slower the response times become....end of story!

We captured the initial test of the script in AppInternals around 4pm the day before the business was impacted. It was enabled around midday the following day without any notification to consumers. The Hudson job impacted 100+ webservices on all managed nodes at the same time. The database was not to blame, nor was a remote system, which meant something within the domain was causing such a widespread delay.
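The reasoning above (everything slowed down everywhere at once, so look inside the domain rather than at one service or backend) can be sketched in a few lines of Python. This is only an illustration under assumptions: the (node, service) keys, the 1.5x slowdown rule and the 80% threshold are hypothetical, not something AppInternals or OEM exposes in this form.

```python
def affected_fraction(baseline_ms, current_ms, slowdown=1.5):
    """Fraction of (node, service) pairs whose current response time is
    more than `slowdown` times their baseline. Pairs missing from
    `current_ms` are treated as not slowed."""
    if not baseline_ms:
        return 0.0
    slowed = [
        key
        for key, base in baseline_ms.items()
        if current_ms.get(key, 0.0) > slowdown * base
    ]
    return len(slowed) / len(baseline_ms)

def looks_domain_wide(baseline_ms, current_ms, threshold=0.8):
    """True when most pairs slowed together, which points at something
    shared (the domain itself) rather than one service."""
    return affected_fraction(baseline_ms, current_ms) >= threshold

# Hypothetical averages (ms) per (node, service) before and during the job.
baseline = {
    ("managed1", "orders"): 100.0,
    ("managed1", "billing"): 200.0,
    ("managed2", "orders"): 110.0,
    ("managed2", "billing"): 190.0,
}
during_job = {
    ("managed1", "orders"): 400.0,
    ("managed1", "billing"): 900.0,
    ("managed2", "orders"): 450.0,
    ("managed2", "billing"): 800.0,
}

print(looks_domain_wide(baseline, during_job))
```

If only one service or one node had slowed, the fraction would stay low and you would start with that service, its database or its remote system instead.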



Ease of Use

AppInternals - Single UI, also has the ability to extend into SteelCentral Portal, easily create bespoke insights. Anyone can use it.

OEM - Multiple UIs, potentially up to 5 (OEM, RUEI, BTM, JVMD, DBD, WebLogic EM Console). It is difficult to navigate, dated, and takes too many clicks to find a node or service....clicks=time in APM. It is not something for the mass audience, and administrators tend to guard it anyway. OEM does allow a deeper dive into the DB layers, which is something that AppInternals certainly cannot do, but now we tread from Middleware into DBA territory...this works nicely only if you have DBA permissions.


Happy Ever After

It is for this customer: both tools have worked to their strengths for a number of years. Profiling has never been switched on in the production environment for OEM since we (and 50 application consumers) advised them to switch off the Hudson job. OEM is a tool that makes administering or making improvements possible; AppInternals is not. AppInternals captures the performance of the nodes and allows quick correlation and proactive alerting before a failure really impacts a business....OEM is good after a problem has been detected, but it does not need to profile a whole domain...a single server, out of hours, is usually enough.


TIP: Change control is also a big help. Knowing which service has been updated or changed often gives the initial clues, AppInternals confirms better/worse conditions, and OEM offers the tools to tune and fix (proxy services, business services and so on).