Three Degrees of Separation in APM
I am often asked two questions: “What makes JXInsight so much more faster and scalable than other solutions?” and “How can the high quality of JXInsight be maintained with such high frequency (bi-weekly) of software updates?”.
One factor common in answering both questions is in the clean separation of instrumentation, measurement and data (model) collection within our highly extensible & configurable activity based resource metering runtime – a separation that is not present in other tools, certainly not at the same degree.
For the most part what gets instrumented by other vendor solutions is what gets measured and collected. So any new features added by vendors outside of possible optimization of these concerns usually entails additional overhead as well a possible impact to reliability – release after release. We have observed this phenomenon across a number of products which we track internally even when there is an option to disable a feature there is still some residue of overhead which is to be expected if there is tight coupling between the implementation of each concern.
In the design of our probes technology we avoided many limitations of legacy call tx/trace/path solutions by not coupling our runtime to any particular instrumentation technology by way of an Open API that whilst affording an extremely efficient interface for the purpose of instrumentation allowed for the deferral of both measurement and typed data collection based on dynamic aspects of the metered execution as well as the external configuration of optional features by a system administrator.
Below is diagram showing how our metering runtime stack looks straight out-of-the-box. The blue layers represent what is exposed directly to our customers. It is important to note that the hundreds of technology specific instrumentation extension libraries we provide, which can be simply dropped into class path, all use the same underlying Open API made available to customers for the purpose of custom instrumentation, reporting, resource metering and billing. The brown layer, our strategy probes provider, dynamically determines whether particular instrumented method calls are measured (metered) and hence whether any data is collected. By default the runtime only collects metering information aggregated by cost groups derived from composite probe names. Any additional data collection requirements such as stacks, tracks (traces), transactions, charges, and billings needs to be explicitly enabled by installing the appropriate probes provider via a system property. This gives our customer complete control over our resource metering overhead.
Note: There is no mechanism in the Open API to programmatically enable capabilities as we are targeting production environments which are generally managed via configuration under change control. It is possible to query the runtime for installation of particular data model augmentation providers.
The strategy probes provider is one of a number of providers used to break the link between what is instrumented and what is measured. Based on actual live metering information the strategy probes provider can dynamically disable future measurement of instrumented calls for a particular named probe which generally maps to a Java class method though we support this same capability in other VM languages (JRuby, Jython). The strategy provider is enabled by default but it is still optional and can be disabled as depicted below.
All of our AOP extensions go through our Open API so it is relatively easy to integrate with other instrumentation technologies. You can also use our Open API directly within your code if you need to create contextual named probes which do not map easily to method calls.
Note: Generally customers that use the Open API also use the AOP SDK though such usage is not mandatory.
The real differentiator comes into play when you consider the number of providers available today some of which offer their own extension points via a SPI. We have measurement filtering providers that can dynamically elect whether to meter a particular instrumentation event/call based on aspects of the execution such as the degree of concurrency. Below these we offer various data model augmentation providers that create and maintain additional software execution model structures as well as performing statistical analysis on the model. What is impressive is that we have achieved all this in a very coherent manner in terms of execution behavior and data models whilst maintaining the integrity of the runtime design that we truly believe has unlimited potential and application (even outside of the application performance management domain).
This is not the end as we continue to design and develop new ways of measuring, collecting and modeling execution behavior below the surface of the Open API which will benefit all extensions and custom integrations layered above. We have many more updates coming each introducing new providers which can be injected into the runtime completely transparently.
Does transaction tracing scale @ analysis?
In my previous blog entry, Don’t trace calls. Record metering changes, I described an alternative scalable approach to transaction tracing based on tracking changes to a resource metering model rather than recording every single trace/log event and worse transmitting this over the wire to some central management server even before the transaction warrants recording.
Clearly the overhead of transaction tracing is huge (relative to the type of high performance applications that our product is designed to monitor). But what if your choice of architecture, language, runtime, technologies and resource access patterns mean that such excessive overhead would still pale in comparison to the normal execution times of your applications request processing. Can tracing still scale? I think not and to demonstrate this I have taken two screenshots of two trace based solutions which even with the simplest of applications to monitor and with the minimum of trace instrumentation (entry and exit points) it is painfully obvious that things do not scale at analysis time when such collection structures are displayed as such.
Each screenshot below captures a partial display of a single trace (or transaction). I would need a much bigger monitor (wall sized?) to display both in their entirety.
Even if I could see the actual SQL statements (you need to expand each table row) my short term memory banks would be stretched to maintain the execution context. Now imagine if both these solutions actually did instrument much more than the bare bone entry and exit points in a real-world enterprise application – both memory systems would be severely taxed.
Admittedly some sort of visualizations could be added on top. But that seems to defeat the whole point of tracing does it not? Why do it the first place if it does not scale at runtime or analysis time? Less is more when done right.
Below is the second solution which paints an even worse picture even though the application is much more primitive.
JDBInsight (circa 2000) the fore runner to JXInsight actually had an innovative way to address this issue (an issue that has existed in every new crop of transaction tracing solution that has come on the market since then) using local and global loop detection and sequencing – we called it path compression.
Why are people building and using the same technology that did not scale then and definitely not now? Do you think the above is usable outside of the development context? Have you ever measured the overhead of your own monitoring tools or custom tracing & logging toolkits? Have you done a thorough cost benefit analysis? Thoughts?
Don’t trace calls. Record metering changes.
Professional software performance engineers managing applications with extreme performance & scalability requirements recognize that application performance management solutions based largely on call traces and call paths (which are somewhat confusingly referred to as transactions) don’t scale in a production unless the degree of instrumentation is severely limited to process entry (http, rpc) and process exit points(http, messaging, database,…) or in the worse case the service time for exit points are horrendous making the excessive overhead (footprint, bandwidth, cpu time,…) incurred by such solutions relatively small in comparison.
For the most part the vendors themselves recognize this by offering technology specific trace extension packs which limit instrumentation to typical expensive execution points in an application. But what happens if even with such debilitating options the cost of tracing is still not acceptable especially considering the increased risk for performance blind spots and/or a prolonged time to problem resolution. You really do want to know the contributing factors when you have particular requests which go off the radar and frustrate users & customers. But if this is only occasional its hard to justify (unless you are a developer) the excessive trace collection costs incurred for every request by such solutions especially when they have no built in self-optimizing capabilities.
If one thinks hard about it all that one really needs to know is which instrumented methods metered (measured) contributed the most to the problem occurrence – a problem which could be specified in terms of the high consumption rate of one or more resource meters (i.e. clock.time, cpu.time, …). Rarely is the caller-to-callee chain required for such problem resolution worse still it tends to get in the way of actually determining the biggest contributors (hotspots) – not only does it not scale at runtime it does not scale at analysis time unless you use powerful visualizations to reduce the number of mouse clicks in expanding tree nodes. If you have tested your application going into production you already have this call dependency data. Why would you want to repeat the same exercise?
Because JXInsight maintains resource metering models at thread context level (as well as at the global) in memory it is pretty easy for our metering runtime to efficiently capture the data needed for outliners which breach some service level agreement. We can defer much of the collection work required to the point following the problem occurrence using SavePoints and lazily creating ChangeSets based on one or more rules which specify a particular meter, threshold and the set of fully or partially named transaction point probes applicable.
If you look at the Changes tree table view below for the selected transaction you will see that 6,261 instrumented methods were metered during the period of the transaction which took 1.2 seconds. That kind of event collection work for a trace tool that stores not only the caller-to-callee graph but an event journal representing the sequence of such calls would be disastrous – your performance management solution would be part of the performance problem. Whereas our mini metering model would only need process & store the metering changes (and only after the outliner has been detected) which in the vast majority of cases is a significantly much smaller data set because of the repetitive method execution nature of most request processing and our not creating & updating of call tree paths (branches).
With our approach normal requests run at near full speed (compared to expensive call trace interceptions) with a very small footprint which only increases marginally at the point the request completes and is matched against a transaction rule (i.e. slow request, busy request,…) classifying it as problematic.
Metering (Profiling) Apache Hadoop Jobs
Late last night I was asked to show how JXInsight could be used to meter Apache Hadoop MapReduce jobs executed in a distributed environment. A challenge for most standard monitoring tools because during the execution of such work multiple JVM’s can be started up and shutdown repeatedly.
This dynamic process behavior is what makes this technology scale but at the same time kills the scalability of management tools especially those that use a central management server for storing monitoring data and assume the management model is rather static.
Fortunately JXInsight can work in both off-line and on-line mode in merging multiple resource metering models (databases) across multiple processes (dead or alive). Here are the steps I performed based on the limited information I had time to read.
Change the HADOOP_CLASSPATH and HADOOP_OPTS variables in the hadoop-env.sh script located in the conf directory.
Change the mapred.child.java.opts property in the hadoop-site.xml file located in the conf directory.
After running the bin/start-all.sh script I executed the following command:
bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
Below is a listing of the contents of the jxinsight.home directory following the completion of the command. Each snapshot represents a metered process that has terminated. The number following the "local." prefix is the agent port number which starts at 1515 and increments if multiple agents are running concurrently. Port numbers are reused when a process ends and another one starts.
Note: The ports from 1515-1519 were already taken by the Hadoop processes managing the distributed environment.
Here is the metering model for the single snapshot with port 1520 which is the command process itself.
Note: I did not apply any filtering of the instrumentation which would normally be done following an initial investigation of the software execution.
Below is a metering model from one of the child job/task process snapshots. No org.apache.hadoop.examples probe group?
Here is the source code of the Grep class I used to determine what was being executed within the child processes.
Filtering on the metering model of the child process I can now see the reducer within the MapReduce job.
To perform a cluster-wide analysis of the metering I simply merged the off-line snapshots within the console.
Here is the combined resource metering model for the command and all child job/task processes.
Within the merged snapshot we can still breakdown the cluster context by host and process (even by thread if you are adventurous and have enabled this option).
Now we can see how many reducers were kicked off – 23.
This little escapade has definitely spiked my interest in this technology. I have already sketched out some extensions that we could build to integrate much tigher making all of the above seamless. Stay tuned.
Custom Extensions: Terrastore Part 2
In the first part of this series of entries showing how easily JXInsight can be extended I created a set of extensions that performed metering at particular execution points. In that entry only one meter, clock.time, was included in our metering model. In this entry I am going to show how to create new meters without writing any code by using the metering model as the resource datasource – meters based on the metering itself.
Below is a metering model I have captured following the execution of 5 HTTP requests. One PUT to create a Bucket. One PUT to add a Value to a Bucket. One GET to return a Bucket. One GET to retrieve a Value present within a Blucket. One DELETE to remove a Bucket.
From the above we cannot be easily determine the relation between the HTTP methods and the corresponding Terrastore Server operations though we have made it somewhat easy for ourselves with the frequency (Count) listed alongside metered methods and the degree of instrumentation coverage.
One approach used by a number of solutions on the market is to trace the caller-callee chain but this is not scalable in production once the number of execution points metered grows and the chain lengthens – it is not even scalable in the management console without giving the user RSI.
Fortunately our coverage is severely refined so we can easily track the dependencies (in the small real-estate afforded by this blog) by enabling our optional track probes provider via a system property.
Now lets try another approach this time modeling activities performed by the Terrastore Server as meters themselves. This is much more efficient & scalable when dealing with lots of indirect callers and callees in a chain because we do not have to record (track) the paths. No Trace Paths => Production.
The following system properties added to a jxinsight.overrie.config file will register two meters with the resource metering runtime which themselves map to statistical values in the metering model for metered activities. With the meters we are tracking the rate of change of metering associated with another activity – the activity that is nested inside the execution of another metered activity. The other meter in question is the clock.time which is the default value for such mappings.
Note: Metering = Tuple(Probe (Group), Meter)
Below is a filtered metering table view showing 3 meters for the metered HTTP GET activity. We now have a break down of the clock.time in terms of the underlying metered Server activity – get.all.time and get.time.
In the above configuration we mapped the meters to the total statistic but we could in fact use other ones including the frequency of the activity creating an independent KPI for comparison across different releases, environment contexts, runtimes, and hardware.
Below is a sample of a metering model exported using the above configuration. The Count is the execution frequency of the activity (same value across different meters for a given probe)Â and the Total the rate of change of the meter which in this case is a count itself.
In follow-up entries I am going to show how to address some short comings with the above approach including the requirement for the mapped activity to be metered and the need to define the meter mapping in terms of implementation. We will use the AOP SDK to create custom counter extensions.
Custom Extensions: Terrastore Part 1
This is a first in a series of blog entries showing how easy it is to create custom extensions for JXInsight based on its Open API and AOP SDK.
I have chosen to use the open source project, Terrastore, throughout the series which will start with the development of simple non-contextual probes extensions. In future entries we will build on this to create custom meters (counters/resources), contextual named probes, and client-to-server metering integration.
Terrastore is a modern document store which provides advanced scalability and elasticity features without sacrificing consistency.
Today JXInsight can already perform resource metering of a Terrastore server process via its complete coverage probes extension.
But as you can see from the resource metering model below there are a large number of external projects used which might not be of concern to us in monitoring (metering) its behavior in production.
We could apply filtering external to the extension which would limit the coverage of the instrumentation but even then we would be caught up in the noise of its internal implementation which is subject to change across releases or deployment scenarios.
What is unlikely to change is the public contract defined in a number of packages and classes. So after scanning through the Java documentation shipped in the distribution I came up with a number of candidate interfaces for selective instrumentation. These include:
terrastore.server.Server
terrastore.service.QueryService
terrastore.service.UpdateService
terrastore.communication.protocol.Command
terrastore.communication.Node
terrastore.store.Store
terrastore.store.Bucket
I could create a single extension that instruments the implementation classes of all of the above but instead I have decided to create a number of extensions that logically group the above allowing much finer grain deployment.
Lets starts with the server extension. To create an extension in my IntelliJ project I have created a META-INF/aop.xml file within the src root of my module with the following content.
I then created the following class referenced above in the extends attribute of the concrete-aspect element. The class that this abstract class extends (made concrete by the scope defined in the aop.xml file) is referenced by including the jxinsight-ext-aj.jar which has a dependency on jxinsight-ext-api.jar.
Then I simply replace the complete coverage extension with a jar that packages up both the META-INF/aop.xml file and the compiled class.
Here is a revised resource metering based on this new deployment. It is important to note that whilst we defined the instrumentation points in terms of interfaces the actual metering model reports in terms of implementation.
Creating another probes extension that groups two distinct interfaces is just as straight forward. Define the instrumentation points in a META-INF/aop.xml file and create the corresponding extension class.
The new extension class is a near replicate of the previous extension class except for the return value of the getType() method which can be used by the runtime for filtering and probe naming.
Here is the revised resource metering model with deployment of both jxinsight-ext-aj-terrastore-server-probes.jar and jxinsight-ext-aj-terrastore-store-probes.jar.
Our tracking (caller-callee chain) model is greatly simplified by this deployment – no implementation noise.
Finally here is a complete resource metering model with all 5 custom extensions deployed.
And here is our tracking model simplified for the purpose of monitoring and software execution analysis.
Server -> Service -> Node -> Command -> Store
Too Expensive to Meter. Too Slow to Profile.
In a previous posting titled Too Cheap to Meter. Too Fast to Profile I showed how a strategy based resource metering runtime can reduce overhead (costs) incurred in metering activities with very small execution periods (and resource usage) by dynamically deciding (voting) whether to meter a particular instance of an activity based on its previous execution (consumption) profile.
But what if the activity itself is not fine grain but that the overhead of one particular meter in the set of meters included is extremely costly compared to other meters and relative to the activity’s consumption of the meter? What if you always need to record an activity’s occurrence (count) or track it usage of other meters less costly?
Lets first start with the code which will be metered dynamically at runtime via bytecode instrumentation by our agent.
Below is a sample of a metering model following the execution of the above code. This metering model includes the default clock.time meter which is comparatively cheap to other meters (on my particular os & hardware). Our strategy probes provider was disabled for this run via the following system property.
-Djxinsight.server.probes.strategy.enabled=false
Now lets add in another built-in meter, jmgmt.cpu.time, which pulls the current thread’s cpu consumption from the standard Java 5.0 management interfaces. To do this simply add the following to a jxinsight.override.config file or add it to the command line as a system property.
jxinsight.server.probes.meter.jmgmt.cpu.time.include=true
The introduction of the jmgmt.cpu.time meter has resulted in the wall clock timing jumping from 6.7 seconds to 17.5 seconds. Clearly this meter is not suitable for fine grain metering. We could re-enable our strategy probes provider and eliminate much of the overhead incurred by not metering call*() and spin() methods but then we would also lose the clock.time metering for these methods.
As you can guess JXInsight has a clean & elegant solution to this – meter sampling.
# the interval (ms) between current meter value reads
jxinsight.server.probes.meter.jmgmt.cpu.time.sample=1
Below is a revised metering model with the above property applied. The wall clock time for the execution has dropped significantly down whilst the metering breakdown across both meters is relatively comparable to our initial single meter run which would have been largely cpu bound.
User Level Metering with SavePoints & ChangeSets
Being able to accurately and efficiently determine the resource usage of an interaction with a metered service is a primary requirement of a metering runtime and for the most part having the resource usage itemized by meter is sufficient for most performance & cost management purposes. But what if you wanted to slice-and-dice your resource metering model for the purpose of user/session/job/task tracking with detailed itemization by cost center (typically package, class and method).
Here is a sample application I am going to use to demonstrate the different approaches available today via our resource metering runtime. The application will be instrumented using our dynamic load-time instrumentation agent and our complete coverage (non-technology specific) probes extension library.
Here is a sample of the resource metering model exported following the execution of the above application.
Note: By default only the clock.time meter is enabled but there are many more available via system properties.
Prior to our recent update one could perform a Probes.mark() call to reset cumulative marked metering prior to the measurement point and then dump the current thread resource metering model into snapshot once the measurement period has elapsed.
Here is a sample of the current thread’s resource metering model exported to a snapshot for off-line analysis.
Note: There are no delta values for the main(String[]) method because it has not completed at the point the dump is requested.
Alternatively one could access the current thread’s resource metering model on-line from within the runtime itself by making the similar model access calls made by the dump implementation.
Here is a sample of the output produced by the above code. Effectively the same data available off-line within an exported snapshot.
Note: The black lines represent the probe group. The dark grey lines the metering (one per included meter) associated with a probe group.
Whilst the above is sufficient for a single threaded application test it cannot be used when there are concurrent threads of execution being metered individually as the mark operation is global. In addition it cannot handle the case of multiple nested marked points within a single metered thread execution.
The solution is to efficiently store & compare a threads’s resource metering model between two or more points in time and most importantly stored independent (in terms of lifecycle) of the underlying metering model.
Here is a revised version of the above code this time creating a SavePoint prior to the measurement execution point and then generating a ChangeSet following its completion based on comparison with the current thread’s resource metering model.
Here is a sample of the output produced by the above code.
So how can this be used to track the resource consumption for a particular web session (user)? Well unlike SavePoints, ChangeSets are not tied to one particular thread so you can aggregate the set of changes across each HTTP request via the ChangeSet.add(ChangeSet) method which returns a new ChangeSet instance with both set of changes merged. The last ChangeSet merged can then be stored in a session object (wrapped by a holder object) and made accessible to a user or help desk staff member, possibly via a web page in the web application, for the purpose of user level diagnostics (or billing).
Here is a rough cut of what this would look like if implemented yourself using a filter.
Over the course of a series updates planned for this month we will be expanding on this new capability to deliver out-of-the-box solutions for common cases like user/session metering or custom “billing” http headers. All will be based entirely on dynamic instrumentation requiring only a extension library to be deployed onto the class path.
Too Cheap to Meter. Too Fast to Profile.
I recently came across a very interesting paper on the increasing usage of fine-scale charging made possible by the latest advances in technology in which the authors argue that in many cases such an approach maybe inappropriate.
To some degree I agree with the authors of the paper at least when metering is applied to the physical world and transaction costs are superfluous and static. But I would argue that in the virtual world (of cloud computing) such costs can be greatly mitigated by a dynamic resource metering runtime (used as the basis for a charge back scheme) that is largely strategy based and driven by a policy that factors in the cost of collection and adjusts accordingly.
Note: The resource metering profile of a process, activity or request has many uses and users – billing (or charge back) is just one.
Now being “strategy based” might imply that in some cases charges (measurements) are waived (or understated) but that need not necessarily be the case. For the vast majority of cases there is always some larger encompassing activity (call execution) context were charges (measurements) can be appropriately assigned. What metering strategies do is change how direct the association of charges (measurements) is. In fact a well defined strategy can help one be more effective in determining the most appropriate point in an activity (call execution) for cost reduction (performance improvement).
Hopefully if you managed to stay with me up to this point we can now apply to this to some actual code. The following code resembles one of the problems highlighted in the paper. With the inner() method instrumented with execution timers the transaction (measurement) cost (overhead) is double the actual execution time (cost).
Here is a sample metering (charge) model without any strategies applied. The inherent execution costs (charges) are assigned largely to outer() and inner() – 2.6 secs and 1.6 secs respectively. But how accurate is this and how useful is it when one considers that there is not really anything that can be done to inner() to make it faster. It would be better to focus on the source of the inner() calls.
Lets run the application again this time with our hotspot metering strategy enabled and configured to disable metering of probes which have an average clock time of less than 75 microseconds after the first 1000 metered firings.
The metering model shown below has drastically changed from the previous one above. The majority of the inherent total has now been assigned to outer() following the dynamic disabling of the inner() instrumentation. None of the metering has been discarded. We have just changed the location were costs (charges) are assigned. In doing so we have also greatly reduced the measurement (transaction) overhead – the overall time has dropped down from 4.2 secs to 0.7 secs. But notice that the average of outer() is 66 microseconds. Why was this not disabled? By default the hotspot metering strategy makes its hotspot determination at the 1,000 execution point and does not re-evaluate this again. At that point the average was higher than our hotspot threshold.
Note: The hotspot metering strategy can be configured to use any meter included in the metering model and you can easily create your own meters that represent concepts within the domain model of the application metered.
Here is another metering model with continuous hotspot analysis enabled. At the 2,000 execution mark the outer() method becomes a non-hotspot in terms of our definition. The majority of costs have now moved up to main() which if you look at the code is probably were they should be if one cannot change the number of loops made in the outer() method.
Finally an advanced resource metering runtime can learn not only during a particular process execution but across multiple process executions. Enabling this capability and running the application again we get the metering model below. Both inner() and outer() have been disabled from the start.
It is worth noting that assignment of costs follows how the Hotspot JVM will in-line code.
Squash Java Call Stacks with Tier Points
Today’s web applications have execution paths with huge stack depths  - a trend that is unlikely to be reversed anytime soon. New web application frameworks deliver increased developer productivity at a cost that is to some degree incurred during performance testing/tuning and production problem resolution in the increased complexity involved in the analysis of the software execution being monitored.
Profiling tools based on thread call stack sampling are now pretty much useless due to their excessive overhead and the amount of noise within any derived performance model. Even with standard package filtering options available in most code profilers (call stack sampling or event/execution based) one ends up losing far too much execution context because it is largely an all or nothing choice in terms of each package.
Here is a partial call path for an inbound HTTP request collected from an instance of Apache ESME running in a Tomcat server that has already had standard filtering of web container packages applied to it. More than 30 deep and we have not even seen sight of actual ESME code.
Note: You need to navigate much much deeper  (> 60) to begin to understand what is the underlying domain action(s) performed in the processing of this request.
To address this problem we recently released an update to JXInsight that delivers a highly innovative approach to managing such complexity without losing all of a request’s execution path context as processing moves down through the various web application layers - smart dynamic runtime filtering.
This is achieved by enabling our tierpoint probes provider and defining (via a few system properties) logical tiers mapped to one or more probe groups (typically a package or class though methods are supported). The provider then limits resource metering to execution points at the boundaries of such tiers filtering out all internal method dispatches within a tier.
Here is the same HTTP request as above but this time with our tierpoint provider enabled and tiers defined for Lift, Apache ESME, Apache Derby and Scala. From L.doFilter to E.find we have excluded 60 call frames.
Note: This is performed within the managed runtime which means we can drastically reduce the amount of metering performed whilst still assigning inherent resource metering accordingly across tiers (packages).
Note: The probe names for each element of a track are displayed in shortened form to conserve precious (blog/desktop) real-estate.
Self Optimizing Java Hotspot Detection
I was recently contacted by Solomon Duskis to see whether I would be interested in assisting him with some performance tuning work he was involved with on the JBoss RESTEasy open source project. He was in particular concerned that his findings based on data reported by other profiling tools was inaccurate due to their excessive overhead especially when the entry point operations being tested had sub-millisecond execution times.
Looking at the initial tests results of a prolonged benchmark run, which exhibited relatively (but to be expected) high GC activity and extremely short high frequency method executions, I thought this would be an interesting test for some new features I had been working on in recent software updates to JXInsight specifically adjusted clock time metering and self-optimizing hotspot detection.
Note: Once you start profiling & tuning at the microsecond level you can effectively rule out using any thread cpu based timers as their resolution is very low and their cost of access relatively high.
The first thing I did to understand the software execution behavior of the benchmark was to execute a run with our complete coverage dynamic probes extension, weaving (at load-time) resource metering into all methods within the org.jboss.resteasy.* and org.codehaus.jackson.*. For this run I disabled the default clock.time meter and instead installed our probe.count meter so that I could count the number of metered method executions made directly and indirectly by each package, class and method.
The Count column below tells me the number of probes fired per package – in total approximately 604 million. The Total column tells me how many probes, including those in other packages, were fired and metered during the course of executing the probes listed in the Count column with the last column, Total (I), telling me how many of those probes are actually inherent (directly attributed) to the package.
Note: The Total (I) column can be less than the Count column because probes do not include their own firing within their reported resource metering which also explains why the Total (I) column can be greater than the Count column as in the case or org.codehaus.jackson.* which calls back into org.jboss.resteasy.*. This might be a little bit confusing because we have effectively mapped a resource meter to the actual occurrence (count) of the resource metering but to me that is the beauty of our approach – we can meter anything including the act of metering itself.
The metering data above helped me in very quickly understanding the degree of impact a standard code profiler would have before I even ran such a test. Knowing that most other profilers have an overhead of between 1 and 5 microseconds per method execution I can estimate the total benchmark time from a baseline of 30 seconds. In the best case we are talking about 10 1/2 minutes (634 seconds) though from my own testing the best code profiler took more than 16 minutes (980 seconds). In the worst case more than a hour. With such numbers you are only ever likely to find low hanging fruit along with a large number of red herrings especially when one factors in the amount of GC activity generated.
The above does not really present much of a problem to JXInsight because its metering of any method’s executions is strategy based (In fact we support multiple chained metering strategies). With the default hotspot metering strategy installed the runtime will dynamically disable the instrumentation after a number (default is 1000) of method executions if at that time a specified metering statistic does not exceed a particular threshold. But when a hotspot metering threshold is defined as low as 5 microseconds as in the case of my own benchmarking even the initial number of warmup metered method executions before the hotspot determination is made can have some degree of impact on the actual test run timing and the resulting hotspot determination (especially callers which can be incorrectly labeled hotspots).
Previously the solution was to generate and deploy a new hotspot instrumentation extension after and before each repeat of the benchmark test (see Instrument->Measure->Refine->Repeat). In JXInsight 5.7.41 we offered up another option that can be enabled to defer the generation of extensions until much later in the test cycle by automatically persisting (and reloading) the hotspot metering analysis across consecutive benchmark test runs allowing the runtime to immediately upon initialization disable non-hotspots before their first firing whilst refining the previous hotspot candidate list. This is all done without any intervention by the performance engineer. The performance engineer just keeps re-running the benchmark until the test times and number of hotspots remains constant across runs. The hotspot metering runtime continually builds on-top of what it previously determined – self optimizing.
Here is the output of 13 benchmark test runs I executed in a matter of a few minutes.
Note: For the 9th test run I changed the hotspot metering configuration (whilst still reusing the previous hotspot analysis) to use the inherent min statistic to get rid pass thru elements within a call chain.
Note: For the 13th test run I used a hotspot instrumentation extension generated from within the console based on the metering model collected from the 12th test run removing completely all instrumentation from non-hotspot methods and disabling the metering strategy probes provider extension.
Note: The benchmark duration is measured in seconds and the metering threshold in microseconds.
Update: In our latest internal build, 5.7.42, which includes the ability to enable continuous hotspot performance analysis the benchmark time has now dropped down under 33 seconds.
Update: JXInsight 5.7.42 has been released with continuous hotspot metering analysis following an initial hotspot classification allowing for reclassification and subsequent disablement of associated instrumentation in the event of changes in resource usage.
Near to the end of the testing we were still metering 10 millions calls and incurring only a 5 second slow down compared with a competitors best time of 950 seconds. Here is the breakdown of the resource metering by the main top packages for 12th test run prior to the generation of the hotspot instrumentation extension.
Note: The clock.adj.time meter was used instead of the clock.time meter to discount gc activity in the hotspot metering analysis.

Looking at the inherent clock time averages for hotspot candidates (a partial list is show below) we can see that there was still room for further reduction (in overhead) if we increased the hotspot metering warmup count or instead this was continuously re-assessed during the course of an individual test execution (coming soon).

In a follow-up to this entry I am going to compare the hotspot results from a number of tools showing how surprisingly different they can be beyond the low hanging fruit cases and how widely inaccurate they are no matter what profiling technique is used – execution/event based or call stack sampling. Free Advice != Good Advice.























































