OpenCore 6.0 Beta Released
With our OpenCore 6.0 (resource metering & metric monitoring) public beta out the door this week you will now have the opportunity to try out for yourself many of the concepts and code examples I have created on this blog over the last year. In the future when posting articles with code snippets I will also make available the src and config files. Hopefully this will make it easier for everyone to follow this blog.
Note: OpenCore is the runtime and framework on which we are building our cloud metering as a service – Costicity.
Please visit the OpenCore site, download the beta, review the HelloWorld example, read up on the configuration guides, explore the Open API and try out the API samples.
Call Path Metering & Runtime Governance
Following on from an application performance management and resource metering workshop in India last week I was asked by the customer’s architects to show how one could use metering to record the roundtrip time of a message that passes through message queues (channels) processed by one or more threads (contexts) across one or more processes (runtimes).
The customer was already using our resource metering runtime to record the response time (clock.time) of HTTP requests but what they really wanted in addition was to determine at runtime and within the execution itself the time interval between the request entering and the resulting message being posted to a queue. This time should not include the processing time elapsed as the request call stack was unwound and the request returned to the client.
Here is a timeline graphic depicting a hugely simplified call sequence along with the measurement intervals and the call stack as the request is processed and returned.
I am going to use the following mock Java application to demonstrate one or two of the many ways to determine the path time using our resource metering runtime Open API.
Enabling the optional billing probes provider and collecting the resulting metering information in our management console I can pretty quickly determine the path time via the “Value (E)” column – 2.002 seconds.
Though billing history can be accessed from within a metered runtime via the Open API, billing entries are only published following completion of a fired and metered probe. I needed to see the incomplete metering history at the point of interaction with the Channel. A solution would be to enable the optional stack probes provider and access the last meter reading recorded by the probes at each end of the metered probe stack. The following shows how this can be done.
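As a rough illustration of reading the root of a probe stack mid-flight, here is a minimal sketch. It is not the OpenCore Open API: the ProbeStack class, its methods and the injected clock are hypothetical stand-ins that only show the idea of subtracting the root probe's begin reading from the current reading.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a metered probe stack; not the OpenCore Open API.
final class ProbeStack {
    // One stack of begin readings per thread; the oldest entry belongs to the root probe.
    private final ThreadLocal<Deque<Long>> readings =
            ThreadLocal.withInitial(ArrayDeque::new);

    private final LongSupplier clock; // meter source, for example System::nanoTime

    ProbeStack(LongSupplier clock) {
        this.clock = clock;
    }

    // Record the meter reading when a probe begins (method entry).
    void begin() {
        readings.get().push(clock.getAsLong());
    }

    // Discard the most recent reading when the probe ends (method exit).
    void end() {
        readings.get().pop();
    }

    // Path time so far: current reading minus the root probe's begin reading.
    long pathTime() {
        Deque<Long> stack = readings.get();
        if (stack.isEmpty()) {
            throw new IllegalStateException("no active probe on this thread");
        }
        return clock.getAsLong() - stack.peekLast();
    }
}
```

In a request handler one would call begin() on entry and read pathTime() immediately after the message is posted to the channel, so the time spent unwinding the call stack afterwards is excluded.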
Note: It is not necessary to add Open API calls directly into the method. They can be injected via our AOP SDK or using our optional event probes provider and registering a listener. I used the Open API here to simplify & clarify things (I hope) by showing what would typically be injected or executed at runtime.
This same approach can be used to provide runtime resource governance of executing code, similar to how Salesforce.com manages Apex based applications using Governors, as meters can be mapped to any thread specific counters tracking resource consumption and cost in real-time and at the point of each interaction, directly or indirectly.
Below I continuously compare the current meter measurement with the last meter reading for the stack root probe checking to see whether a defined request time threshold has elapsed.
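A minimal sketch of such a threshold check follows. The RequestGovernor class, its exception type and the injected meter are hypothetical names chosen for illustration and are not part of OpenCore; the only point is the comparison of the current reading against the reading taken at the root.

```java
import java.util.function.LongSupplier;

// Hypothetical governor; the names are illustrative and not part of OpenCore.
final class RequestGovernor {
    static final class LimitExceededException extends RuntimeException {
        LimitExceededException(String message) {
            super(message);
        }
    }

    private final LongSupplier meter; // current meter reading, for example System::nanoTime
    private final long limit;         // allowed growth since the root reading
    private final ThreadLocal<Long> rootReading = new ThreadLocal<>();

    RequestGovernor(LongSupplier meter, long limit) {
        this.meter = meter;
        this.limit = limit;
    }

    // Call when the request enters: remember the root reading for this thread.
    void enter() {
        rootReading.set(meter.getAsLong());
    }

    // Call at each interaction point: fail once the threshold has been passed.
    void check() {
        Long root = rootReading.get();
        if (root == null) {
            return; // not inside a governed request
        }
        long used = meter.getAsLong() - root;
        if (used > limit) {
            throw new LimitExceededException("used " + used + " of " + limit);
        }
    }

    // Call when the request completes.
    void exit() {
        rootReading.remove();
    }
}
```

Because the meter is just a supplier of a growing value, the same check would work for a thread specific counter (queries issued, bytes written) as well as for time.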
In a follow-up I will show how we can tie up all the path times across threads and runtimes.
Business Transaction Management to Business Transaction Metering
Recently there has been a trend for Application Performance Management (APM) vendors to overnight rebrand their offering as a “Business Transaction Management” (BTM) solution.
One reason for such a move is that there is generally less instrumentation performed hence any excessive overhead of the original APM solution can be mitigated.
Whilst it is somewhat gratifying to see many of the innovations we pioneered in the early years of JXInsight, such as distributed contextual trace and transaction analysis for the Java EE and CORBA platforms, being promoted by competitors I think their time has come and passed.
The future of performance management whether it is from a technical or business perspective is best served by a solution that natively supports many of the concepts being promoted including the measurement of business value, the inclusion of business context in activity analysis and the demarcation of transactions & activities and their resource consumption.
Note: I am ignoring the distributed nature of business transaction processing here because that has been in many APM products for a very long time.
When the term “business value” is mentioned in related marketing literature I straight away see this as just another meter within our activity based costing & metering (ABC/M) system. And understandably so because ABC/M can serve as the foundation for effective performance management of any type of service from an IT operations and business perspective.
To demonstrate this I have created a mock application in Java which I am going to instrument at runtime (class load-time) with our OpenCore Probes technology. The following code sets up and executes the business transaction which I am going to use throughout this article.
Here is the implementation of the Shop.order(Order) method which represents our transaction point.
Running the mock application with our default application performance management centric instrumentation results in the following resource metering model display in JXInsight's management console. This model includes the metering of activities performed during preparation of the business transaction as well as those performed in executing the business transaction.
To capture the “business value” of the transaction I am going to create and update a thread specific context counter, order.value, whenever Store.order(OrderItem) is called.
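The counter idea can be sketched in plain Java as below. This is an assumption-laden stand-in: ThreadCounter is a hypothetical class rather than the runtime's counter API, the OrderItem fields are invented for the example, and the update is inlined in Store.order where the article weaves it in with an aspect.

```java
// Hypothetical thread specific counter; a stand-in for the runtime's context counters.
final class ThreadCounter {
    private final ThreadLocal<long[]> value = ThreadLocal.withInitial(() -> new long[1]);

    void add(long amount) {
        value.get()[0] += amount;
    }

    long get() {
        return value.get()[0];
    }
}

// Illustrative order item; the fields are assumptions, not taken from the mock application.
final class OrderItem {
    final int quantity;
    final long unitPrice;

    OrderItem(int quantity, long unitPrice) {
        this.quantity = quantity;
        this.unitPrice = unitPrice;
    }
}

final class Store {
    // Plays the role of the "order.value" counter.
    static final ThreadCounter ORDER_VALUE = new ThreadCounter();

    void order(OrderItem item) {
        // Inlined here for clarity; in the article this update is injected by an aspect.
        ORDER_VALUE.add(item.quantity * item.unitPrice);
    }
}
```

Each thread sees only its own total, which is what lets the growth of the counter be attributed to the activity executing on that thread.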
To have the above counter management code weaved into the runtime I have created the following META-INF/aop.xml file which I have packaged up into an extension jar along with the aspect class.
To map our counter to a meter within our resource metering runtime I have added the following system property to a jxinsight.override.config file.
Running the mock application again with our new extension jar and configuration change produces the metering model shown below. We can now measure the execution of the software from both a business and technical perspective.
Note: Each product ordered was priced at 100 monetary value units.
In our metering runtime activities can be mapped to code execution constructs or can take on the value of some aspect of the code execution context. Let's add some business context to Store.order(OrderItem) calls by firing a probe that includes the name of the shop along with the operation.
We can apply similar probe instrumentation to Shop.order(Order) calls.
Here is a revised META-INF/aop.xml that includes all three aspects.
With these changes the metering model now tracks resource usage from both a technical perspective as well as a business one. Metering of order processing in terms of performance and business value is now represented at the individual shop and store instance levels.
The next thing to do is to demarcate our business transaction so we can see the metering changes within the execution period of Shop.order(Order) call.
The following tables show an individual execution instance of the Shop.order(Order) business transaction along with the metering changes that occurred during its execution within the context of the executing thread.
We can further simplify things for business management by not deploying the code level metering extension, instead using only the contextual extension created above.
Finally we can configure the management console to render our transactions and activities in a much more business friendly manner by simply dropping in images into the icons directory with the probe name (full or partial) prefix.
No Latency Application Performance Analysis: When wall clock time is simply too slow
During a workshop I gave this week to a team of developers working on a financial trading platform with very low latency requirements I was asked to provide a good example of when to use (thread specific) counter based meters instead of the more natural time based ones in analyzing the performance of systems. Here is an example I quickly developed to show that wall clock time does not cut it when one is monitoring and optimizing at the nanosecond level.
The State class in my example is an abstraction of the runtime object state that is read and written during the execution of 5 algorithms. Likewise the Run classes are an abstraction of each algorithm which would more than likely be implemented across multiple classes and methods.
Looking at each doWork() method implementation it is pretty easy to compare the efficiency of each method but in real life code this would be an impossible challenge especially with different execution paths.
Unfortunately to make a similar assessment with a benchmark using wall clock time requires measuring a very large number of calls to each algorithm due to the extremely short execution interval of a single call. In the example below I needed to execute each algorithm at least 50 million times.
Here is the output running the above a number of times. It seems to confirm what we expected in terms of cost.
Now if the actual doWork() methods were instrumented by an application performance management product we would not see the same performance ratios because even requesting the clock time twice will cost between 100 and 150 nanoseconds and that's without the product doing any actual processing and data collection. Comparing algorithm One with algorithm Two we would have 114 (100 + 14) versus 128 (100 + 28).
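To make the distortion explicit, the small sketch below works through the arithmetic using the figures quoted above (14 and 28 for the two algorithms, 100 for the fixed measurement overhead). The OverheadRatio class is purely illustrative.

```java
import java.util.Locale;

final class OverheadRatio {
    // Ratio of two per-call costs after a fixed measurement overhead is added to each.
    static double observedRatio(double costA, double costB, double overhead) {
        return (costB + overhead) / (costA + overhead);
    }

    public static void main(String[] args) {
        // Without instrumentation: 28 / 14.
        System.out.println(String.format(Locale.ROOT, "true ratio: %.2f",
                observedRatio(14, 28, 0)));
        // With 100 units of clock overhead per call: 128 / 114.
        System.out.println(String.format(Locale.ROOT, "observed ratio: %.2f",
                observedRatio(14, 28, 100)));
    }
}
```

A twofold difference between the algorithms shrinks to roughly 1.12 once the overhead is included, which is why the timed comparison hides most of what we are trying to see.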
Why not instead focus on the key performance indicators? In our extreme low latency example there are two – field.read and field.write. Here is the configuration I used to map such counters to meters.
Note: OpenCore and JXInsight ship with a large number of counter instrumentation extension libraries that need only be added to the classpath of a managed & metered runtime.
Running the application again this time with our activity based resource metering runtime results in the following snapshot display in our management console. Importantly we only needed to execute each algorithm once.
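The reason a single execution suffices can be seen in a plain Java sketch of counting field accesses. Everything here is hypothetical: CountedState and the two methods in FieldAccessDemo are invented for illustration and are neither the State and Run classes from the example nor the counter extension libraries.

```java
// Hypothetical state object whose accessors count reads and writes,
// standing in for the field.read and field.write counters.
final class CountedState {
    long reads;   // plays the role of field.read
    long writes;  // plays the role of field.write
    private int value;

    int read() {
        reads++;
        return value;
    }

    void write(int newValue) {
        writes++;
        value = newValue;
    }
}

final class FieldAccessDemo {
    // Reads and writes the field on every iteration.
    static void perIteration(CountedState state, int n) {
        for (int i = 0; i < n; i++) {
            state.write(state.read() + 1);
        }
    }

    // Reads once, works on a local copy, then writes once.
    static void hoisted(CountedState state, int n) {
        int local = state.read();
        for (int i = 0; i < n; i++) {
            local++;
        }
        state.write(local);
    }
}
```

With n = 10 the first variant records 10 reads and 10 writes and the second records 1 of each. The counts are exact after one run, with no need to average over millions of calls.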
To combine these into one meter to further simplify our analysis I enabled our unit.cost meter and added a cost to each counter based meter.
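Conceptually a unit cost meter is a weighted sum over other meters. The sketch below shows that calculation only; the UnitCost class is hypothetical, the weights used in the usage note are arbitrary examples, and none of this reflects how the unit.cost meter is actually configured.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical weighted-sum calculation; not the runtime's unit.cost implementation.
final class UnitCost {
    private final Map<String, Long> costPerUnit = new LinkedHashMap<>();

    // Assign a cost to one unit of the named meter.
    UnitCost rate(String meter, long cost) {
        costPerUnit.put(meter, cost);
        return this;
    }

    // Total cost of a set of meter readings; meters without a rate cost nothing.
    long total(Map<String, Long> readings) {
        long total = 0;
        for (Map.Entry<String, Long> reading : readings.entrySet()) {
            total += reading.getValue() * costPerUnit.getOrDefault(reading.getKey(), 0L);
        }
        return total;
    }
}
```

For example, with an assumed rate of 1 for field.read and 2 for field.write, readings of 10 reads and 3 writes give a single cost of 16.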
Here is a probes metering snapshot with both configurations merged. Simple and clear. What's cool is that this will work no matter how dispersed the actual field read access and write access are across the code base analyzed. From micro-benchmarking to nano-benchmarking!!!
Architectural Enforcement with OpenCore Probes
A while back I promised Mattias Severson over at Jayway that I would show how to use our OpenCore activity based resource metering runtime to support the dynamic enforcement of certain architectural rules for example ensuring that execution within a particular layer is always within the execution scope of one or more stacked layers.
Let's start with a simple probes plugin factory which registers an event listener with the resource metering runtime.
Below is an implementation of a probes event listener I quickly created today to enforce the layered execution rule that Mattias had detailed in his blog entry which is that code within the com.acme.dao package must be called from within the execution scope of code within the com.acme.service package which itself must execute within the scope of code residing in the com.acme.gui package.
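The layering rule itself can be sketched independently of the probes event API. The LayerEnforcer class below is a hypothetical stand-in, with begin() and end() playing the role of whatever callbacks a listener would receive when a probe starts and stops.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical enforcer; a stand-in for a probes event listener, not the OpenCore API.
final class LayerEnforcer {
    // Layers ordered from outermost to innermost, as in the rule described above.
    private static final String[] LAYERS = {"com.acme.gui", "com.acme.service", "com.acme.dao"};

    // Layer indexes of the calls currently executing on this thread.
    private final ThreadLocal<Deque<Integer>> active = ThreadLocal.withInitial(ArrayDeque::new);

    private static int layerOf(String className) {
        for (int i = 0; i < LAYERS.length; i++) {
            if (className.startsWith(LAYERS[i] + ".")) {
                return i;
            }
        }
        return -1; // not a governed layer
    }

    // Call on method entry; throws if the enclosing layer is not on the stack.
    void begin(String className) {
        int layer = layerOf(className);
        Deque<Integer> stack = active.get();
        if (layer > 0 && !stack.contains(layer - 1)) {
            throw new IllegalStateException(
                    className + " must execute within " + LAYERS[layer - 1]);
        }
        stack.push(layer);
    }

    // Call on method exit.
    void end() {
        active.get().pop();
    }
}
```

Since a service call can only get onto the stack when a gui call is already there, checking just the immediately enclosing layer is enough to enforce the whole chain.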
Here are the system properties to be added to a jxinsight.override.config file to enable event listening and to register the plugin listed above.
The final step simply involves running up the application with our OpenCore agents and libraries as detailed here.
Note: We are looking at packaging up similar functionality as a built-in probes provider which would be driven solely by configuration and implemented much more efficiently within the metering runtime.
Note: With our JRuby integration it is possible to write such rules in Java which enforce similar constraints on Ruby codebases.
Costicity.com – A Blueprint for Cloud Service Metering & Monitoring
I have posted an entry on our company blog discussing our Costicity.com project, an activity based costing & metering service for the grid, cloud and enterprise.
Costicity is light years ahead of any recent initiatives in the application performance management (APM) space including associated areas such as business transaction management (BTM) and business activity management (BAM). It's an attempt to unify multiple management domains and to eliminate the need for separate models and tooling for areas that have overlapping management concerns such as those just listed.
It's ambitious and bold but would I have it any other way? No.
Custom Resource Metering Strategies
It has been nearly 2 1/2 years since we introduced the first production grade Java monitoring solution based on an adaptive resource metering engine driven by one or more chained strategies.
Note: Out of the box our default strategy, hotspot, acts very much like Java JIT compilers, using dynamic runtime metering information to determine whether to continue metering a fired probe, which is typically a method though it is not restricted to this.
Since our initial release which shipped with a number of built-in strategies we have continued to develop new strategies that restrict measurement and collection to particular events or runtime states. Today the list of strategies includes:
burst, busy, busythread, checkpoint, delay, dynamic, exclude, frequency, highmemory, hotspot, include, initial, interval, random, sample, warmup
As stated above the resource metering runtime supports the chaining of multiple strategies allowing our customers to create elaborate new composite strategies. But sometimes that is not enough to help filter down measurement to some peculiar runtime behavior within an application. In such cases you can simply create your very own implementation of the ProbesStrategyFactory and ProbesStrategy interfaces.
Here is an example of a custom strategy I have created that limits metering to only one at a time on a per probe name basis. The associated method will still execute but only a subset of the firings will be metered (measured). This strategy uses the queue probes provider we recently released and blogged about.
Note: We might use such a strategy in a relatively stable production environment electing to discard multiple executions of a method if we are currently metering one particular occurrence not yet completed.
Note: The strategy whilst implemented in Java can be used in metering JRuby/Ruby and Jython/Python applications with the Probes.Name argument representing an execution construct within these languages.
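The one-at-a-time decision described above can be sketched without the strategy interfaces. OneAtATimeStrategy below is a hypothetical class: its vote() and completed() methods are invented names, and it tracks in-flight probe names itself rather than consulting the queue probes provider.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical strategy; illustrative names, not the ProbesStrategy interface.
final class OneAtATimeStrategy {
    // Probe names that currently have a metered firing in progress.
    private final Set<String> metering = ConcurrentHashMap.newKeySet();

    // Returns true if this firing should be metered, that is, no other firing
    // of the same probe name is being metered right now.
    boolean vote(String probeName) {
        return metering.add(probeName);
    }

    // Must be called when a metered firing completes.
    void completed(String probeName) {
        metering.remove(probeName);
    }
}
```

Set.add is atomic on this concurrent set, so when several threads fire the same probe at once exactly one of them wins the vote and the rest run unmetered.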
Here are the system properties which should be added to a jxinsight.override.config file to install the custom metering strategy within the runtime and enable the required queue probes provider.
Below is a sample of the output produced running the application listed in “Metered Software Service Queues” with this configuration. Because the strategy probes provider is layered on top of the queue probes provider when our custom metering strategy returns false for a particular probe firing no metering takes place and hence it does not appear in its associated queue. The maximum size of a queue is effectively limited to 1.
If you are interested in finding out more then please check out our strategy guides on our OpenCore website.
The Java Application Performance Management Vendor Showdown
It seems like every day sees a new software company enter the Java application performance management space claiming to have solved all the problems of legacy application performance management solutions with what is in essence the very same approach, a combination of call (path) tracing and call stack sampling, both of which don’t scale in production as highlighted here, here, here and here. Unfortunately by the time a customer truly understands this it is far too late to reverse a decision without losing face or what data of little value these solutions collect.
Such is the power of marketing and a good sales person.
Whilst it might appear a formidable task to attempt a feature wise comparison of all products even when they are essentially the same in terms of instrumentation, measurement, and collection there is one claim that most of these vendors make which can be validated and verified. And that claim is “low overhead”. So what does “low overhead” actually mean? Well for the most part this is the % of overhead that is added to average response time and/or removed from transaction throughput. Generally this is quoted as between 1% and 5%. Unfortunately it would be unlikely that anyone could hold a vendor to such fictitious claims because of a large number of “depend clauses” highlighted in any technical discussions with the vendor's engineers.
There are a number of assumptions made by a vendor but the following are most common:
1. We assume your application request processing time is sufficiently high to dwarf our significant overhead.
We assume you have a slow performing database backend.
2. We assume that instrumentation is applied to a very limited section of the code source to lessen the impact of our significant overhead.
We assume that you already know your performance hotspots.
3. We assume that there is a large amount of underutilized processing capacity to offload our significant overhead.
We assume that you will not notice tricks used to hide our overhead.
4. We assume that it is impossible to realistically and reliably measure our significant overhead.
We assume that you know little about performance engineering.
And my favorite is the restriction within a vendor's software license on the publication of benchmark results.
We assume that you blindly accept our claims – unquestioningly.
There are some valid reasons why a vendor might object to the use of a standard benchmark as it might not be at all representative of a customer's workload and software execution behavior. Which is why we have carefully designed a number of Java micro-benchmarks that target exclusively the actual overhead of the products deriving a unit (overhead) cost for various aspects of the runtime in terms of instrumentation, measurement, and data collection. These tests measure the unit cost in terms of the software execution model (fixed execution costs) and the system execution model (variable execution costs). With this information we can make a bold claim like the following.

Now I did say “claim” which is something I have not been entirely comfortable with as a complete official vendor unit cost comparison table would have much more meaning and weight. I had hoped following our announcement that we would have one or two vendors refute this and request an official benchmark shootout but unfortunately it appears most vendors already know the answer or are unwilling to enter into a competition in case they lose face (not just their customers).
At the end of last year I decided it was time to take the fight to them and challenge them to prove their “low overhead” claims using our unit cost approach. I contacted a number of companies repeatedly requesting either access to their software or for them to publish their unit costs per our micro-benchmarks. This list includes the following:
CA Wily (Introscope) aka Introscrap 1.0
Oracle (Oracle Enterprise Manager – Oracle Application Diagnostics for Java)
dynaTrace (dynaTrace) aka dynaCopy
Compuware (Vantage for Java)
NewRelic (RPM for Java) aka Introscrap 2.0
AppDynamics (AppDynamics Lite)
All rejected, refused or ignored my requests.
One new vendor (AppDynamics) even went as far as to change their software license agreement immediately after I had downloaded, tested and invalidated their “low overhead” claims. Then sent me emails trying to retrofit the changes to the original agreement. Fortunately I had saved the original agreement at the time of my download.
I doubt this poor standard practice amongst application performance management vendors is likely to change in the near future unless customers start demanding that the listing of a random percentage between 0 and 5 be replaced with actual unit costs which can be used as a guide in determining the suitability of a solution to a particular workload within certain response time limits and at a specified level of coverage (risk mgmt).
In future entries I will provide further information on the micro-benchmarks we use which show JXInsight and OpenCore to be hundreds if not thousands of times more efficient at a unit cost level in some very important cases.
Metered Software Service Queues
Following on from a capacity planning workshop I attended last week given by Dr. Neil J. Gunther we have released an update to JINSPIRED’s OpenCore resource metering & metrics runtime which includes a new optional probes provider extension, queue, that delivers runtime analysis of active and concurrent metered thread workload at particular service probe points and their associated hierarchical metering groups – both at the process and thread level.
Note: When I enrolled in Neil's course I was hoping that by immersing myself in 5 days of discussions around capacity modeling and analysis I would have a moment of brilliance to help me see how best to integrate capacity into our metering model which already serves as a management model for performance and cost in the cloud. Well I had many such moments both during and after the class, one of which has already come to fruition.
To best illustrate the benefits of this unique enhancement to our resource metering runtime I have created a small simple sample application. The application has a com.acme.Server class which creates a number of threads each of which uses a com.acme.Runner to dispatch work to a com.acme.Service with random time intervals between each work submission.
Here is the com.acme.Delay class used to simulate work and think time.
The com.acme.Service class uses the com.acme.Delay class to simulate variation in service times.
The com.acme.Runner class is the workload generator which uses the com.acme.Delay class to simulate variable think time between calls to the com.acme.Service class.
Let's explore the monitor() method in the com.acme.Server class which uses our OpenCore Open API to check on the status of Probes.Queues which in our application can represent a Java Package, Class or Method.
The first call creates a Probes.Name for the service probe point we wish to monitor in terms of metered workload queuing. The next call looks up the metering Probes.Group associated with the Probes.Name. Then later on a Probes.Queue associated with the Probes.Group is obtained and its state repeatedly read and printed out in a while loop.
Below is a sample of the output in running the above application with OpenCore’s enhanced for production dynamic load-time instrumentation agent and extension libraries. The queue.size represents the number of threads (con)currently executing the metered com.acme.Service.doWork() method. The queue.max represents the maximum degree of concurrent thread workload that has occurred in the com.acme.Service.doWork() method and the queue.count the cumulative number of com.acme.Service.doWork() calls that have been made some of which have not yet completed.
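To make the three values concrete, here is a minimal sketch of how such statistics could be maintained around a metered method. WorkQueueStats is a hypothetical class and is not the Probes.Queue interface; enter() and exit() stand for the begin and end of a metered call.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical bookkeeping for one service point; not the Probes.Queue interface.
final class WorkQueueStats {
    private final AtomicInteger size = new AtomicInteger();  // calls in progress
    private final AtomicInteger max = new AtomicInteger();   // highest size seen
    private final AtomicLong count = new AtomicLong();       // calls started so far

    // Call when a thread enters the metered method.
    void enter() {
        count.incrementAndGet();
        int current = size.incrementAndGet();
        max.accumulateAndGet(current, Math::max);
    }

    // Call when the thread leaves the metered method.
    void exit() {
        size.decrementAndGet();
    }

    int size() {
        return size.get();
    }

    int max() {
        return max.get();
    }

    long count() {
        return count.get();
    }
}
```

After two entries, one exit and a third entry the sketch reports a size of 2, a max of 2 and a count of 3, the count including calls that have not yet completed.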
Now what is really cool about our innovative approach is that we can amalgamate queues based on the hierarchical namespace of the named probes (i.e. packages, class, method) inserted into the byte code at load-time. So instead of com.acme.Service.doWork let's look up the metering Probes.Group for com.acme.
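The namespace hierarchy behind this amalgamation can be illustrated with a small helper that expands a probe name into every enclosing group name. ProbeNames.groups is a hypothetical utility written for this illustration and says nothing about how the runtime represents groups internally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; only illustrates the dotted namespace hierarchy.
final class ProbeNames {
    // Expands "a.b.c" into ["a", "a.b", "a.b.c"], outermost group first.
    static List<String> groups(String probeName) {
        List<String> result = new ArrayList<>();
        int dot = probeName.indexOf('.');
        while (dot >= 0) {
            result.add(probeName.substring(0, dot));
            dot = probeName.indexOf('.', dot + 1);
        }
        result.add(probeName);
        return result;
    }
}
```

For com.acme.Service.doWork this yields com, com.acme, com.acme.Service and com.acme.Service.doWork, so work entering the method can also be attributed to the class and package level queues.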
Here is a new sample of the output when running the application with the above change. You will notice that nothing changes across each monitor output and that 11 is listed for queue.size, queue.max, and queue.count.
So why 11? Well previously we only reported on thread activity flowing through com.acme.Service.doWork() method now with com.acme we also include the main thread which creates the threads and then loops forever in the monitor() method. By the time we get to print out for the first time in monitor all threads are performing work in either the Server, Runner, Service or Delay classes – all of which belong in the com.acme package (and its queue).
And why does the queue.count not change? The queue.count only counts non-reentrant queuing of work which means it will only count the non-terminating entry point methods, com.acme.Runner.run() and com.acme.Server.main(), in our application for the com.acme queue. This is pretty darn cool because it allows us to observe potential workload throttling at Java Package(s) level within an application and then drill down into the detail at Class and Method level when required. We also have much more accurate statistical information to help us in sizing various resource pools consumed by service queue points within an application. Here is a diagram depicting how one might view these amalgamated queues.
This is just the beginning as we plan on providing new interfaces that help isolate software & metered resource behavioral patterns across multiple workload queues and at the various levels of sizing.
If you are interested in learning more about the interface to our OpenCore runtime then please check out our API Samples as well as our Open API docs. There are more samples on our site showing other usages of the Probes.Queue interface. Please check them out.
End User Monitoring to End User Metering
I have posted an entry titled Going Beyond End User Monitoring with End User Metering on our company blog showing off a pretty cool & effective use of our Probes Open API by one of our customers.

If it can be Measured, it can be Metered
JXInsight makes it extremely easy for cloud/grid/server vendors to augment our activity based resource metering runtime with their very own meters. Simply create a ProbesPlugin, register a named resource Measure, and map a Meter to the Resource.
Let's start with our custom ProbesPluginFactory class which will be loaded by the metering runtime when it's initialized – typically when the first probe is fired by a thread. The create() method, called only once, returns a single instance of ProbesPlugin – itself. The apply() method, called once per metered thread, registers a thread specific resource Measure.
Here is the barebones implementation of our registered resource Measure.
To load our plugin we need to enable the plugin probes provider extension listing plugins to be loaded. Then we need only map a meter to our registered resource. Here are the system properties which need to be added to a jxinsight.override.config file.
Of course returning zero is not going to be of much use in testing so let's create a simplistic mock-up of our resource meter which for interest is based on the thread level metering of other meters included in the runtime. Here is a revised implementation of our Measure.
Here’s the revised implementation of the plug-in apply() method.
In keeping with my cloud washing (machine) storyline I have created a test client class which will be instrumented at load-time by our enhanced (for production) AspectJ runtime and aspect extension libraries.
I added a few additional system properties to enable the cpu.time meter and to exclude both the busy() and pause() methods which were added to simulate work and delay.
Here is the metering model following the completion of a client execution.
I also enabled our optional tracking probes provider extension to show the metered activity trace (track).
Meters versus Metrics
As the newly appointed head of Cloud Metrics & Metering for the Cloud Club (SF) I thought it would be best to discuss some of the differences I see between metering and metrics.
Actually I am based in The Netherlands but my goal is to attend all meet-ups that touch on my area of expertise.
Let's start with the definitions according to my Mac's dictionary.
A metric is a system or standard of measurement.
A meter is a device that measures and records the quantity, degree, or rate of something.
And metering is defined as “measure by means of a meter”.
From the definitions above we can see that it is possible for a meter to be a metric but not all metrics are meters. Even when presented as a rate, meters are inherently measurements of an accumulating quantity whereas metrics can measure a value that fluctuates up and down its scale (a gauge).
In general I consider a meter to be an observable value that only ever increases (positively).
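The relationship between the two can be shown in a few lines: a rate style metric is derived from two readings of a meter, while the meter itself only ever grows. The MeterRate class is a hypothetical illustration.

```java
// Hypothetical helper deriving a rate metric from an accumulating meter.
final class MeterRate {
    // Average rate per second between two readings of the same meter.
    static double rate(long earlier, long later, double intervalSeconds) {
        if (later < earlier) {
            throw new IllegalArgumentException("a meter reading never decreases");
        }
        if (intervalSeconds <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return (later - earlier) / intervalSeconds;
    }
}
```

Readings of 1000 and 1600 taken two seconds apart give a rate of 300 per second. The derived rate can rise and fall from one interval to the next like a gauge even though the underlying readings never go down.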
Another difference between these two types of measures is in their scope. In systems management metrics are collected, aggregated, and reported at a coarse granularity far above the level of the measurement source/device. Base metric measurement tends to be performed at the OS process level with further aggregation at host and cluster levels performed within management tools. Whereas meters mostly report at the point of usage and in most cases are specific to the execution (call) context. An example would be the current thread cpu time counter or the charge data of a cloud service request returned within the payload of a response. As stated previously meters can very easily be modeled as metrics. An example of this would be the commonly collected process cpu time metric which is derived from reading the cpu time meter and making process level cost assignments based on thread level accounting (sampling, switching,…).
What makes meters much more powerful than metrics is that because of their reporting scope we can accurately relate growth to one or more (chained) software activities, assigning cost (growth), both direct and indirect, to such activities and their executing context (thread, request, entry point, user, module, code,…). Metering allows us to understand cause-and-effect from multiple contextual perspectives. This capability is generally not available with metrics, though in some ways this significant deficiency is (somewhat poorly) addressed via statistical correlation and collecting a vast amount of metrics related to context. But if metering (tuple: activity + meter + cost assignment) is itself measurable can’t we still have metrics that have much more relevance and can be charted alongside other metrics on our timeline charts? Yes indeed, you can have your cake and eat it. That said advanced metering systems like the one I work on can augment the basic metering with advanced data collection techniques (tracking, transactions, paths,…) which cannot be easily represented as a metric.
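The current thread cpu time counter mentioned above is available in standard Java through ThreadMXBean, which makes for a compact example of a meter read at the point of usage. The ThreadCpuMeter wrapper is a hypothetical helper; only the java.lang.management calls are real JDK API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical wrapper around the JDK's per-thread CPU time counter.
final class ThreadCpuMeter {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    // CPU time in nanoseconds consumed by the calling thread while running the
    // activity, or -1 if the JVM cannot report per-thread CPU time.
    static long cpuTimeOf(Runnable activity) {
        if (!THREADS.isCurrentThreadCpuTimeSupported()) {
            return -1;
        }
        long before = THREADS.getCurrentThreadCpuTime();
        if (before < 0) {
            return -1; // CPU time measurement is disabled
        }
        activity.run();
        return THREADS.getCurrentThreadCpuTime() - before;
    }
}
```

The reading belongs to the thread and the activity that caused it, which is the contextual scope a process level cpu metric loses when it is aggregated.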
To create a much more effective systems management model we need to move away from our over reliance on legacy process level metrics devoid of application activity context. This is crucial when one considers the ease and scale at which we can today create instances and processes in the cloud. Even if our legacy system management tools could scale (which sadly they cannot) as easily as our applications (the jury is still out on that one) our thinking and models cannot. We need to go beyond OS specific container constructs and see the essence of our software – activities and resources.
Only metric models based largely on activity based metering provide a solution for our needs – today and tomorrow, outside and inside the cloud.
If you want to see how both metering and metrics models can be combined then please check-out our OpenCore HelloWorld example.
Cloud Costs that do not Cost
When one talks about cloud computing costs it is natural to think in terms of billing but in designing our activity based costing (ABC) solution we elected to make costing applicable to the management & optimization of service delivery from one or more management perspectives such as (application) performance, (financial) cost, (resource) capacity, (commercial) value and (service) quality.
For abstraction purposes we chose instead to represent cost as a Meter within our Open API and runtime which in turn can be mapped to a Resource (pull based) or Counter (push based).
It would be much more accurate to refer to our solution as the first activity based metering (ABM) solution for cloud and non-cloud environments.
Meters can represent costs but cost itself does not necessarily need to represent an actual financial liability or charge. A cost (i.e. meter) can represent the amount of effort, loss or sacrifice incurred in achieving or obtaining something. In the performance management domain our built-in clock.time meter can be viewed as a cost as it represents to some degree the effort or loss (in terms of time, efficiency and capacity) in servicing an application request.
One could also view wall clock time as a cost to the user – time is money.
It is worth noting that a single meter can have different cost usage & valuation across management domains. Coming back to our clock.time meter it can also be used to model the lease cost of a worker thread or (container) process for service capacity management purposes.
When meters do represent an expense they need not represent actuals but more so approximations. This is especially true when the usage of a particular metered resource has variable unit pricing and/or rate plans which make it impossible to determine the exact expense at the point and time of consumption. This is largely the case for most of Amazon AWS cloud services which incidentally offer very little in the way of activity (cause) analysis. The approximation of actual costs does not invalidate the metering model because in such cases the meters are being used to represent the cost drivers – that which can be managed external to the resource pricing by the provider.
Precision is not synonymous with accuracy.
For the most part I like to view cost related (or linked) meters as key performance indicators of a process I am trying to optimize. Increases in such meters will generally equate to an increase in cost (direct or indirect) across one or more management domains though the degree to which costs increase may vary. My tendency to refer to meters as costs is because one is always forced to make trade-offs in designs which have implications for consumption and utilization rates of resources (How fast and at what cost?). Treating them uniformly as costs by deriving additional weighted unit cost meters (which by the way we fully support in our runtime) enables me to simplify my analysis of such trade-offs. For example I occasionally create a cost meter based on different (weighted) unit cost rates assigned to clock.time and cpu.time when benchmarking alternative request processing execution paths. Here I am trading response time processing with cpu usage which is required when there are resource capacity limits (or in the cloud financial constraints).