Friday, May 22, 2009

Feature Requests for BEZVision

I am sorry if this post appears cryptic to some readers. It's my wishlist for a product I am currently working on. I have already posted details of the product here.
Some of these features need more elaboration; I will dedicate separate posts to them and update this post with the links.
A. Workload Characterization
Workload rules for app tier
This feature is conspicuous by its absence: we cannot create workload rules for the app tier. Although we do not have UPMM in the app tier, we should still be able to create rules based on page URLs, component names, etc. We already save this information in the component stats table, so defining rules based on it should be easy.
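A minimal sketch of what app-tier rules might look like, assuming each component stats row exposes a page URL and a component name. The field names, the regex-based rule format and the default workload name are all hypothetical; rules are evaluated in order and the first match wins.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComponentStat:
    # Hypothetical subset of a row in the component stats table.
    page_url: str
    component_name: str


@dataclass
class AppTierRule:
    workload: str
    url_pattern: Optional[str] = None        # regex searched in page_url
    component_pattern: Optional[str] = None  # regex searched in component_name

    def matches(self, stat: ComponentStat) -> bool:
        # A rule matches only if every pattern it defines matches.
        if self.url_pattern and not re.search(self.url_pattern, stat.page_url):
            return False
        if self.component_pattern and not re.search(self.component_pattern, stat.component_name):
            return False
        return True


def assign_workload(stat: ComponentStat, rules: list[AppTierRule], default: str = "Default") -> str:
    # First matching rule wins; unmatched requests fall into the default workload.
    for rule in rules:
        if rule.matches(stat):
            return rule.workload
    return default


rules = [
    AppTierRule("Checkout", url_pattern=r"^/checkout/"),
    AppTierRule("Reporting EJBs", component_pattern=r"Report.*Bean$"),
]
print(assign_workload(ComponentStat("/checkout/pay.jsp", "PayServlet"), rules))  # Checkout
```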
Workload Clustering
The problem with aggregating data on the basis of UPMM is that very different requests (in terms of CPU and IO) can be grouped together. If the requests have widely different response times, their average response time will not be a true measure. So we need to group together requests that are close to each other in response time, CPU time and/or IO. This can be done by k-means clustering of the raw data. We will have to store a percentile field with each request and characterize on the basis of that. This is already partly done in the model service, where requests are distributed in percentiles.
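A minimal sketch of the clustering step, using plain k-means on per-request (response time, CPU time, IO) vectors. It assumes the features have already been normalised to comparable scales (otherwise raw IO counts would dominate the distance), and it uses a naive deterministic initialisation from the first k points rather than k-means++; both are choices made for illustration only.

```python
import math
from typing import Sequence

Point = tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iterations: int = 100) -> tuple[list[Point], list[int]]:
    """Cluster request feature vectors; returns (centroids, cluster label per point)."""
    if k <= 0 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Naive deterministic initialisation: the first k points become the centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lbl in zip(points, new_labels) if lbl == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
    return centroids, labels


requests = [(1.0, 1.0, 1.0), (1.2, 0.9, 1.1), (10.0, 9.0, 11.0), (10.5, 9.5, 10.0)]
centroids, labels = kmeans(requests, k=2)
print(labels)  # [0, 0, 1, 1]
```

Each resulting cluster would then be a candidate workload whose average response time is actually representative of its members.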
Workload Recommendation
This is partly inspired by Teradata Workload Analyzer. We should be able to analyse collected data, find patterns in it using k-means clustering, and then suggest a configurable number of workload rules to the user.
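One possible way to turn clusters into suggested rules, sketched under stated assumptions: each request is a dict with hypothetical `cpu_time` and `logical_reads` keys, cluster labels come from some earlier clustering run, and each suggested rule is simply the bounding range of its cluster on those two metrics. The largest clusters are suggested first, up to the configurable limit.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RecommendedRule:
    name: str
    cpu_range: tuple[float, float]    # (min, max) CPU time within the cluster
    reads_range: tuple[float, float]  # (min, max) logical reads within the cluster
    support: int                      # number of observed requests the rule covers


def recommend_rules(requests: list[dict], labels: list[int], max_rules: int) -> list[RecommendedRule]:
    """Turn cluster labels into at most max_rules range-based workload rules."""
    clusters = defaultdict(list)
    for req, label in zip(requests, labels):
        clusters[label].append(req)
    # Suggest rules for the largest clusters first, since they cover the most requests.
    ranked = sorted(clusters.values(), key=len, reverse=True)[:max_rules]
    rules = []
    for i, members in enumerate(ranked, start=1):
        cpu = [r["cpu_time"] for r in members]
        reads = [r["logical_reads"] for r in members]
        rules.append(RecommendedRule(
            name=f"Suggested Workload {i}",
            cpu_range=(min(cpu), max(cpu)),
            reads_range=(min(reads), max(reads)),
            support=len(members),
        ))
    return rules
```

The ranges of poorly separated clusters can overlap, so the user would still need to review, order and rename the suggestions.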
More refined definition of workload rules (classifier function DSL)
SQL Server has a nice way of defining workload rules by creating a classifier function. We should have a similar way of defining a classifier function in a custom DSL that looks something like this:
if (stats.logicalReads > 1000) then place in 'High IO Workload'
else if (stats.cpuTime > 3600) then place in 'High CPU Workload'
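The two example rules above can be written as an ordinary function, which is roughly what such a DSL would compile down to. A minimal sketch, assuming a stats object with `logical_reads` and `cpu_time` fields (hypothetical names mirroring `stats.logicalReads` and `stats.cpuTime`), and assuming that requests matching no rule go to a default workload, much as SQL Server places sessions in its default group when the classifier returns no valid group.

```python
from dataclasses import dataclass


@dataclass
class RequestStats:
    logical_reads: int
    cpu_time: float  # same unit as the DSL example's 3600 threshold


def classify(stats: RequestStats) -> str:
    # Rules are evaluated top to bottom; the first one that matches wins.
    if stats.logical_reads > 1000:
        return "High IO Workload"
    elif stats.cpu_time > 3600:
        return "High CPU Workload"
    # Assumed fallback for requests that match no rule.
    return "Default Workload"
```

Because the order is significant, a request that is both IO-heavy and CPU-heavy lands in 'High IO Workload'.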
Workloads based on request type (INSERT, UPDATE)
This is self-explanatory. We should be able to define workloads on the basis of request type, as the demands and characteristics of SELECT, INSERT, UPDATE and DELETE statements all differ.
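A minimal sketch of deriving the request type from the SQL text by looking at its first keyword after any leading whitespace and comments. This is a simplifying assumption: statements starting with `WITH` (common table expressions) or `MERGE` fall into `OTHER` here, and a real implementation would need a proper parser or the statement type reported by the database.

```python
import re

# Leading whitespace, "--" line comments and "/* */" block comments before the first keyword.
_PREFIX = re.compile(r"(?:\s|--[^\n]*(?:\n|$)|/\*.*?\*/)*", re.DOTALL)
_KEYWORDS = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def request_type(sql: str) -> str:
    """Return SELECT/INSERT/UPDATE/DELETE based on the statement's first keyword, else OTHER."""
    body = sql[_PREFIX.match(sql).end():]
    match = re.match(r"[A-Za-z]+", body)
    keyword = match.group(0).upper() if match else ""
    return keyword if keyword in _KEYWORDS else "OTHER"


print(request_type("/* hint */ UPDATE t SET x = 1"))  # UPDATE
```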
Calculate the cost of each workload (a function of CPU and IO)
Each RDBMS (Oracle, SQL Server, Teradata) has an internal cost algorithm that it uses to compute the cost of a query. We should have a customizable algorithm to compute the cost of requests, so that we can characterize requests on that basis.
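A minimal sketch of a customizable cost function, assuming a simple linear model over CPU time, logical reads and physical reads. The weights are hypothetical, user-tunable defaults and are not taken from any RDBMS's optimizer; the cost of a workload is taken to be the sum of its requests' costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostWeights:
    # Hypothetical, user-tunable weights (not any particular RDBMS's cost model).
    cpu_per_ms: float = 1.0
    logical_read: float = 0.01
    physical_read: float = 0.5


def request_cost(cpu_ms: float, logical_reads: int, physical_reads: int,
                 w: CostWeights = CostWeights()) -> float:
    # Linear model: each resource contributes in proportion to its usage.
    return (w.cpu_per_ms * cpu_ms
            + w.logical_read * logical_reads
            + w.physical_read * physical_reads)


def workload_cost(requests: list[tuple[float, int, int]], w: CostWeights = CostWeights()) -> float:
    # Each request is a (cpu_ms, logical_reads, physical_reads) tuple.
    return sum(request_cost(*r, w=w) for r in requests)
```

Keeping the weights in a separate object lets a site weight physical IO more heavily on a disk-bound system without touching the characterization code.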
Workload priority during data collection
Again self-explanatory. We do have a priority field in workload stats, but it's not populated during data collection (it's only set when creating plans for a prediction). We need a way to get the workload priority while collecting data from the database.
Importing workload rules from monitored database
Teradata has a workload manager, SQL Server has a resource governor and Oracle has a resource manager that allow the user to define workload rules. We need a way to import those workload rules into BEZVision.
Asynchronous workload data collection
Asynchronous workload is workload that is not associated with any request. We need a standard way to create asynchronous workload during data collection and in the model service.
Workload Type (Batch, DSS, OLTP)
I wonder why we do not define the workload type, since the queuing networks for batch and OLTP workloads are different from each other.
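To illustrate why the type matters: in the usual capacity-planning treatment, an OLTP (transaction) workload is often modelled as an open class driven by an arrival rate, while a batch workload is a closed class with a fixed number of jobs and zero think time, and the two are solved differently. A minimal single-class sketch, assuming a service demand `D_k` per station and ignoring interaction between classes:

```python
def open_response_time(arrival_rate: float, demands: list[float]) -> float:
    """Open class (e.g. OLTP): R = sum(D_k / (1 - lambda * D_k))."""
    total = 0.0
    for d in demands:
        utilisation = arrival_rate * d
        if utilisation >= 1.0:
            raise ValueError("station saturated: utilisation >= 1")
        total += d / (1.0 - utilisation)
    return total


def closed_response_time(jobs: int, demands: list[float], think_time: float = 0.0) -> float:
    """Closed class (e.g. batch with think_time=0): exact Mean Value Analysis."""
    queue = [0.0] * len(demands)
    response = 0.0
    for n in range(1, jobs + 1):
        # An arriving job sees the queue lengths of the network with n - 1 jobs.
        per_station = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(per_station)
        throughput = n / (response + think_time)
        queue = [throughput * r for r in per_station]
    return response
```

The open model is driven by the arrival rate and diverges as utilisation approaches 1, while the closed model is driven by the job population and its throughput is capped by the bottleneck station.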
B. Prediction Change Plans
In the app tier: Implement SSL plan
Implementing SSL increases the response time of a page. This plan should allow the user to select an encryption algorithm and other SSL parameters and foresee the impact of implementing SSL.
DBMS instance type change plan
This plan is conspicuous by its absence, since we have a node type change plan and a JVM configuration change plan. We should also have a DBMS instance type change plan. It will allow the user to test various configurations beforehand, and also to predict the impact of a DB upgrade.
Cost of Planning Scenario. Total Cost of Ownership. Relative Cost
C. Analysis & Prediction
Network data collection for use in aligning interconnect messages
Besides databases and JVMs, the application should also be able to analyse and predict network configurations. We should at least collect some network stats for use in aligning interconnect stats.
Collect and analyse memory data
Memory data is currently never collected or analysed. We do not need it in the prediction engine, but it may be useful to have some sort of analysis available for memory, like the one available for storage.
Data collection and performance analysis of the .NET Framework
The Microsoft .NET Framework is the logical next step after the support for SQL Server. Currently BEZVision only supports Java application servers in the app tier, so supporting a multi-tier system built on the .NET Framework and SQL Server would be useful. For performance data collection, the framework provides a rich set of performance counters that can be read through WMI or the typeperf command.
Availability analysis and prediction
By analysing collection failures we can derive several metrics, such as MTTF, probability of failure, up time and down time. We will need a heartbeat query with a timeout for the DB, and heartbeat checks for the node (which may be running out of memory), the disk (which may be full; check whether its free space is below a threshold), the network, etc. We can then analyse the data and predict the availability of a system at various points in time. We can also analyse the workload that caused a system (DBMS/node/disk) failure. Again, this will not involve queuing theory but will be done in the way described here.
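A minimal sketch of deriving these metrics from periodic heartbeat results, assuming each sample is a (timestamp in seconds, succeeded) pair sorted by time, and that each sample's state holds until the next sample. MTTF is computed as total up time divided by the number of up-to-down transitions, and MTTR as total down time divided by the number of down-to-up transitions.

```python
from dataclasses import dataclass


@dataclass
class AvailabilityReport:
    up_time: float
    down_time: float
    failures: int
    mttf: float          # mean time to failure (inf if no failure was observed)
    mttr: float          # mean time to repair (nan if no repair was observed)
    availability: float  # fraction of observed time the component was up


def availability_report(samples: list[tuple[float, bool]]) -> AvailabilityReport:
    """samples: (timestamp_seconds, heartbeat_succeeded), sorted by timestamp."""
    up = down = 0.0
    failures = repairs = 0
    for (t0, ok0), (t1, ok1) in zip(samples, samples[1:]):
        # Attribute the interval between consecutive samples to the earlier state.
        if ok0:
            up += t1 - t0
        else:
            down += t1 - t0
        if ok0 and not ok1:
            failures += 1
        elif not ok0 and ok1:
            repairs += 1
    observed = up + down
    return AvailabilityReport(
        up_time=up,
        down_time=down,
        failures=failures,
        mttf=up / failures if failures else float("inf"),
        mttr=down / repairs if repairs else float("nan"),
        availability=up / observed if observed else float("nan"),
    )
```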
Predict usage patterns (workload/activity prediction): queries, page accesses (collective intelligence)
This is not a closely related feature, but since we are gathering so much user data, we should be able to derive some intelligence from it, e.g. predicting DB/app usage patterns.
Save and analyse query plans
Every database (at least Oracle, SQL Server and Teradata) saves the query plan associated with a query, which contains a lot of useful information. We need a way to extract and analyse that information.
D. Advice
App tier advice
We can have app tier advice similar to Google's Page Speed (http://code.google.com/speed/page-speed/). We can also have an app tier catalog snapshot that includes the list of components (JSP pages, EJBs, etc.) in the app tier.
Database Configuration Advisor
We need a database configuration advisor that suggests changes to the DB configuration based on the workload. It could be extended into a general-purpose advisor that runs predictions with different sets of configurations and then advises on that basis. This is already partly done, as the PE generates recommendations when there is a contention point, but it needs to be generalized and automated.
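A minimal sketch of the general-purpose advisor loop: enumerate every combination of candidate settings, run a prediction for each, and return those that meet a response-time target, best first. The `predict` callable is a hypothetical stand-in for a call into the prediction engine, and the parameter names in the usage example are made up; exhaustive enumeration also grows combinatorially, so a real advisor would need to prune the search space.

```python
import itertools
from typing import Callable, Mapping, Sequence

Config = dict[str, object]


def advise(
    space: Mapping[str, Sequence[object]],
    predict: Callable[[Config], float],
    target: float,
) -> list[tuple[float, Config]]:
    """Return (predicted response time, config) pairs that meet `target`, best first."""
    names = list(space)
    results = []
    for values in itertools.product(*(space[n] for n in names)):
        config = dict(zip(names, values))
        predicted = predict(config)  # hypothetical hook into the prediction engine
        if predicted <= target:
            results.append((predicted, config))
    results.sort(key=lambda item: item[0])
    return results


# Usage with a toy stand-in predictor (not a real performance model).
space = {"buffer_cache_mb": [512, 1024, 2048], "cpus": [2, 4]}
toy_predict = lambda c: 1000 / c["buffer_cache_mb"] + 8 / c["cpus"]
best = advise(space, toy_predict, target=3.0)
```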
E. Audits
  1. Custom Alert Rules.
  2. Audit Exception Advice.
  3. Custom audits: define a baseline for every metric, alert on deviations and take automatic corrective actions.
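For the custom audits in item 3, a minimal sketch of a per-metric baseline and deviation check, assuming a metric's history is roughly normally distributed so that the mean plus or minus a few standard deviations is a sensible band. The three-sigma default is an assumption, and corrective actions are left out.

```python
import statistics
from typing import Iterable


class Baseline:
    """Per-metric baseline built from historical samples (mean and standard deviation)."""

    def __init__(self, history: Iterable[float], tolerance: float = 3.0):
        values = list(history)
        if len(values) < 2:
            raise ValueError("need at least two samples to build a baseline")
        self.mean = statistics.fmean(values)
        self.stdev = statistics.stdev(values)  # sample standard deviation
        self.tolerance = tolerance  # allowed deviation, in standard deviations

    def deviates(self, value: float) -> bool:
        # Alert when the value is more than `tolerance` standard deviations from the mean.
        return abs(value - self.mean) > self.tolerance * self.stdev
```

For a history of 100, 102, 98, 101 and 99, the band is roughly 100 ± 4.7, so 104 passes and 110 raises an alert.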
F. User Interface
  1. Improve data presentation in the UI. Gantt and Kiviat charts.