PT&E Processes, Tools and Techniques : May 2015

Monday, May 25, 2015

Common performance test metrics

There are two types of application performance test metrics –

1) Client side metrics – These are the metrics which we can get without installing any software or tool on the server. These metrics tell us about the degradation of user experience with changing load.

Some of the common client side metrics are – Number of Vusers, transactions per second, transaction response time, throughput, errors, and hits per second

2) Server side metrics – These are the metrics which are collected from the servers. Collecting them needs some kind of software or tool installed on the server; the other way is to log directly into the server and read the metrics there. Of course there are many types of servers (Windows, Unix, AIX), and if we log in directly we need to know how to collect the metrics on each of them.

Some of the common server side metrics are – CPU utilization, Memory Utilization and % Disk time.

There are network resource metrics too, which do not fit into either of the two categories above. These metrics tell us about the performance of the network and about any performance issues caused by it.

Some of the network resource metrics are – Network latency, Network roundtrip and Data transfer.

Explanation of metrics:

Number of VUsers – Number of virtual users which were running during the performance test time period.

Transactions per second – Number of transactions completed per second during the performance test, i.e. the total number of completed transactions divided by the test duration in seconds. This can be measured for an individual transaction or as a total across the whole test.

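Transaction response time – The time a transaction takes from the moment it is initiated until its response is received. Some tools measure this up to the first byte of the response and others up to the last byte, so check which one your tool reports.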

When a transaction is initiated it goes through a number of layers in the application architecture.

For example it is possible that a request (transaction) in a 3-tier application follows the following path.

Typically, performance test result reports show both the average and the 90th percentile response times.

Average transaction response times – Average of all the response times for a particular transaction for a specified time period.

Example: if we have 5 transactions taking 2, 3, 2, 3 and 5 seconds, the average transaction response time is (2+3+2+3+5)/5 = 3 seconds.

90th percentile transaction response time – If we arrange the response times in increasing order, this is the value at the 90 percent position. It means that 90 percent of the transactions took this much time or less.

Example: if we have 10 transactions taking 2, 3, 2, 3, 1, 5, 4, 2, 4 and 6 seconds, we first arrange these values in increasing order: 1, 2, 2, 2, 3, 3, 4, 4, 5, 6. The 90th percentile is the 9th of these 10 values, which is 5 seconds.
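The two examples above can be reproduced with a short Python sketch. It assumes the simple "nearest rank" definition of a percentile used in this post; load testing tools may interpolate between values and report slightly different numbers. The function names are only illustrative.

```python
import math

def average(values):
    """Arithmetic mean of a list of response times."""
    return sum(values) / len(values)

def percentile_nearest_rank(values, pct):
    """Value at position ceil(pct/100 * n) once the values are sorted."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]

# Response times (in seconds) from the two examples above.
print(average([2, 3, 2, 3, 5]))                                      # 3.0
print(percentile_nearest_rank([2, 3, 2, 3, 1, 5, 4, 2, 4, 6], 90))   # 5
```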

Hits per second – The number of hits made on the web servers per second of the test. A hit is different from a page request, since a page contains many resources like images, videos or other files. A hit is a request for a single resource from the server. So if one page request involves 3 resources (1 image, 1 video and 1 graphic), it means there are 3 hits on the web server.

Throughput – This metric represents the number of bytes received by the client from a web server per unit of time. Throughput represents the load generated on the web server by the hits the client makes on it.
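To make the relation between hits and throughput concrete, here is a small sketch that buckets responses by second. The input format (one tuple of timestamp and bytes received per hit) and the sample numbers are assumptions made up for illustration; real tools export this data in their own formats.

```python
from collections import defaultdict

def per_second_rates(responses):
    """responses: iterable of (timestamp_seconds, bytes_received), one per hit.

    Returns two dicts keyed by whole second: hits and bytes received.
    """
    hits = defaultdict(int)
    received = defaultdict(int)
    for timestamp, size in responses:
        second = int(timestamp)      # bucket the hit into its whole second
        hits[second] += 1
        received[second] += size
    return dict(hits), dict(received)

# Hypothetical sample: five resource responses over two seconds.
sample = [(0.1, 12000), (0.4, 50000), (0.9, 3000), (1.2, 12000), (1.7, 800)]
hits, throughput = per_second_rates(sample)
print(hits)        # {0: 3, 1: 2}
print(throughput)  # {0: 65000, 1: 12800}
```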

CPU utilization – Average utilization of all the processors in the system during the test period, or average utilization of one particular processor during the test period.

On Windows platform – We can use Perfmon monitors to look into CPU utilization.

On UNIX platform – We can use the vmstat command to look into CPU utilization.

Processor counters can be divided into the following two types:

1) % User time – Time processor spent in user mode code processing.

2) % Privileged time – Time processor spent in kernel mode processing.
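On Linux, the us and sy columns of vmstat output are the rough equivalents of user mode and kernel mode time, and id is idle time. The sketch below reads those columns from one line of vmstat output. The sample lines are made up for illustration, and the column layout assumed here is the Linux one; other UNIX flavours print different columns.

```python
def parse_vmstat_cpu(header_line, data_line):
    """Map vmstat column names to values and return the CPU percentages."""
    row = dict(zip(header_line.split(), data_line.split()))
    user = int(row["us"])      # time spent in user mode code
    system = int(row["sy"])    # time spent in kernel mode code
    idle = int(row["id"])      # time spent idle
    return {"user": user, "system": system, "idle": idle, "busy": user + system}

# Hypothetical vmstat output (Linux column layout assumed).
header = " r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st"
data = " 2  0      0 812344  10240 402112    0    0     5    12  310  640 35 10 53  2  0"

cpu = parse_vmstat_cpu(header, data)
print(cpu)  # {'user': 35, 'system': 10, 'idle': 53, 'busy': 45}
```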

Memory Utilization – Average memory consumed during the performance test on the servers.

There are many memory metrics, such as the following:

Available kilobytes – Average memory, in kilobytes, available during the performance test. In some cases the counter is reported in other units such as MB or bytes; the only difference in those cases is the unit and the level of detail available.

Pages/Sec – Number of virtual memory pages read or written per second. From this metric we can also derive the amount of data moved between RAM and disk per second: just multiply the Pages/Sec number by the page size (4 KB on most machines).
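As a quick sketch of that multiplication, assuming the 4 KB page size mentioned above (check the actual page size on your server) and a made-up Pages/Sec reading:

```python
PAGE_SIZE_BYTES = 4 * 1024  # assumed page size; verify it on the target machine

def paging_bytes_per_second(pages_per_sec, page_size=PAGE_SIZE_BYTES):
    """Data moved between RAM and disk per second, in bytes."""
    return pages_per_sec * page_size

# Hypothetical reading of 250 pages/sec.
print(paging_bytes_per_second(250))         # 1024000
print(paging_bytes_per_second(250) / 1024)  # 1000.0 (KB per second)
```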

% Disk time – This metric is calculated by multiplying the “Average Disk Queue Length” counter by 100. For this reason we can sometimes see a % Disk time value of more than 100%, which happens when Average Disk Queue Length is more than 1.
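The relation described above is a one-line calculation. This sketch only restates that rule with made-up queue length readings; it shows why a queue length above 1 gives a value above 100%.

```python
def disk_time_percent(avg_disk_queue_length):
    """% Disk time derived from the Average Disk Queue Length counter."""
    return avg_disk_queue_length * 100

print(disk_time_percent(0.5))  # 50.0
print(disk_time_percent(1.5))  # 150.0, above 100% because the queue length exceeds 1
```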

% Disk time can be measured separately for disk reads (% Disk read time) and disk writes (% Disk write time).

% Disk idle time – Provides the time during which disk has no requests to process from operating system. 0 represents disk was always busy while 100 states that disk was always idle.

There are many other disk metrics like Disk transfers/second, Disk bytes/second, Average Disk bytes/transfer, Average Disk Seconds/transfer etc. These have very clear nomenclature hence I am not explaining these.

Errors – As the name suggests, these are the error messages or exceptions thrown by the application while it is under test. Many kinds of errors come from the application side, but some can be caused by our own scripts or scenarios. We need to check why these errors are thrown during a test, and also keep an eye on errors which increase with load, because most performance issues creep in when we increase the load.

Network latency – How much time it takes for a packet of data to get from one designated point to another.

Latency can be due to many factors like –

Propagation – The time one packet of data takes to travel from one point to another. So data sent from India to China will take less time than data sent from India to the US (assuming, of course, that the medium is direct).

Transmission – Every type of network medium has its own transmission delay.

Hop processing – There may be some processing time taken by network devices like routers, bridges etc.

Network round trip – This is a kind of latency measure where we measure latency from one point to another and back.
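The latency factors above can be added up in a rough back-of-the-envelope sketch. It uses the standard approximations that propagation delay is distance divided by signal speed and transmission delay is packet size divided by link bandwidth. All the numbers (distance, signal speed in fibre, bandwidth, hop count and per-hop delay) are assumptions chosen only for illustration, and the round trip is assumed to be symmetric.

```python
def one_way_latency_ms(distance_km, packet_bytes, bandwidth_bps,
                       hops, per_hop_ms, signal_speed_km_per_s=200_000):
    """Estimated one-way latency in milliseconds."""
    propagation_ms = distance_km / signal_speed_km_per_s * 1000
    transmission_ms = packet_bytes * 8 / bandwidth_bps * 1000
    hop_processing_ms = hops * per_hop_ms
    return propagation_ms + transmission_ms + hop_processing_ms

# Hypothetical link: 4000 km, 1500-byte packet, 100 Mbps, 5 hops at 0.5 ms each.
one_way = one_way_latency_ms(4000, 1500, 100_000_000, hops=5, per_hop_ms=0.5)
round_trip = 2 * one_way  # assumes the same path and delays in both directions

print(f"{one_way:.2f} ms one way")        # 22.62 ms one way
print(f"{round_trip:.2f} ms round trip")  # 45.24 ms round trip
```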

Network utilization – This metric tells us how loaded our network is. A high value (close to 100%) indicates network congestion.

Data transfer – Bytes received/sec, Bytes sent/sec, Bytes total/sec – These metrics tell us the amount of data put on the network per second. The nomenclature is self-explanatory.

Friday, May 22, 2015

Performance bottleneck indicators at different Application layers

When we execute a performance test against an application there are different performance bottlenecks specific to the different layers of the application.

The following diagram shows the issues we can see at each layer in a three tier application architecture.

What is the root cause for these issues? We may need to drill down further and find that.

Some indicators may appear at one layer but be caused by issues further down the execution path. For example, “High time out errors” at the web server layer may be caused by a slow service call to the application layer. Some requests may also need the services of a slow external system, and this too can cause timeout errors. We should apply our judgment and logic when assigning responsibility for an issue to a layer. I have used Introscope and dynaTrace for analyzing these performance issues, and they are very helpful in diagnosing them.

How can we drill down to the root cause of these issues? I will try to explain that in next post.

Oracle AWR - From performance testing perspective

To analyze the AWR reports we need to pull reports:

1) For both good and bad times, i.e. we need reports from when the database was behaving badly and also from when it was behaving well.

2) Only for the time window that is required. For example, when we are analyzing a performance bottleneck, pull the report only for the time when that bottleneck appeared.

A few terms which we need to be aware of:

AWR – Automatic Workload Repository.

At regular intervals, the Oracle Database makes a snapshot of all of its vital statistics and workload information and stores them in the AWR.

Oracle RAC – Oracle Real Application Clusters.

To understand this we first need to understand Oracle database structure.

Suppose we have only 2 layers: an application layer, consisting of all the applications which use database services, and an Oracle database layer, as shown below.

Here different applications are directly interacting with the database layer for database operations.

Now the Oracle database layer consists of two items:

1) Files (This is the actual database)

2) Processes (also called instances)

In a non-RAC environment, one software instance talks to the Oracle database, as shown above.

But in a RAC environment, two or more software instances talk to a single database, as shown below.

3) Hard parse and soft parse

Hard parse – Whenever a new SQL statement that is not already in the shared pool is passed to the Oracle database, it performs a number of steps to execute it. The steps are: Load the SQL code into RAM -> Parse the statement for syntax -> Semantic parse -> Transform the query into a simpler one -> Optimize the query -> Create an executable -> Retrieve rows.

Soft parse – In this case the statement received by Oracle is already present in the shared pool, so there is no need to load it into RAM again.
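The difference between the two can be pictured with a toy model of the shared pool: a dictionary keyed by the exact SQL text. This is only a sketch of the idea and not how Oracle is implemented; the class, the table name and the queries are made up for illustration. It also shows why statements that differ only in literal values each cause a hard parse, while a statement reused with a bind placeholder is hard parsed once.

```python
class SharedPoolSketch:
    """Toy stand-in for the shared pool: caches parsed statements by SQL text."""

    def __init__(self):
        self._cursors = {}
        self.hard_parses = 0
        self.soft_parses = 0

    def parse(self, sql_text):
        if sql_text in self._cursors:
            self.soft_parses += 1          # statement already in the pool
            return self._cursors[sql_text]
        self.hard_parses += 1              # new statement: full parse and optimize
        plan = f"plan for: {sql_text}"     # placeholder for the real work
        self._cursors[sql_text] = plan
        return plan

# Literal values make every statement text different.
pool = SharedPoolSketch()
for customer_id in (1, 2, 3):
    pool.parse(f"SELECT name FROM customers WHERE id = {customer_id}")
literal_counts = (pool.hard_parses, pool.soft_parses)
print(literal_counts)  # (3, 0)

# A bind placeholder keeps the statement text identical.
pool = SharedPoolSketch()
for customer_id in (1, 2, 3):
    pool.parse("SELECT name FROM customers WHERE id = :id")
bind_counts = (pool.hard_parses, pool.soft_parses)
print(bind_counts)  # (1, 2)
```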

Now to the point.

Sections to look in AWR report:

1) Load Profile

2) Top 5 Timed Foreground Events

3) SQL Statistics: These statistics are easy to understand. Nomenclature of each section is pretty much self explanatory.

We can click on any SQL id and get details of the particular query which was executed.

4) Operating system statistics details

5) Time model statistics

These provide the details about where the processing time was spent.