<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>vPivot &#187; vscsistats</title>
	<atom:link href="http://vpivot.com/tag/vscsistats/feed/" rel="self" type="application/rss+xml" />
	<link>http://vpivot.com</link>
	<description>Scott Drummonds on Virtualization</description>
	<lastBuildDate>Wed, 01 Feb 2012 06:46:55 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Alternative to DRS</title>
		<link>http://vpivot.com/2010/12/03/alternative-to-drs/</link>
		<comments>http://vpivot.com/2010/12/03/alternative-to-drs/#comments</comments>
		<pubDate>Fri, 03 Dec 2010 03:50:21 +0000</pubDate>
		<dc:creator>drummonds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[drs]]></category>
		<category><![CDATA[esxtop]]></category>
		<category><![CDATA[pricing]]></category>
		<category><![CDATA[vcenter]]></category>
		<category><![CDATA[vmotion]]></category>
		<category><![CDATA[vscsistats]]></category>

		<guid isPermaLink="false">http://vpivot.com/?p=706</guid>
		<description><![CDATA[Now that I am six months removed from VMware, I will admit that we executed poorly in the space of performance management.  I know that there is intense work going on right now in acquisitions, unification of performance management tools, and vCenter improvement through folding in vscsiStats and esxtop data.  But in the area of [...]]]></description>
			<content:encoded><![CDATA[<p>Now that I am six months removed from VMware, I will admit that we executed poorly in the space of performance management.  I know that there is intense work going on right now in acquisitions, unification of performance management tools, and vCenter improvement through folding in vscsiStats and esxtop data.  But in the area of performance reporting and visualization, VMware&#8217;s success has been minimal.  VMware hopes its acquisition of <a href="http://www.alivevm.com/">AliveVM</a> will plug part of this gap but today it is safe to say the field is wide open for VMware&#8217;s partners.</p>
<p>This morning one such partner, <a href="http://www.vmturbo.com/">VMTurbo</a>, gave me a demonstration of their offering in this field.  Their product provides an obvious improvement on vSphere&#8217;s performance visualization capabilities, although given the state of VMware&#8217;s visualization capabilities virtually any graphical front-end provides an improvement.  What really set off my imagination were two features I had not seen before:</p>
<ul>
<li>A third-party alternative to DRS.</li>
<li>Cross-cluster resource optimization.</li>
</ul>
<p><span id="more-706"></span>VMTurbo provides a variety of monitoring and analysis capabilities but I want to focus most on optimization, in particular load balancing.  But before describing what VMTurbo has done, I want to point out the economics of competing with VMware&#8217;s DRS.</p>
<p>VMware provides four <a href="http://www.vmware.com/products/vsphere/buy/editions_comparison.html">vSphere editions</a> for its customers.  The cheapest edition that offers DRS is Enterprise at a list price of $2,875USD per socket.  The cheapest edition with vMotion is Standard at $995USD per socket.  There are plenty of cool features that come with upgrading from Standard to Enterprise: DRS, VAAI, Fault Tolerance, Storage vMotion, vShield Zones, and others.  But certainly DRS is one of the most valuable of that list.</p>
<p>By leaving such a big price gap between the cheapest vMotion edition and the cheapest DRS edition, VMware has provided its partners an economic incentive to innovate and provide DRS value to customers at a discount.  VMTurbo may capitalize on this incentive and it would not surprise me if numerous other ISVs are already doing so or soon will.  Once a vendor has built a robust monitoring environment, it is only a clever algorithm away from implementing DRS.  And then a trivial API call away from extending DRS to DPM.</p>
<p>The VMTurbo guys explained that their algorithm uses more resources than just CPU and memory and could therefore be better than DRS.  But so much work has gone into VMware&#8217;s memory-and-CPU DRS that I will only believe VMTurbo&#8217;s claims when I see the data.</p>
<p>Another area in which VMTurbo is tinkering is inter-cluster load balancing.  The demo I received this morning showed a precursor step to datacenter-wide load balancing by modeling the merge of two DRS clusters.  As the <a href="http://vpivot.com/2010/11/29/maximum-hosts-per-cluster/#comments">discussion in my maximum cluster size entry</a> showed, choosing and changing cluster sizes is not easy.  And fluidly moving virtual machines between different clusters is not often possible for a variety of reasons.  But modeling cluster merging is the first step in considering cross-cluster operations.  And I think that there is a huge opportunity in the industry for someone to innovate in datacenter-wide optimization.</p>
<p>I would be curious to see what other vendors are doing with DRS, DPM, or datacenter-wide load balancing.  Can anyone refer me to any ISVs that are trying to crack these difficult problems?</p>
]]></content:encoded>
			<wfw:commentRss>http://vpivot.com/2010/12/03/alternative-to-drs/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Performance Troubleshooting Made Simple</title>
		<link>http://vpivot.com/2010/05/10/performance-troubleshooting-made-simple/</link>
		<comments>http://vpivot.com/2010/05/10/performance-troubleshooting-made-simple/#comments</comments>
		<pubDate>Mon, 10 May 2010 13:27:05 +0000</pubDate>
		<dc:creator>drummonds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[esxtop]]></category>
		<category><![CDATA[troubleshooting]]></category>
		<category><![CDATA[vcenter]]></category>
		<category><![CDATA[vscsistats]]></category>

		<guid isPermaLink="false">http://vpivot.com/?p=525</guid>
		<description><![CDATA[I have struggled for years to give VMware&#8217;s customers a framework for diagnosing performance problems. People want a simple system to troubleshoot the unknown sources of poorly performing applications. The best attempt at documenting such a flow is Hal Rosenberg&#8217;s document on vSphere performance troubleshooting. Elegant as it may be, Hal&#8217;s document remains complex for [...]]]></description>
			<content:encoded><![CDATA[<p>I have struggled for years to give VMware&#8217;s customers a framework for diagnosing performance problems.  People want a simple system to troubleshoot the unknown sources of poorly performing applications.  The best attempt at documenting such a flow is <a href="http://communities.vmware.com/docs/DOC-10352">Hal Rosenberg&#8217;s document on vSphere performance troubleshooting</a>. Elegant as it may be, Hal&#8217;s document remains complex for the novice VI administrator.  And it is because that document is so complex that performance people maintain their job security.  <img src='http://vpivot.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />   But in an effort to further obviate my own job, I will try and generalize the troubleshooting flow to add more clarity to the process.</p>
<p><span id="more-525"></span>The first tool in the VI administrator&#8217;s toolbox should always be vCenter.  Through the vSphere client you can use vCenter&#8217;s performance counters to confirm a problem with any of the four resources (storage, CPU, memory, network).  vCenter&#8217;s 20 second sample window impedes its ability to eliminate a resource as a problem.  This is because a three second spike in any resource will be smoothed and missed over the 20 second window.  But when vCenter confirms a sustained resource bottleneck, it is sure to be the performance problem&#8217;s cause.</p>
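<p>A minimal sketch of that smoothing effect, using made-up numbers rather than measured data: a host idles at an assumed 10% utilization and saturates for three seconds inside one 20 second window.  The rolled-up average barely hints at the spike.</p>
<pre>
```python
# Hypothetical one-second utilization samples (percent); not measured data.
WINDOW_SECONDS = 20
BASELINE = 10.0   # assumed steady-state utilization
SPIKE = 100.0     # assumed saturation during the burst
SPIKE_SECONDS = 3

def windowed_average(samples):
    """Return the mean of the samples, as a single rolled-up data point would."""
    return sum(samples) / len(samples)

# Seventeen quiet seconds followed by a three second burst.
samples = [BASELINE] * (WINDOW_SECONDS - SPIKE_SECONDS) + [SPIKE] * SPIKE_SECONDS
average = windowed_average(samples)

print(f"peak sample:    {max(samples):.1f}%")
print(f"window average: {average:.1f}%")
```
</pre>
<p>With these assumed values the peak is 100% but the window reports 23.5%, which is why a clean vCenter chart cannot rule a resource out.</p>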
<p>If vCenter fails to confirm an obvious performance problem, the administrator must next go to more precise, more time-intensive, and more knowledge-intensive tools such as esxtop and vscsiStats.  esxtop takes more skill and time than vCenter but provides better resolution and more visibility into the system.  vscsiStats is the most time-intensive tool and has limits with ESXi hosts but can uncover a world of detail invisible to esxtop and vCenter.</p>
<p>I estimate each tool&#8217;s chance of identifying a random performance problem as follows:</p>
<ul>
<li>vCenter: used in 90% of performance problems</li>
<li>esxtop: used in 9% of problems</li>
<li>vscsiStats: used 0.9% of the time</li>
</ul>
<p>The remaining 0.1% of the time is when you engage your account team or your local VMware performance expert.</p>
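<p>These estimates follow a simple pattern: each tool catches nine tenths of whatever the previous tool left behind.  The sketch below only reproduces that arithmetic.  The 90% share per step is my reading of the figures above, not a measured rate.</p>
<pre>
```python
from fractions import Fraction

# Order of escalation taken from the list above; the 90% share per step is
# an observation about the estimates, not a measured rate.
TOOLS = ["vCenter", "esxtop", "vscsiStats"]
SHARE_PER_STEP = Fraction(9, 10)

def escalation_shares(tools, share):
    """Return (per-tool percentages, percentage left for the experts)."""
    remaining = Fraction(100)
    shares = {}
    for tool in tools:
        shares[tool] = remaining * share
        remaining -= shares[tool]
    return shares, remaining

shares, leftover = escalation_shares(TOOLS, SHARE_PER_STEP)
for tool, pct in shares.items():
    print(f"{tool}: {float(pct):g}%")
print(f"escalate to an expert: {float(leftover):g}%")
```
</pre>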
<p>Even within each tool&#8217;s usage there is a hierarchy of investigation: storage, CPU, memory and network.  My experience with troubleshooting has informed this decision.  Storage causes the most problems, then CPU, then memory, and lastly (and rarely) network. After each resource level is inspected in vCenter, a repeat of the inspection should occur on esxtop.  Guest tools may be a third option for memory, CPU, and network but vscsiStats should always be consulted if the performance problem persists.</p>
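<p>One reading of that flow can be written down as an ordered checklist.  This is a sketch only: the function name is illustrative, guest tools are left out, and vscsiStats is listed last against storage alone because it reports on virtual disks.</p>
<pre>
```python
# Orderings taken from the paragraph above; the function name is illustrative.
RESOURCES = ["storage", "CPU", "memory", "network"]
GENERAL_TOOLS = ["vCenter", "esxtop"]

def inspection_order():
    """List (tool, resource) checks from easiest to most precise."""
    steps = [(tool, resource) for tool in GENERAL_TOOLS for resource in RESOURCES]
    # vscsiStats only reports on virtual disks, so it covers storage alone.
    steps.append(("vscsiStats", "storage"))
    return steps

for number, (tool, resource) in enumerate(inspection_order(), start=1):
    print(f"{number}. check {resource} in {tool}")
```
</pre>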
<p>VMware&#8217;s growing array of performance management tools will change this flow somewhat.  AppSpeed, for instance, adds the ability to make very educated guesses about resource bottlenecks based on inside information into the application execution.  Hyperic can provide in-guest process visibility and Ionix ADM will map application interdependencies to focus the investigation.  But I will abstain from providing best practices on these tools until I have used them more.  In all cases, however, the fundamental relationship of &#8220;easy first, precise later&#8221; remains.</p>
<p>VMware continues to work towards integrating all of these tools into a single view within the vSphere client.  I expect that integration will improve the success rate of the performance layman in troubleshooting these problems.  But I am sure that even into the distant future performance people will find their jobs secure.</p>
]]></content:encoded>
			<wfw:commentRss>http://vpivot.com/2010/05/10/performance-troubleshooting-made-simple/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Windows Guest Defragmentation, Take Two</title>
		<link>http://vpivot.com/2010/04/14/windows-guest-defragmentation-take-two/</link>
		<comments>http://vpivot.com/2010/04/14/windows-guest-defragmentation-take-two/#comments</comments>
		<pubDate>Wed, 14 Apr 2010 21:33:56 +0000</pubDate>
		<dc:creator>drummonds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[storage]]></category>
		<category><![CDATA[vscsistats]]></category>
		<category><![CDATA[windows]]></category>

		<guid isPermaLink="false">http://vpivot.com/?p=380</guid>
		<description><![CDATA[I have received questions about guest defragmentation tools for years.  Until today I could only pose theories as to the value of guest defragmentation.  But previous theories spawned new research and one of VMware&#8217;s partners is now putting data behind their argument that file systems in Windows virtual machines require defragmentation.  This partner, Raxco Software, [...]]]></description>
			<content:encoded><![CDATA[<p>I have received questions about guest defragmentation tools for years.  Until today I could only <a href="http://vpivot.com/2010/02/12/windows-guest-defragmentation/">pose theories as to the value of guest defragmentation</a>.  But previous theories spawned new research and one of VMware&#8217;s partners is now putting data behind their argument that file systems in Windows virtual machines require defragmentation.  This partner, Raxco Software, shared early results of this investigation with me.  Raxco used their NTFS defragmentation tool <a href="http://www.perfectdisk.com/products/business-perfectdisk11-vsphere/learn-more">PerfectDisk</a> to evaluate the impact of guest defragmentation on a single virtual machine.</p>
<p><span id="more-380"></span>Before I describe the test and its results, I want to share an important point on guest defragmentation.  Most of the computer literate are aware that <em>file</em> fragmentation&#8211;the separation of logically contiguous pieces of a file&#8211;can hurt storage performance.  But many may not realize that <em>free space</em> fragmentation is as big of an issue.  When free space is fragmented, writes take longer and files are re-fragmented rapidly.  PerfectDisk defragments files and free space and the results below benefit from both of these improvements.</p>
<p>The following steps were used to set up the test environment:</p>
<ol>
<li>A new virtual machine was constructed with Windows Server 2008 on a single 50 GB virtual disk.  29 GB (58%) of the disk was populated with the OS and miscellaneous user files.  21 GB of free space remained.</li>
<li>They ran a tool that simulated months of user activity by reordering blocks to fragment files and free space.</li>
<li>The fragmented virtual machine was cloned.</li>
<li>PerfectDisk was run on the second virtual machine to produce a defragmented virtual disk.</li>
</ol>
<p>The Raxco team then compared the performance of the fragmented virtual machine with the defragmented one by measuring the installation time of Microsoft SQL Server 2008.  This workload was chosen because it produces a bounded, write-intensive test that can easily be monitored with <a href="http://communities.vmware.com/docs/DOC-10095">vscsiStats</a>.  Only one virtual machine was running on the host.</p>
<p>Let&#8217;s take a look at a few key graphs.</p>
<div id="attachment_384" class="wp-caption alignnone" style="width: 510px"><a href="http://vpivot.com/wp-content/uploads/2010/04/size.png"><img class="size-full wp-image-384" title="Distribution of IO Sizes" src="http://vpivot.com/wp-content/uploads/2010/04/size.png" alt="Comparing IO size" width="500" /></a><p class="wp-caption-text">The number of IOs during the application install, organized by size.</p></div>
<p>This first histogram shows IO counts by size.  You can see that IO counts for all but the largest bucket have decreased because PerfectDisk is reordering files to minimize small IOs and maximize large IOs.  For the storage hardware this means greater efficiency in processing IOs.  But it also means two things to ESX:</p>
<ul>
<li>More host throughput as the fixed HBA queue is now holding larger commands.</li>
<li>Fewer virtual storage stack traversals resulting in lower CPU utilization.</li>
</ul>
<p>However, PerfectDisk not only consolidates small IOs into large IOs, it also makes files logically contiguous as they are seen by the NTFS file system. This means less work for disk controllers when mapping logical to physical clusters on disk.</p>
<p>Next we have the vscsiStats seek distance histogram which shows the shift from random to sequential IO.</p>
<div id="attachment_383" class="wp-caption alignnone" style="width: 510px"><a href="http://vpivot.com/wp-content/uploads/2010/04/seek-distance.png"><img class="size-full wp-image-383" title="Seek Distance" src="http://vpivot.com/wp-content/uploads/2010/04/seek-distance.png" alt="Distance between successive IOs" width="500" /></a><p class="wp-caption-text">The seek distance histogram shows the number of logical blocks between each successive IO.</p></div>
<p>The seek distance histogram shows a clear increase in the number of IOs that were exactly one block after the previous IO.  This pattern, demonstrated in the bucket &#8220;1&#8221; increase, reflects the increased sequential nature of accesses to the defragmented virtual disk.  In this case the fragmented disk access was 15% sequential while the defragmented disk was 30% sequential.</p>
<p>Let us next look at latency.</p>
<div id="attachment_382" class="wp-caption alignnone" style="width: 510px"><a href="http://vpivot.com/wp-content/uploads/2010/04/latency.png"><img class="size-full wp-image-382" title="IO Latency" src="http://vpivot.com/wp-content/uploads/2010/04/latency.png" alt="Number of IOs by Latency" width="500" /></a><p class="wp-caption-text">This histogram counts IOs by latency.</p></div>
<p>The latency histogram shows a decrease in all IOs across the board and a near elimination of IOs that took more than 30,000 microseconds (30 ms).  Those very slow IOs accounted for 15% of all the commands in the fragmented case.  By eliminating the 15% slowest IOs, you can imagine that the total IO performance and application execution time would greatly improve.  That is exactly what happened, as the following data show:</p>
<table id="newspaper-a">
<tbody>
<tr>
<th>Metric</th>
<th>Fragmented Disk</th>
<th>Defragmented Disk</th>
<th>Comment</th>
</tr>
<tr>
<td>Total IOs</td>
<td>166412</td>
<td>105620</td>
<td>A decrease in total IOs is a result of Windows making fewer requests for larger IOs in the defragmented case.</td>
</tr>
<tr>
<td>Mean IO Response Time</td>
<td>58.5 ms</td>
<td>3.5 ms</td>
<td>Mean latency fell by 55 ms, a reduction of about 94%.</td>
</tr>
<tr>
<td>SQL Server 2008 Install Time</td>
<td>45 minutes</td>
<td>30 minutes</td>
<td>The best application metric for this test showed a 33% reduction in execution time.</td>
</tr>
</tbody>
</table>
<p>Let me repeat one of those amazing data points: the average IO latency dropped from more than 58 ms to less than 4 ms. While this is a phenomenal number, the size of the gain depends on characteristics of the storage system.  Since these improvements are configuration dependent, your results may vary considerably.</p>
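<p>For readers who want the relative changes, here is a short sketch that derives them from the table.  The figures are copied from the table as published; only the helper name is mine.</p>
<pre>
```python
# (fragmented, defragmented) pairs copied from the table above.
RESULTS = {
    "Total IOs": (166412, 105620),
    "Mean IO response time (ms)": (58.5, 3.5),
    "SQL Server 2008 install time (min)": (45, 30),
}

def percent_reduction(before, after):
    """Relative decrease from before to after, in percent."""
    return (before - after) / before * 100

reductions = {name: percent_reduction(*pair) for name, pair in RESULTS.items()}
for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% lower")
```
</pre>
<p>That works out to roughly 36.5% fewer IOs, 94.0% lower mean response time, and 33.3% less install time.</p>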
<p>As Raxco continues its investigation I remain cautiously optimistic of the value of guest defragmentation.  I think the exchange of small IO for large IO is indisputably a Good Thing.  However, virtual environments are very complex and I harbor some concerns about guest defragmentation in virtual environments that must be considered.  For instance:</p>
<ul>
<li>Defragmentation in virtual machines backed by linked clones may sharply increase the space those VMs&#8217; VMDKs consume on their VMFS volumes.</li>
<li>The value of increased sequential access in a single virtual machine will decrease some in consolidated environments.  This is because multiple virtual machines&#8217; sequential access gets interleaved at the array, increasing the randomness of the IO from the array&#8217;s point of view.</li>
<li>Guest block reordering may have negative consequences to your array, as <a href="http://vpivot.com/2010/02/12/windows-guest-defragmentation/#comment-359">Vaughn Stewart argued in a comment to my first entry on the subject</a>.</li>
<li>The value of file defragmentation may be limited when applications produce small random block access to data files, as some databases tend to do.</li>
</ul>
<p>Raxco is continuing to investigate guest defragmentation to respond to some of these concerns and provide best practices for PerfectDisk&#8217;s usage.  I will update you as the research continues.</p>
<h2>Test Details</h2>
<p>ESX Server Configuration</p>
<ul>
<li>ESX Version:                                     3.5.0 Update 1</li>
<li>Motherboard:                                 Intel S5000PSL</li>
<li>CPU Type:                                        Intel(R) Xeon(R) CPU E5345  @ 2.33GHz</li>
<li>Number of CPUs:                           2</li>
<li>Cores per CPU:                               4</li>
<li>Logical Processors:                        8</li>
<li>Memory:                                          4 GB</li>
</ul>
<p>Storage Configuration</p>
<ul>
<li>RAID controller:                              Adaptec RAID 3805</li>
<li>Number of Drives:                         4</li>
<li>Drive Type:                                       WD1001FALS 1TB 7200 RPM 32MB Cache</li>
<li>Total Capacity:                                 4.0 TB</li>
<li>Number of LUNs:                          2</li>
<li>LUN 1 RAID level:                           5</li>
<li>LUN 1 Capacity:                               2.00 TB</li>
<li>LUN 1 Partitions:                            1</li>
<li>LUN 1 Name:                                   IOTesting</li>
<li>LUN 2 RAID level:                           5</li>
<li>LUN 2 Capacity:                               744.75 GB</li>
<li>LUN 2 Partitions:                            1</li>
</ul>
<p>Datastore Configuration</p>
<ul>
<li>Number of Datastores:                2</li>
<li>Datastore 1 Name:                        IOTesting</li>
<li>Number of VMs:                            2</li>
<li>Capacity:                                            2.00 TB</li>
<li>Target LUN:                                      LUN 1 (from Storage configuration)</li>
<li>Datastore 2 Name:                        ISO</li>
<li>Number of VMs:                            0</li>
<li>Capacity:                                            744.75 GB</li>
<li>Target LUN:                                      LUN 2 (from Storage configuration above)</li>
</ul>
<p>VM Configuration</p>
<ul>
<li>Number of VMs:                            2</li>
<li>Operating System:                        Windows Server 2008 R2 (64-bit)</li>
<li>Memory:                                           2GB</li>
<li>Number of CPUs:                           2</li>
<li>SCSI Controller:                               LSI Logic (no SCSI bus sharing)</li>
<li>Number of Disks:                           1</li>
<li>Size of Disk:                                      50 GB</li>
<li>Provisioning Type:                         Thick</li>
<li>Backing Datastore:                        IOTesting</li>
<li>Virtual Memory:                             none (pagefile disabled)</li>
<li>Network:                                            disabled</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://vpivot.com/2010/04/14/windows-guest-defragmentation-take-two/feed/</wfw:commentRss>
		<slash:comments>28</slash:comments>
		</item>
		<item>
		<title>vscsiStats for ESXi</title>
		<link>http://vpivot.com/2009/10/21/vscsistats-for-esxi/</link>
		<comments>http://vpivot.com/2009/10/21/vscsistats-for-esxi/#comments</comments>
		<pubDate>Wed, 21 Oct 2009 22:10:22 +0000</pubDate>
		<dc:creator>drummonds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[esxi]]></category>
		<category><![CDATA[vscsistats]]></category>

		<guid isPermaLink="false">http://vpivot.com/?p=139</guid>
		<description><![CDATA[You will occasionally need vscsiStats to deepen your understanding of an application&#8217;s storage profile.  This tool is provided with ESX classic but requires configuration and installation for ESXi environments.  VMware is planning to add vscsiStats counters to the vSphere Client UI in a future version but until then you will have to perform the following [...]]]></description>
			<content:encoded><![CDATA[<p>You will occasionally need <a href="http://communities.vmware.com/docs/DOC-10095">vscsiStats</a> to deepen your understanding of an application&#8217;s storage profile.  This tool is provided with ESX classic but requires configuration and installation for ESXi environments.  VMware is planning to add vscsiStats counters to the vSphere Client UI in a future version but until then you will have to perform the following to enable vscsiStats on ESXi.</p>
<p><span id="more-139"></span></p>
<ol>
<li>Boot ESXi in tech support mode.  See <a href="http://kb.vmware.com/kb/1003677">KB article 1003677</a> for instructions.</li>
<li>Download the right version of vscsiStats from the list below.</li>
<li>Install the right vscsiStats binary using a secure copy tool such as WinSCP or scp.</li>
</ol>
<p>I will provide vscsiStats binaries as requested.  If you do not see the version of vscsiStats that matches your ESXi build, please provide in this article&#8217;s comments your build number (not release number!) and I will add that version of vscsiStats to the repository.</p>
<h2>Currently Posted vscsiStats Binaries</h2>
<p>Again, comment if yours is not listed!</p>
<ul>
<li><a href="http://e-scott.net/share/vscsiStats/esx-110271-3.5.2/vscsiStats" onClick="javascript: pageTracker._trackPageview('/downloads/map'); ">Build 110271, ESX 3.5 U2</a> (Someone please confirm that this works.  This was a strange build.)</li>
<li><a href="http://e-scott.net/share/vscsiStats/esx-153875-3.5.4/vscsiStats" onClick="javascript: pageTracker._trackPageview('/downloads/map'); ">Build 153875, ESX 3.5 U4</a></li>
<li><a href="http://e-scott.net/share/vscsiStats/esx-163429-3.5.p14/vscsiStats">Build 163429, ESX 3.5, patch 14</a></li>
<li><a href="http://e-scott.net/share/vscsiStats/esx-164009-4.0.0/vscsiStats" onClick="javascript: pageTracker._trackPageview('/downloads/map'); ">Build 164009, ESX 4.0</a></li>
<li><a href="http://e-scott.net/share/vscsiStats/esx-171294-4.0.0/vscsiStats" onClick="javascript: pageTracker._trackPageview('/downloads/map'); ">Build 171294, ESX 4.0</a> (same bits as 164009)</li>
<li><a href="http://e-scott.net/share/vscsiStats/esx-176894-3.5.p16/vscsiStats">Build 176894, ESX 3.5, patch 16</a></li>
<li><a href="http://e-scott.net/share/vscsiStats/esx-184236-3.5.p17/vscsiStats" onClick="javascript: pageTracker._trackPageview('/downloads/map'); ">Build 184236, ESX 3.5, patch 17</a></li>
<li><a href="http://e-scott.net/share/vscsiStats/esx-199239-3.5.p18/vscsiStats">Build 199239, ESX 3.5, patch 18</a></li>
<li><a href="http://e-scott.net/share/vscsiStats/esx-207095-3.5.5/vscsiStats" onClick="javascript: pageTracker._trackPageview('/downloads/map'); ">Build 207095, ESX 3.5 U5</a></li>
<li><a href="http://e-scott.net/share/vscsiStats/esx-208167-4.0.1/vscsiStats" onClick="javascript: pageTracker._trackPageview('/downloads/map'); ">Build 208167, ESX 4.0 U1</a></li>
<li><a href="http://e-scott.net/share/vscsiStats/esx-219382-4.0.p3/vscsiStats" onClick="javascript: pageTracker._trackPageview('/downloads/map'); ">Build 219382, ESX 4.0, patch 3</a></li>
<li><a href="http://e-scott.net/share/vscsiStats/esx-236512-4.0.p4/vscsiStats" onClick="javascript: pageTracker._trackPageview('/downloads/map'); ">Build 236512, ESX 4.0, patch 4</a></li>
<li><a href="http://e-scott.net/share/vscsiStats/esx-238493-3.5.p21/vscsiStats" onClick="javascript: pageTracker._trackPageview('/downloads/map'); ">Build 238493, ESX 3.5, patch 21</a></li>
<li><a href="http://e-scott.net/share/vscsiStats/esx-244038-4.0.p5/vscsiStats" onClick="javascript: pageTracker._trackPageview('/downloads/map'); ">Build 244038, ESX 4.0, patch 5</a></li>
<li><a href="http://e-scott.net/share/vscsiStats/esx-256968/vscsiStats">Build 256928</a></li>
<li><a href="http://e-scott.net/share/vscsiStats/esx-261974-4.0.2/vscsiStats" onClick="javascript: pageTracker._trackPageview('/downloads/map'); ">Build 261974, ESX 4.0, U2</a></li>
<li><a href="http://e-scott.net/share/vscsiStats/esx-398348-4.0.3/vscsiStats" onClick="javascript: pageTracker._trackPageview('/downloads/map'); ">Build 398348, ESX 4.0, U3</a></li>
</ul>
<h2>Update, 15 May 2010: Change of SLA</h2>
<p>Everyone may notice that I used to be able to respond to requests for vscsiStats versions in 24 hours.  Now that I no longer work for VMware and am living half a world away, I have to request the binaries myself from my contacts at VMware.  So, it will take me a few days to honor requests.</p>
<h2>Update, July 2010: ESXi in vSphere 4.1 Comes with vscsiStats</h2>
<p>vscsiStats is now distributed with ESXi.  There will be no vscsiStats for ESXi 4.1 and newer placed on this page.  This page will remain up for older versions of ESXi, however.</p>
]]></content:encoded>
			<wfw:commentRss>http://vpivot.com/2009/10/21/vscsistats-for-esxi/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
		<item>
		<title>Micro-bursting and Storage Performance</title>
		<link>http://vpivot.com/2009/09/23/micro-bursting-and-storage-performance/</link>
		<comments>http://vpivot.com/2009/09/23/micro-bursting-and-storage-performance/#comments</comments>
		<pubDate>Wed, 23 Sep 2009 22:49:55 +0000</pubDate>
		<dc:creator>drummonds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[esxtop]]></category>
		<category><![CDATA[storage]]></category>
		<category><![CDATA[vcenter]]></category>
		<category><![CDATA[vmkernel]]></category>
		<category><![CDATA[vscsistats]]></category>

		<guid isPermaLink="false">http://vpivot.com/?p=47</guid>
		<description><![CDATA[I have been reading Chad Sakac&#8217;s article on IO queues and micro-bursting for months now.  Chad is wicked technical for a manager type and after reading this post a dozen times I think I finally have it internalized.   Let me put my own spin on this tome, embedded in which are several jewels of [...]]]></description>
			<content:encoded><![CDATA[<p>I have been reading <a href="http://virtualgeek.typepad.com/virtual_geek/2009/06/vmware-io-queues-micro-bursting-and-multipathing.html">Chad Sakac&#8217;s article on IO queues and micro-bursting</a> for months now.  Chad is wicked technical for a manager type and after reading this post a dozen times I think I finally have it internalized.   Let me put my own spin on this tome, embedded in which are several jewels of wisdom.</p>
<p><span id="more-47"></span>The article describes a phenomenon common to consolidated workloads called micro-bursting.  Micro-bursting occurs in such short periods as to go unnoticed in the sampling window of monitoring tools.  As Chad put it:</p>
<blockquote><p>Remember that every metric has a timescale.   IOps is in seconds.   Disk service time is in ms (5-20ms for traditional disk, about 1ms for EFD).  If an I/O is served from cache, it’s in microseconds.   Switch latencies are in microseconds.    Here, the I/O periods were so short that they filled up the ESX LUN queues instantly, causing a “back-off” effect for the guest.   These were happily serviced by the SAN and the storage array, which had no idea anything bad was going on.</p></blockquote>
<p>When these bursts happen, queues overflow, messages back up, and service times briefly skyrocket.  These rapid overflows happen in a fraction of <a href="http://communities.vmware.com/docs/DOC-9279">esxtop</a>&#8217;s multi-second window and <a href="http://communities.vmware.com/docs/DOC-5600">vCenter</a>&#8217;s 20 second window.</p>
<p>So, what buffers are we talking about?  Take a look at Chad&#8217;s hand-drawn picture of the storage path, which is only slightly less complicated than <a href="http://www.advanceusa.org/blog/content/binary/Obamacare%20Diagram.jpg">the Republican view of Obamacare</a>:</p>
<div class="wp-caption alignnone" style="width: 650px"><a href="http://virtualgeek.typepad.com/virtual_geek/2009/06/vmware-io-queues-micro-bursting-and-multipathing.html"><img title="Queues in the Storage Path" src="http://virtualgeek.typepad.com/.a/6a00e552e53bd2883301157135b4ae970b-pi" alt="Chad Sakacs image showing the numerous locations of storage queues in all locations from the VM to the platter." width="640" height="480" /></a><p class="wp-caption-text">Chad Sakac&#39;s image showing the numerous locations of storage queues in all locations from the VM to the platter.</p></div>
<p>If you are a VI admin, you care about the LUN queue in ESX.  ESX creates one of these queues for each HBA+LUN pair.  So, multipathing to a LUN increases the effective LUN queue and using a single HBA to multiple LUNs will guarantee a queue to each LUN.  Instances of this queue will overflow if many VMs on a single server issue commands to a single LUN.  As Chad says:</p>
<blockquote><p>In VMware land – this is usually the fact that the default LUN queue (and corresponding Disk.SchedNumReqOutstanding value) are 32 – which for most use cases is just fine, but when you have a datastore with many small VMs sitting on a single LUN, the possibility of microbursting patterns becomes more likely.</p></blockquote>
<p>So, when will the queues overflow?  Not often:</p>
<blockquote><p>In the example [Vaughn] used, [multi-pathing] would not help materially if there were more than 3 ESX hosts, as it would be a likely case of “underconfigured array” – not host-side queuing.</p></blockquote>
<p>The message here is that only a small window of configurations will result in LUN queue overflow: many VMs on very few hosts talking to a common LUN.  This is a perfect use case for <a href="http://communities.vmware.com/docs/DOC-10095">vscsiStats</a>, which I have talked about in various forums now.  vscsiStats avoids sampling windows by recording precise information on every IO.  This means that microburst statistics will not be averaged&#8211;and lost&#8211;across a time period.</p>
<p>Consider the following data I pulled from a sample session on my office system:</p>
<table border="0">
<tbody>
<tr>
<td><strong>Frequency</strong></td>
<td><strong>Histogram Bucket Limit</strong></td>
</tr>
<tr>
<td>2</td>
<td>1</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>50</td>
<td>4</td>
</tr>
<tr>
<td>879</td>
<td>6</td>
</tr>
<tr>
<td>6588</td>
<td>8</td>
</tr>
<tr>
<td>82830</td>
<td>12</td>
</tr>
<tr>
<td>161362</td>
<td>16</td>
</tr>
<tr>
<td>79802</td>
<td>20</td>
</tr>
<tr>
<td>18080</td>
<td>24</td>
</tr>
<tr>
<td>5377</td>
<td>28</td>
</tr>
<tr>
<td>1997</td>
<td>32</td>
</tr>
<tr>
<td>433</td>
<td>64</td>
</tr>
<tr>
<td>0</td>
<td>&gt;64</td>
</tr>
</tbody>
</table>
<p>This table shows the number of outstanding IOs as each new IO arrives in the VMkernel.  The first row means that during the collection period only two IOs arrived to a queue with one outstanding IO.  Row two says that two IOs arrived when there were two outstanding IOs.  The third row states that 50 IOs arrived while the queue had 3-4 IOs.  And so on.</p>
<p>This table represents a fairly healthy access pattern, showing that only 433 out of 357,402 IOs (about 0.12%) arrived while the queue had 33-64 outstanding IOs (shown in the second-to-last row), and none arrived to a deeper queue.  With ESX&#8217;s default LUN queue depth at 32, vscsiStats shows that a very small number of IOs arrived to an overflowing queue.</p>
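<p>The arithmetic behind that reading can be checked with a short Python sketch.  The frequencies and bucket limits are copied from the table; treating the final row as an open-ended bucket (more than 64 outstanding IOs) is my assumption about the histogram layout, and the helper name is purely illustrative.</p>

```python
# (frequency, bucket upper limit) pairs copied from the table above.
# None marks the final row, assumed here to be the open-ended bucket.
HISTOGRAM = [
    (2, 1), (2, 2), (50, 4), (879, 6), (6588, 8),
    (82830, 12), (161362, 16), (79802, 20), (18080, 24),
    (5377, 28), (1997, 32), (433, 64), (0, None),
]
QUEUE_DEPTH = 32  # ESX default LUN queue depth, per the text


def ios_above(histogram, depth):
    """Count IOs that arrived when more than `depth` IOs were outstanding."""
    return sum(
        freq for freq, limit in histogram
        if limit is None or limit > depth
    )


total_ios = sum(freq for freq, _ in HISTOGRAM)
over_depth = ios_above(HISTOGRAM, QUEUE_DEPTH)

print(f"total IOs:           {total_ios}")
print(f"arrived above depth: {over_depth}")
print(f"share of all IOs:    {over_depth / total_ios:.2%}")
```

<p>Changing <code>QUEUE_DEPTH</code> shows how the picture would shift with a different queue setting, but only at the bucket boundaries the histogram records.</p>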
<p>In summary, some storage performance issues appear and disappear so rapidly that they are not visible with sampling-based tools, even ones as fine-grained as esxtop.  As a VI admin you should consider this in your most challenging troubleshooting cases.  And remember to use vscsiStats if all else has failed.</p>
]]></content:encoded>
			<wfw:commentRss>http://vpivot.com/2009/09/23/micro-bursting-and-storage-performance/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Performance Troubleshooting: No PhD Required!</title>
		<link>http://vpivot.com/2009/09/18/performance-troubleshooting-no-phd-required/</link>
		<comments>http://vpivot.com/2009/09/18/performance-troubleshooting-no-phd-required/#comments</comments>
		<pubDate>Fri, 18 Sep 2009 18:42:22 +0000</pubDate>
		<dc:creator>drummonds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[esxtop]]></category>
		<category><![CDATA[tier-1]]></category>
		<category><![CDATA[vcenter]]></category>
		<category><![CDATA[vmworld]]></category>
		<category><![CDATA[vscsistats]]></category>
		<category><![CDATA[vsphere]]></category>

		<guid isPermaLink="false">http://vpivot.com/?p=41</guid>
		<description><![CDATA[A couple of weeks ago at VMworld in San Francisco I squeezed a few press meetings in between the 19 sessions of the performance lab I led. In one of those meetings I talked with David Vellante and two of his colleagues to discuss vSphere performance and performance monitoring.  David and company asked some hard [...]]]></description>
			<content:encoded><![CDATA[<p>A couple of weeks ago at VMworld in San Francisco I squeezed a few press meetings in between the 19 sessions of the performance lab I led.  In one of those meetings I talked with <a href="http://www.internetevolution.com/profile.asp?piddl_userid=13982">David Vellante</a> and two of his colleagues to discuss vSphere performance and performance monitoring.  David and company asked some hard questions about our performance work but my knowledge of this area runs deep, so the conversation was fruitful and interesting.</p>
<p>A few days after the conference a coworker of mine shared the following quote with me, courtesy of <a href="http://www.internetevolution.com/author.asp?section_id=654&amp;doc_id=181395">an article by David on Internet Evolution</a>:</p>
<blockquote><p>The fact is, most data center managers wouldn’t trust VMware to manage their Tier 1 applications because if something goes wrong performance-wise, you still need to roll in the VMware PhDs to solve it.</p></blockquote>
<p>Let me respond to a few of the suggestions from this quote.</p>
<h2><span id="more-41"></span>&#8220;Customers Do Not Trust VMware for Tier-1 Apps&#8221;</h2>
<p>The following chart presents data collected from 676 VMware customers in July and August of 2008.</p>
<div id="attachment_58" class="wp-caption alignnone" style="width: 619px"><img class="size-full wp-image-58" title="Application Virtualization Rates (2008)" src="http://vpivot.com/wp-content/uploads/2009/09/picture-11.png" alt="Percentage of application instances virtualized by VMware customers." width="609" height="307" /><p class="wp-caption-text">Percentage of application instances virtualized by VMware customers.</p></div>
<p>This graph shows the large rates of virtualization of the most well-known enterprise applications.  By any definition of &#8220;Tier-1 Application&#8221;, at least one tier-1 application is mostly virtualized by this customer sample.  And the survey date bears repeating: summer, 2008.  Virtualization acceptance has greatly increased in the past 12 months.</p>
<p>Concerns about performance management aside, VMware customers <em>are</em> virtualizing their tier-1 apps <em>today</em>.  So let&#8217;s talk about the process of performance troubleshooting.</p>
<h2>&#8220;A PhD From VMware Is Required to Fix Performance Problems&#8221;</h2>
<p>I think that David must have inferred from my confident and detailed talk on a great number of performance-related technical topics that I am the cream of the crop of America&#8217;s educational system.  For the record, I went to a state school in Alabama and spent far more time drinking beer than going to class.  Nonetheless, I am sure what he meant to say was&#8230;</p>
<h2>&#8220;A Highly-skilled Performance Expert Is Required to Fix Performance Problems&#8221;</h2>
<p>VMware now boasts over 150,000 customers, and I only interact with a relative handful a year.  If I count the experts in our small performance community I can conclude that our performance experts touch a very small percentage of our customer base each year.  That means that the great majority of our customers are solving their performance problems without engaging us.</p>
<p>Customers are fixing their problems using a variety of tools that I continue to document:</p>
<ul>
<li>The vSphere client interface to vCenter is known to everyone; <a href="http://communities.vmware.com/docs/DOC-5600">its counters</a> operate with 20-second granularity and are effective at fixing about 90% of performance problems.</li>
<li>esxtop, with its finer granularity and <a href="http://communities.vmware.com/docs/DOC-9279">larger counter list</a>, can be used to fix 95% of problems, most of which could have been fixed with vCenter statistics.</li>
<li><a href="http://communities.vmware.com/docs/DOC-10095">vscsiStats</a> is extraordinarily useful for a small percentage of problems, perhaps 10-20% of those I see.</li>
</ul>
<p>We are currently working on collecting all of these views into the client and adding a framework, <a href="http://communities.vmware.com/community/developer/forums/vprobes?view=overview">vProbes</a>, that will enable unprecedented visibility into operating systems and applications.  But even as things stand today, we have provided documentation and tools that all of our customers can use to fix any problem.  There is always room for improvement, but no PhD is required.</p>
]]></content:encoded>
			<wfw:commentRss>http://vpivot.com/2009/09/18/performance-troubleshooting-no-phd-required/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>vSphere Is Not the Performance Problem, Your Storage Is</title>
		<link>http://vpivot.com/2009/09/18/storage-is-the-problem/</link>
		<comments>http://vpivot.com/2009/09/18/storage-is-the-problem/#comments</comments>
		<pubDate>Fri, 18 Sep 2009 00:00:43 +0000</pubDate>
		<dc:creator>drummonds</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[esxtop]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[storage]]></category>
		<category><![CDATA[vscsistats]]></category>

		<guid isPermaLink="false">http://vpivot.com/?p=12</guid>
		<description><![CDATA[[This is an update to one of my favorite articles, which details my on-site investigation of SQL Server performance problems.] Back in July I had the privilege of riding along with VMware&#8217;s Professional Services Organization as they piloted a possible performance offering. We are considering two possible services: one for performance troubleshooting and another for [...]]]></description>
			<content:encoded><![CDATA[<p><em>[This is an update to <a href="http://communities.vmware.com/blogs/drummonds/2009/08/17/first-success-of-vmwares-performance-service-offering">one of my favorite articles</a>, which details my on-site investigation of SQL Server performance problems.]</em></p>
<p>Back in July I had the privilege of riding along with VMware&#8217;s Professional Services Organization as they piloted a possible performance offering.  We are considering two possible services: one for performance troubleshooting and another for infrastructure optimization.  During this trip we piloted the troubleshooting service, focusing on the customer&#8217;s disappointing experience with SQL Server&#8217;s performance on vSphere.</p>
<p><span id="more-12"></span>If you have read my blog entries (<a class="jive-link-blogpost" href="http://communities.vmware.com/blogs/drummonds/2009/03/13/sql-server-performance-problems-not-due-to-vmware">SQL Server Performance Problems Not Due to VMware</a>) or <a class="jive-link-external" href="http://www.vmware.com/a/webcasts/details/265">heard me speak</a>, you know that SQL performance is a major focus of my work.  SQL Server is the most common source of performance discontent among our customers, yet 100% of the problems I have diagnosed were not due to vSphere.  When this customer described the problem, I knew this SQL Server issue was stereotypical of my many engagements:</p>
<blockquote><p>&#8220;We virtualized our environment nearly a year ago and quickly determined that virtualization was not right for our SQL Servers.  Performance dropped by 75% and we know this is VMware&#8217;s fault because we virtualized on much newer hardware on the exact same SAN.  We have since moved the SQL instance back to native.&#8221;</p></blockquote>
<p>Most professionals in the industry stop here, incorrectly file this problem as a deficiency of virtualization, and move on with their deployments.  But I know that <a class="jive-link-external" href="http://www.vmware.com/files/pdf/perf_vsphere_sql_scalability.pdf">vSphere&#8217;s abilities with SQL Server</a> are phenomenal, so I expect to make every user happy with their virtual SQL deployment. I start by challenging the assumptions and trust nothing that I have not seen for myself.  Here are my first steps on the hunt for the source of the problem:</p>
<ol>
<li>Instrument the SQL instance that has been moved back to native to profile its resource utilization.  Do this by running Perfmon to collect stats on the database&#8217;s memory, CPU, and disk usage.</li>
<li>Audit the infrastructure and document the SAN configuration.  Primarily I will need RAID group and LUN configuration and an itemized list of VMDKs on each VMFS volume.</li>
<li>Use esxtop and vscsiStats to measure resource utilization of important VMs under peak production load.</li>
</ol>
<p>There are about a dozen other things that I could do here, but my experience in these issues is that I can find 90% of all performance problems with just these three steps.  Let me start by showing you the two RAID groups that were most important to the environment.  I have greatly simplified the process of estimating these groups&#8217; performance, but the rough estimate will serve for this example:</p>
<table id="newspaper-a">
<tbody>
<tr>
<th>RAID Group</th>
<th>Configuration</th>
<th>Performance Estimate</th>
</tr>
<tr>
<td>A</td>
<td>RAID5 using 4 15K disks</td>
<td>4 x 200 = 800 IOPS</td>
</tr>
<tr>
<td>B</td>
<td>RAID5 using 7 10K disks</td>
<td>7 x 150 = 1050 IOPS</td>
</tr>
</tbody>
</table>
<p>We found two SQL instances in their environment that were generating significant IO: one that had been moved back to native and one that remained in a virtual machine.  By using Perfmon for the native instance and vscsiStats for the virtual one, we documented the following demands during a one-hour window:</p>
<table id="newspaper-a">
<tbody>
<tr>
<th>SQL Instance</th>
<th>Peak IOPS</th>
<th>Average IOPS</th>
</tr>
<tr>
<td>X (physical)</td>
<td>1800</td>
<td>850</td>
</tr>
<tr>
<td>Y (virtual)</td>
<td>1000</td>
<td>400</td>
</tr>
</tbody>
</table>
<p>In the customer&#8217;s first implementation of the virtual infrastructure, both SQL Servers, X and Y, were placed on RAID group A.  But in the native configuration SQL Server X was placed on RAID group B.  This meant that the storage bandwidth of the physical configuration was approximately 1850 IOPS (800 + 1050).  In the virtual configuration the two databases shared a single 800 IOPS RAID volume.</p>
<p>It does not take a rocket scientist to realize that users are going to complain when a critical SQL Server instance goes from 1050 IOPS to 400.  And this was not news to the VI admin on-site, either.  What we found as we investigated further was that virtual disks requested by the application owners were used in unexpected and undocumented ways and frequently demanded more throughput than originally estimated.  In fact, through <a href="http://communities.vmware.com/docs/DOC-10095">vscsiStats</a> analysis, my contact and I were able to identify an &#8220;unused&#8221; VMDK with moderate sequential IO that we immediately recognized as log traffic.  Inspection of the application&#8217;s configuration confirmed this.</p>
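<p>For readers who want to repeat this kind of check on their own layout, the capacity arithmetic above can be sketched in a few lines of Python.  The per-disk figures and measured demands come from the two tables; the function and variable names are illustrative only, and the estimate ignores RAID write penalties and cache effects just as my rough table does.</p>

```python
# Rough capacity estimates from the RAID group table (disks x IOPS per disk).
RAID_GROUPS = {"A": 4 * 200, "B": 7 * 150}

# Measured demand from the one-hour window, in IOPS.
AVERAGE_DEMAND = {"X": 850, "Y": 400}
PEAK_DEMAND = {"X": 1800, "Y": 1000}


def headroom(capacity, instances, demand):
    """Capacity left after the named instances take their share (negative = oversubscribed)."""
    return capacity - sum(demand[name] for name in instances)


# Virtual configuration: X and Y both on RAID group A.
virtual_avg = headroom(RAID_GROUPS["A"], ["X", "Y"], AVERAGE_DEMAND)
virtual_peak = headroom(RAID_GROUPS["A"], ["X", "Y"], PEAK_DEMAND)

# Native configuration: X alone on RAID group B.
native_x_avg = headroom(RAID_GROUPS["B"], ["X"], AVERAGE_DEMAND)

print(f"virtual, average load: {virtual_avg} IOPS of headroom")
print(f"virtual, peak load:    {virtual_peak} IOPS of headroom")
print(f"native X, average:     {native_x_avg} IOPS of headroom")
```

<p>A negative result at average load, as the shared RAID group shows here, means the volume is oversubscribed before any burst arrives.</p>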
<p>Despite the explosion of VMware into the data center, we remain the new kid on the block.  As soon as performance suffers, the first reaction is to blame the new kid.  But next time you see a performance problem in your production environment, I urge you to look at the issue as a consolidation challenge, and not a virtualization problem.  Follow the best practices you have been using for years and you can correct this problem without needing to call me and my colleagues to town.</p>
<p>Of course, if you want to fly us out to help you correct a specific problem or optimize your design, I promise we will make it worth your while.</p>
]]></content:encoded>
			<wfw:commentRss>http://vpivot.com/2009/09/18/storage-is-the-problem/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>

