Don’t Forget RoboCopy

Published On: 2019-05-03 | By: John Morehouse

On occasion you have to find a means to get around an issue.  Recently at a client I discovered an unknown SQL Server that wasn't being properly cared for.  This particular instance wasn't taking backups, and it was deemed a critical production server.  You can begin to see the problems that were arising.

However, there is a twist to the story.  Due to changes in Active Directory, I couldn't get the appropriate permissions configured for the SQL Server service to allow it to back up to the network destination.  I also could not take an outage anytime soon to change the service account to something better suited.  It's a long and convoluted story for another post.  The immediate need was to 1) get backups of all of the databases locally and 2) find a way to move them off of the local server.  I also needed to do this on a schedule so that I could meet the RPO/RTO guidelines the client had set forth.

My solution?  RoboCopy.

Robocopy has been around for years within the Microsoft ecosystem and it is highly versatile.  However, until recently, it wasn't a tool that I was well versed in and, frankly, I hardly used it.  Over the past year or so, however, I have found myself using it more and more as a solution for file movement when needed.

Essentially, robocopy will move files or even entire directories from one location to another.  It will copy permissions without any additional coding, and it will only copy the files that do not already exist in the destination.  This is useful in that you do not have to explicitly account for files that might already exist at the destination; it handles that for you.  Some other benefits of robocopy:

  • Ability to restart in the event it fails, for both reads and writes
  • Handles subdirectories
  • Copies file attributes
  • Can mirror directories
  • Moves files and then deletes the source once they are copied
  • Many others…

You can see all of the options in the documentation from Microsoft.

In this case, I’m interested in using the following switches:

  • /s – process all of the subdirectories from the specified root folder
  • /z – copy files in restartable mode; if the copy dies in the middle, it can pick up where it left off
  • /eta – show the estimated time of arrival for the files being copied
  • /r:3 – retry a failed copy 3 times before giving up; the default is 1 million retries
  • /w:3 – wait 3 seconds between retries; the default is 30 seconds

RoboCopy In Action

In this case, using RoboCopy is pretty straightforward.  I have Ola Hallengren's Maintenance Solution running on the SQL Server instance, so I configured it to back up to a specific directory on the local server.  I also made note of the available drive space.  On this particular instance, the databases are relatively small; 3 days of backups stored locally is less than 5GB, so disk usage is not heavy.  If you have larger databases, you might need to adjust how many backups you keep locally so that you do not consume the entire drive.  Adjust accordingly.
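
For reference, the local backup configuration looks something like the sketch below; the directory, retention window, and database selection here are assumptions for illustration, not the client's actual values.  Ola's DatabaseBackup procedure handles the local cleanup via the @CleanupTime parameter.

EXECUTE dbo.DatabaseBackup
    @Databases = 'USER_DATABASES',
    @Directory = 'C:\Backup',   -- local folder that robocopy will sweep
    @BackupType = 'FULL',
    @CheckSum = 'Y',
    @CleanupTime = 72           -- keep roughly 3 days of backups locally
GO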

The robocopy command itself is pretty simple and follows this format with the previously listed switches:

robocopy [source] [destination] /s /z /eta /r:3 /w:3

or

robocopy c:\temp\servername \\naservername\sharename$ /s /z /eta /r:3 /w:3

SQL Agent Job

We currently do transaction log backups of the production databases every 15 minutes.  Since I am backing up the databases as well as the transaction logs to the local disk, I needed to be able to push them up to the network storage on a regular basis.  You can use Windows Task Scheduler to facilitate this; however, since I'm more versed in the SQL Server side of things, I decided to put this into an agent job.  This gave me a lot of flexibility in scheduling, and given that the files should be small, I can schedule the job to run more frequently.  Running the job more frequently allows me to stay well within the RPO/RTO requirements without a lot of extra overhead.

To set this up, first we have to put the command into a batch file which we can call from the agent job.  In the batch file, we have to adjust the exit code so that a successful robocopy run is suppressed and only actual errors are reported.  We do this because robocopy reports a successful copy with a non-zero ERRORLEVEL, and SQL Server Agent treats any non-zero exit code as a failure and will subsequently fail the step.  Robocopy's exit code is a bitmask: bit 8 indicates failed copies and bit 16 a fatal error, so masking the ERRORLEVEL with 24 keeps only those two bits.  I borrowed the below code from Rob Volk (T) and one of his blog posts.

rem suppress successful robocopy exit statuses, only report genuine errors (bitmask 16 and 8 settings)
set/A errlev="%ERRORLEVEL% & 24"

Next, we have to adjust for an appropriate exit so that the agent job can success or fail correctly.

rem exit batch file with errorlevel so SQL job can succeed or fail appropriately
exit/B %errlev%

The entire batch file now looks like this (the robocopy command runs first so that %ERRORLEVEL% reflects the result of the copy):

robocopy c:\temp\servername \\naservername\sharename$ /s /z /eta /r:3 /w:3

rem suppress successful robocopy exit statuses, only report genuine errors (bitmask 16 and 8 settings)
set /A errlev="%ERRORLEVEL% & 24"

rem exit batch file with errorlevel so SQL job can succeed or fail appropriately
exit /B %errlev%

Once the batch file was created, I could create a standard agent job with a step that would execute the batch file.  It looks like this:

You can see that the step type has been set to "Operating system (CmdExec)".  The "/c" switch tells cmd.exe to execute the string supplied, which in this case is the batch file that I had created.

Since the client has an RPO/RTO of 15 minutes, I scheduled the job to run every 5 minutes.  This helps ensure that any recent backups are pushed to secondary off-site storage within the 15 minute window in the event of a hardware failure.
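
If you prefer to script the job rather than use the SSMS GUI, the setup looks roughly like the sketch below; the job name, batch file path, and schedule name are placeholders I made up for illustration.

USE msdb;
GO
EXEC dbo.sp_add_job @job_name = N'Copy Backups To Network Share';

-- CmdExec step that calls the batch file via cmd.exe /c
EXEC dbo.sp_add_jobstep
    @job_name  = N'Copy Backups To Network Share',
    @step_name = N'RoboCopy backups',
    @subsystem = N'CmdExec',
    @command   = N'cmd.exe /c "C:\Scripts\CopyBackups.bat"';

-- run the job every 5 minutes
EXEC dbo.sp_add_jobschedule
    @job_name             = N'Copy Backups To Network Share',
    @name                 = N'Every 5 Minutes',
    @freq_type            = 4,   -- daily
    @freq_interval        = 1,
    @freq_subday_type     = 4,   -- unit of minutes
    @freq_subday_interval = 5;

EXEC dbo.sp_add_jobserver @job_name = N'Copy Backups To Network Share';
GO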

Summary

The real solution to this issue would be to adjust permissions properly for the SQL Server service or to change the service account altogether.  This would allow SQL Server to back up directly to the network destination without needing a mitigating step in between.  However, there are times when that might not be possible, and you have to find another solution.  In this case, RoboCopy was a great middle-man solution that fit all of my requirements.  Next time you need to move files around, remember that RoboCopy might just fit the bill.

Enjoy!

© 2019, John Morehouse. All rights reserved.


Gathering VMWare Metrics

Published On: 2019-04-26 | By: John Morehouse

Anytime that you have a large environment, gathering metrics in any form can be daunting and cumbersome.  Using a scripting language can usually greatly improve both the efficiency and the accuracy of the process.  Whether it's VMware, SQL Server, or some other platform, it would behoove you to see what scripting languages it may support.

In a recent case, I had to gather some metrics from a VMware implementation.  The environment had multiple hosts, and each host had multiple virtual machines. I needed to gather metadata about each virtual machine on certain hosts.  I was looking for data like:

  • Number of CPUs
  • How much memory is allocated
  • What is the operating system version
  • What virtual host is the VM on

Thankfully, VMware has its own flavor of PowerShell, called PowerCLI, which can be used to gather all of these metrics easily and quickly.  You can find the documentation on PowerCLI here.  When looking into this scripting language, keep in mind that different versions of VMware will have different dialects of PowerCLI.  If you find that a particular cmdlet isn't working, check the documentation to see if it is available for your particular version of ESX.

The Script

To start with, I’ll need credentials in order to connect to the vCenter server. From that vantage point, I can then jump into the pool of VMware resources.  Security is always an utmost concern for me, so getting the appropriate credentials securely is the way to go.  Thankfully, we can use the Get-Credential cmdlet, which will secure my user name and password in a credential object.

#get credentials safely
$pass = get-credential

Getting the credentials in this manner means I do not have to hard-code anything into the script, namely my password.  This cmdlet will prompt you in a separate dialog window to enter your user id as well as your password.  The credential object can then be passed into the Connect-VIServer cmdlet by using the -Credential switch.

#connect to the vCenter server
connect-viserver -server server1.domain.com -credential $pass

In the above cmdlet, we also pass in the -Server value, which tells the cmdlet which server I want to initiate a connection to.  Next, we can get a list of all virtual machines that reside on a given host, namely host1 and host2.  The host name isn't a string by default, so in order to make a comparison, I had to use the ToString() method.  Note that there might be a better alternative to this, but I needed the script quickly, so I didn't waste a lot of time researching options.

#get all of the vm's that we are interested in which in this case reside on a particular host(s).
$vms = get-vm | ? {$_.VMHost.tostring() -eq "host1.domain.com" -or $_.VMHost.tostring() -eq "host2.domain.com"}
The $vms variable will be populated with a list of all of the virtual machines.  Now I can use a foreach loop and iterate through them and get their properties. I also wanted the ability to export the data into Excel where I could make it look prettier. In order to do this, I had to install the ImportExcel module.

#Note: the ImportExcel module is needed:
install-module -name ImportExcel

foreach ($vm in $vms){
#get VM properties
$vm | select * | Export-excel -worksheetname "Virtual Machines" -path c:\temp\host1.xlsx -Append

The other set of information that I wanted was to make sure that the virtual machines were using the appropriate SCSI controllers.  For non-operating system drives, they should be using the paravirtual drivers.  We can use PowerCLI to get that information as well.  This will finish out the foreach loop.

#get SCSI Controller Information
$vm | Get-scsicontroller | export-excel -worksheetname "SCSI Controllers" -path c:\temp\host1.xlsx -Append
}

Notice that in both export statements I included the -Append switch.  If you do not include it, the process will overwrite the entire file each time, and the only thing it would contain is the last virtual machine that the script touched.  That doesn't do any good, so we must append to the file.

You can see the entire script in the DCAC Github Repository.

Summary

This was a quick PowerCLI script that I put together to fetch metrics about each virtual machine on the ESX hosts.  It's a quick and short script, and it can certainly be improved upon, such as adding variables to make it cleaner.  If you make improvements, be sure to submit a pull request on GitHub so we can incorporate them!

© 2019, John Morehouse. All rights reserved.


Function Fn_PageResCracker Internals

Published On: 2019-04-05 | By: John Morehouse

In a previous blog post, I discussed two new methods to determine exactly which page a request might be waiting for when there is contention.  One of these new methods involves a new function, fn_pagerescracker.   Naturally, I wanted to see how this function operates.  Let’s look at the Master database to investigate how it works!

The function consumes a BINARY(8) parameter and returns a table.   This parameter represents a hexadecimal value that is constructed from the database ID, page ID, and file ID.

-------------------------------------------------------------------------------
-- Name: sys.fn_PageResCracker
--
-- Description:
--	Cracks the output of wait_resource page info
--
-- Notes:
-------------------------------------------------------------------------------
create function [sys].[fn_PageResCracker] (@physical_locator binary (8))
returns @dumploc_table table
(
	[db_id]	int not null,
	[file_id]	int not null,
	[page_id]	int not null
)
as
begin

	if @physical_locator is not null
	begin
		declare @db_id	binary (2)
		declare @file_id	binary (2)
		declare @page_id	binary (4)

		-- Page ID is the first four bytes, then 2 bytes of file ID, then 2 bytes of slot
		--
		select @page_id = convert (binary (4), reverse (substring (@physical_locator, 1, 4)))
		select @file_id = convert (binary (2), reverse (substring (@physical_locator, 5, 2)))
		select @db_id = convert (binary (2), reverse (substring (@physical_locator, 7, 2)))

		insert into @dumploc_table values (@db_id, @file_id, @page_id)
	end
	return
end
GO

You can see from the code that the parameter is "diced up" with SUBSTRING and each piece is converted into a smaller binary value.  Note that using SUBSTRING against a binary value (which is what the parameter is) returns the specified number of bytes rather than the number of characters.  In this case, the page ID is the first 4 bytes, the file ID is the following 2 bytes, and the last 2 bytes represent the database ID.  Furthermore, when the values are inserted into the table (which is returned to the calling query), they are implicitly converted into integer values.

Shown above, the page_resource value is 0x5001000001000600.  Now that we know how the function works, we can do the math ourselves and confirm things.  Keep in mind that the bytes are stored in little-endian order (least significant byte first), so we will have to do some byte reversing.

Byte Breakdown

  • Page ID – 0x50010000
  • File ID – 0x0100
  • Database ID – 0x0600

Page ID = 0x50010000, byte reversed is 0x00000150 (or 0x150), which when converted to an integer = 336

File ID = 0x0100, again byte reversed is 0x0001 (or 0x1), which when converted to an integer is 1.

Database ID = 0x0600, reversed once again, 0x0006 (or 0x6), which converted to an integer is 6.

We can see below that we have now confirmed that the function is working as expected!
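
You can also check the math directly in T-SQL (a quick sketch, assuming a SQL Server 2019 instance where the function is available) by calling the function with the page_resource value and comparing it against a manual conversion:

-- crack the page resource value from the example above
SELECT [db_id], [file_id], [page_id]
FROM sys.fn_PageResCracker(0x5001000001000600);
-- returns db_id = 6, file_id = 1, page_id = 336

-- manual check of the byte-reversed page ID
SELECT CONVERT(int, 0x00000150) AS page_id;   -- 336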

Summary

One of the things I enjoy the most is digging into the internals of how SQL Server works.  It continues to teach me new and exciting things.  It also helps make me a better DBA, in my opinion.  In this case, the value add isn't huge, but it's still fun tearing apart the logic.

Enjoy!


© 2019, John Morehouse. All rights reserved.


New Page Functions in SQL Server 2019

Published On: 2019-03-29 | By: John Morehouse

I will be the first to admit that I am what one might call a "SQL head".  In other words, I'm a nerd who likes to dig into the internal workings of SQL Server, sometimes down to the bits and bytes. The tools to facilitate this archaeology have slowly been evolving, and with the release of SQL Server 2019 later this year (2nd half of 2019), this continues to hold true.  Two new tools, a dynamic management function (DMF) and a system function, will be making an appearance: sys.dm_db_page_info and sys.fn_pagerescracker, respectively.

Note: For this post I am using Azure Data Studio to get more familiar with the product.  

sys.dm_db_page_info

This new DMF will return the page header information of a specified page in a single row.  This includes vital information such as the object_id, index_id, and partition_id, along with many other attributes.  In the past, the only real tool available to retrieve this information was DBCC PAGE.  With trace flag 3604 enabled, DBCC PAGE would output the contents of a page into the results pane of SSMS.  By having this information exposed through a DMF, we can easily and quickly get information about a given page when needed.  We can even see that Azure Data Studio (shown below) already has IntelliSense available for the DMF.
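
For comparison, the older approach looks something like the sketch below; it uses the same PagesOnLinux database and page 336 that appear in the example later in this post.

DBCC TRACEON(3604);                     -- send DBCC PAGE output to the results pane
DBCC PAGE('PagesOnLinux', 1, 336, 0);   -- database, file ID, page ID, print option (0 = header only)
GO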

In this example, I am connected to a SQL Server 2019 instance that is residing on Linux.

If I look at page ID 336 in the PagesOnLinux database on this instance, we can see information from the page header, such as the status of the differential map. We can see below that the status is set to "CHANGED" and that the DIFF_MAP page ID is 6.
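
A call like the following returns that page header information; this is a sketch that selects only a handful of the many columns the DMF exposes, and in the released builds the mode parameter accepts 'LIMITED' or 'DETAILED'.

SELECT object_id, index_id, partition_id, page_type_desc
FROM sys.dm_db_page_info(DB_ID('PagesOnLinux'), 1, 336, 'DETAILED');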

One thing to note about this particular DMF is that all of the parameters except for the mode (the last one) have NULL as the default value; however, if you attempt to pass a NULL value into the function when executing it, you will get an error.  I find this confusing, as other DMFs allow you to pass in NULL values to return a larger subset of data.  It's possible that this behavior will change by the time SQL Server 2019 is released, since the product is still under development.

sys.fn_pagerescracker

Another new addition to SQL Server 2019 is the function sys.fn_pagerescracker.  This function consumes a page resource binary value from sys.dm_exec_requests.  This binary value is a representation of the database ID, file ID, and page ID.  The function converts it into those three values, which can then be cross applied to the sys.dm_db_page_info dynamic management function.  Note that the page_resource value from sys.dm_exec_requests will only be non-NULL when the wait_resource column refers to a valid page resource.  If the column is NULL, the request is not waiting on any type of page resource, so the row can't be used for this purpose.  The intent of this is to help diagnose active requests that are waiting on some type of page resource.  A good example of this would be page latches.

Watch for a future blog from me on how this function works internally.  Coming soon!

In Action

In our database, let’s create a table, dbo.Table1 and insert a single row into Table1.

CREATE TABLE dbo.table1 (id int, fname varchar(20))
GO
INSERT dbo.table1 (id, fname) select 1,'John'
GO

Now, let’s start a transaction and update the data in Table1.  A query hint is in play to force a particular behavior: the first transaction remains open and locks the data, while the second query requests a page-level lock so that it waits on the page the data is on.  This method is NOT recommended for production usage.  Subsequently, in another session, we will run that second query and try to read the data that is locked by the first transaction.

BEGIN TRANSACTION
UPDATE dbo.Table1
SET fname = 'John'  --I'm just setting it back to the same value; it doesn't matter what I'm setting it to
WHERE id = 1
GO

and our secondary query

-- Note the PAGLOCK hint: not recommended for production usage
SELECT * FROM dbo.Table1 WITH (PAGLOCK)

Now that we have a blocked request, we can examine sys.dm_exec_requests to obtain the page_resource value that our second query is waiting on.  Remember that the page_resource column will only have a non-NULL value if the request is waiting on a page resource.
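
A simple way to grab that value is something like this (a quick sketch; filtering on page_resource keeps only the requests that are actually waiting on a page):

SELECT session_id, wait_type, wait_resource, page_resource
FROM sys.dm_exec_requests
WHERE page_resource IS NOT NULL;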

Now we can finish up and look at the entire query which would allow us to quickly identify what a request is waiting on.  Borrowed from Books Online:

SELECT page_info.* 
FROM sys.dm_exec_requests AS d 
CROSS APPLY sys.fn_PageResCracker (d.page_resource) AS r 
CROSS APPLY sys.dm_db_page_info(r.db_id, r.file_id, r.page_id, 1) AS page_info

Here we can see that the new function gave us the database id, file id, and page id of the corresponding page that we are waiting on.

Summary

Microsoft continues to make advancements in a number of areas with SQL Server 2019.  These two new tools will make analysis of various scenarios easier to decipher and resolve.  Make sure to add these two new tools to your tool belt so that you can have them ready at your disposal.

© 2019, John Morehouse. All rights reserved.
