Are SSDs really too fast for our current SQL Server operations?

Maybe. If you are using SQL Server and are slowly transitioning to SSD storage (local or SAN), you are quickly finding out that a lot of bottlenecks associated with your storage might no longer be an issue. When you start measuring your latency in microseconds vs milliseconds, the diskIO waits slowly disappear from your top 10 list of worries.

Interestingly enough, I just ran into an issue where indeed SSD storage was TOO FAST for some of the SQL Server operations.
Ok, I have to admit, it was not entirely SQL Server’s fault, but still an issue that had to be addressed.

Issue: During SQL Server startup, the SPN (service principle name) failed to register with a Domain Controller

Error:

The SQL Server Network Interface library could not register the Service Principal Name (SPN) for the SQL Server service. Error: 0x2098, state: 15. Failure to register an SPN may cause integrated authentication to fall back to NTLM instead of Kerberos. This is an informational message. Further action is only required if Kerberos authentication is required by authentication policies.

This message is a very common error message when you are running the SQL Server Engine as a domain account with limited privileges.

If you fail to assign the necessary permissions (write Service Principle Name) in AD, you will get this error every time your SQL Server starts up. In our case, this was not the issue.

Troubleshooting steps:

  • Ensured that the service account running SQL Server Engine had all necessary permissions in AD
  • Ensured that after a manual restart of the SQL Server service, the SPN would register without any issue – SPN registered
  • Set the SQL Engine service to manual and started it right after windows restart – SPN registered without an issue
  • Ensured that the issue was not related to a domain controller (checking all the logs on the domain controller and ensuring that the initial communication from the SQL Server was not getting to it)
  • Tried the same setup on a VM (SSD SAN storage) and physical (SSD Local storage) – same behavior – issue existed in both environments
  • Moved the VM to a different geographical location (still on SSD SAN storage) – same behavior
  • Moved the VM to a slower spinning SAN LUN – Issue disappeared !!!
  • Moved the VM back to a SSD SAN LUN – Issue came back !!!
  • Went through all the logs and also found that a GPO service had an issue connecting to a Domain Controller during windows startup

In short, the issue is simple. The services (GPO, SQL Server, Time service) come up so fast, that when they are ready to communicate to the domain controller, the network stack is not fully operational yet. I have not fully measured the exact time delay I would need in order to address this issue but after all the testing was completed I can say that we are really talking about a second or two (maybe even less).

Resolution Options:

  • Fixed the GPO service issue by placing a registry entry to delay the timeout value of the GPO call
    https://support.microsoft.com/en-us/kb/2421599
  • Came up with 3 options to address the SQL Server startup issue
  • Created a startup PowerShell script for SQL Server Services + set SQL Services to manual start
  • OPTION 1a: Placed the script in the Local Security Policy (under startup scripts) – tested (ALL GOOD)
  • OPTION 1b: Created a GPO (same as above) on the SQL Server OU in AD where all my SQL Server computer objects reside – tested (ALL GOOD)
  • OPTION 2: Created a windows service that literally didn’t do anything, set it to delayed startup, and made SQL Server Engine service depended on that service – tested (ALL GOOD)

There were two more things we have tested without luck.

Modifying a default computer GPO policy for all the resources to wait until the network stack is up and running
Trying to set the SQL Service to delayed start (even though it does change in the services gui, the true delay does not occur)
We have tested the 3 options on both VM (SAN SSDs) and physical (Local SSDs), using different service accounts, and different geographical locations without encountering any problems. We have also decided to go with OPTION 1b to centralize the management of the fix and have a uniform solution to the problem. We have also found that SQL Server will start up with a minimal delay (couple of seconds) with OPTION 1, and an over 60sec delay (service delayed startup) for OPTION 2.

9 thoughts on “Are SSDs really too fast for our current SQL Server operations?”

  1. It’s actually a cool and helpful piece of information. I am glad that you simply
    shared this useful info with us. Please keep us informed like this.
    Thank you for sharing.

  2. Hi! I just want to give you a big thumbs up for your great info you have here on this post. I am returning to your blog for more soon. Susana Ian Runkel

  3. Thanku for such relevant informations through your blog. it made me understand the things that i was not aware of. Auguste Shurwood Carling

  4. Pretty nice post. I simply stumbled upon your blog and wanted to say that I’ve truly loved surfing around your weblog
    posts. After all I will be subscribing for your feed and I am hoping
    you write again soon!

Comments are closed.