Would You Fly a Plane with One Engine? Or Run Your Airline with One Data Center(re)?

Published On: 2017-05-30

If you were in the US or otherwise outside of Europe this past weekend, you may not have heard about the major British Airways IT outage that took down their entire operations for most of Saturday and into Sunday. Rumors, which were later confirmed, were that a switchover from primary to backup power at their primary data centre (they’re a UK company, so I’ll spell it in the Queen’s English) led to a complete operations failure. I have a bit of inside information, since my darling wife was stuck inside Terminal 5 at Heathrow.

[Image: jet with a missing engine]

There’s a requirement for planes that travel across oceans called ETOPS, which stands for Extended Range Operation with Two-Engine Airplanes, though in common parlance it’s known as Engines Turn or Passengers Swim. These requirements are a set of rules ensuring that if a plane has a problem over a body of water, it can make it back to shore for a safe landing. As someone who flies across oceans a decent amount, I am very happy the regulatory bodies have these rules in place.

However, there are no such rules for the data centers that run airline operations. In fact, in January, Delta Air Lines had a major failure which took down most of its operations for a couple of days. Most IT experts have surmised that Delta was running a single data center for its operations. Based on the evidence from Saturday’s incident with BA, I have to assume that they are, as well. One key bit of evidence was that BA employees were unable to access email. They are an Office 365 customer, so theoretically, even if on-premises systems were down, email should work. However, if they were using Active Directory Federation Services (ADFS), so that all of their passwords stayed on-premises, then the data center being down would mean they couldn’t authenticate, and therefore would not have email.

That was my biggest clue that BA was running with a single data center: email didn’t work. While some systems, particularly some of the mainframe systems that may handle flight operations, tend not to fail over well across sites, Active Directory is one of the best distributed systems there is, and it is extremely resilient to failures. In fact, given BA’s global business, I’m really surprised they didn’t have ADFS servers in locations around the world.

Enter the Cloud

Denny and I sat talking yesterday, running some numbers on what we thought a second data center would cost a company like BA. Our rough estimate (and this is very rough) was around $30-40 million USD. While that is a ton of money, it is estimated that the weekend’s mess may cost BA up to £150 million (roughly $192 million USD). However, companies no longer have to build multiple data centers in order to have redundancy, as Microsoft (and Amazon, and Google) have data centers throughout the world. The cloud gives you the flexibility to protect critical systems at a much lower cost. I’ve designed DR strategies for small firms that cost under $100/month, and I’ve built real-time failover that supported 99.99% uptime. With the resources of a firm like BA, this should be a no-brainer given the risk profile.

What About Outsourcing?

Much has been made of the fact that BA has outsourced much of its IT functions to TCS and various other providers. Some have even tried to place blame on the providers for this outage. Frankly, I don’t have enough detail to blame anyone, and it seems more like the data center operator’s issue. However, I do think it speaks to the lack of attention and resources paid to technology at a company that clearly depends on it heavily. Computers and data are more important to business now than ever, and if your firm doesn’t value that, you are going to have problems down the road.

Conclusions

In the cloud era, I’m convinced no business, no matter how big or small, should run with a single data center. It is far too cheap and easy to ship your backups to multiple sites and be back online in a matter of hours with a cloud provider. Given the importance and consolidation of airlines in our world economy, it probably wouldn’t be a terrible idea for their regulators to require redundant data centers and regular failover testing. Don’t let this happen to your stock price.


SQL Data Warehouse—Size Limits? (No, Not Really)

Published On: 2017-05-11

One of the promises of Azure SQL Data Warehouse is petabyte scale: the ability to quickly scale your data, and to have that data scale independently of compute resources. So when one of my clients emailed me yesterday with this screenshot, needless to say I was concerned.

[Screenshot: database properties showing a maximum size of 5 GB]

As you can see, the properties screen shows a max size of 5 GB. So I sent a frantic email to one of my colleagues on the SQL Data Warehouse team (thanks, JRJ). It turns out that, due to the distributed architecture of SQL DW, that max database size refers to the control node, and not the underlying compute nodes that actually service your queries. So feel free to ignore this number.
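
If you want to sanity-check how much space your data is really using, you can ask the distributions directly. A minimal sketch (dbo.FactSales is a placeholder table name; substitute one of your own):

    -- Space used by one table, reported per distribution
    DBCC PDW_SHOWSPACEUSED ("dbo.FactSales");

    -- Rough database-wide total of reserved space across all distributions
    SELECT SUM(reserved_page_count) * 8.0 / 1024 / 1024 AS reserved_space_gb
    FROM sys.dm_pdw_nodes_db_partition_stats;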


SQLCLR in SQL Server 2017

Published On: 2017-04-19

Security of your databases has become the most important part of your job as a DBA. No one wants to be the next company to leak a list of customers, leak photos of children and their parents using your company’s product, or have a key marketing database released. Security is also a very complicated, multi-layered thing. There’s not a switch you can turn on to suddenly have good security. Additionally, you can be doing all the right things, and a few bad lines of application code can subvert all of the measures you have taken to protect your data.

With this in mind, Microsoft has made some big changes to CLR in SQL Server 2017. SQL CLR has always been an interesting area of the engine—it allows for the use of .NET code in stored procedures and user-defined types. For certain tasks, it’s an extremely powerful tool—things like RegEx and geo functions can be much faster in native CLR than trying to do the equivalent operation in T-SQL. It’s always been a little bit of a security risk, since under certain configurations CLR had access to resources outside the context of the database engine. This was protected by boundaries defined in the CLR host policy. We had SAFE, EXTERNAL_ACCESS, and UNSAFE levels that we could set. SAFE simply limited the assembly to internal computation and local data access. For the purposes of this post we will skip UNSAFE and EXTERNAL_ACCESS; suffice it to say that those levels allow much deeper access to the rest of the server.
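
As a refresher, the permission set is declared when the assembly is cataloged. A minimal sketch, with made-up assembly, path, class, and method names:

    -- Catalog a .NET assembly under the most restrictive permission set.
    -- The name and path are placeholders; point this at your own compiled DLL.
    CREATE ASSEMBLY RegexHelpers
    FROM 'C:\CLR\RegexHelpers.dll'
    WITH PERMISSION_SET = SAFE;
    GO

    -- Expose a static method from the assembly as a scalar function
    -- (assumes a class named UserFunctions with a static method IsMatch).
    CREATE FUNCTION dbo.RegexIsMatch (@input nvarchar(max), @pattern nvarchar(4000))
    RETURNS bit
    AS EXTERNAL NAME RegexHelpers.UserFunctions.IsMatch;
    GO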

Code Access Security in .NET (which is used to manage these levels) has been marked obsolete. What does this mean? The boundaries marked SAFE are no longer guaranteed to provide security. So “SAFE” CLR may be able to access external resources, call unmanaged code, and acquire sysadmin privileges. This is really bad.

So What Now?

Microsoft is recommending enabling “CLR Strict Security” in SQL Server 2017, which means users cannot create any CLR assemblies unless they have elevated permissions. This could be a breaking change anywhere you want to use dynamic CLR. The user needs to have the CREATE ASSEMBLY permission, and one of the following has to be true (a sketch of both options follows the list):

  • Database has the TRUSTWORTHY property on AND the database owner (DBO) has the UNSAFE ASSEMBLY permission on the server
  • Assembly is signed with a certificate or asymmetric key that has a corresponding login with the UNSAFE ASSEMBLY permission on the server
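
Here is roughly what those two options look like in T-SQL. This is a minimal sketch, not a recipe: the database, certificate, and login names are placeholders, and the certificate path assumes you have exported the certificate that was used to sign the DLL.

    -- Option 1: TRUSTWORTHY database whose owner holds UNSAFE ASSEMBLY
    ALTER DATABASE YourDatabase SET TRUSTWORTHY ON;
    GRANT UNSAFE ASSEMBLY TO [YourDatabaseOwnerLogin];

    -- Option 2 (preferred): certificate-signed assembly with a matching login
    USE master;
    CREATE CERTIFICATE YourAssemblyCert
        FROM FILE = 'C:\CLR\YourAssemblyCert.cer';
    CREATE LOGIN YourAssemblyCertLogin FROM CERTIFICATE YourAssemblyCert;
    GRANT UNSAFE ASSEMBLY TO YourAssemblyCertLogin;

    -- The strict security switch itself (enabled by default in SQL Server 2017)
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'clr strict security', 1;
    RECONFIGURE;

The certificate route is the better habit to build, since it keeps TRUSTWORTHY off, which your auditors will thank you for.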

In short, the engine treats all CLR assemblies as unsafe, even if they are created with SAFE or EXTERNAL_ACCESS permission sets.

You will need to sign your assemblies. Also, if you try to restore a database with CLR assemblies into SQL Server 2017, you may run into issues if your assemblies are unsigned.
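
If re-signing every legacy assembly isn’t practical right away, SQL Server 2017 also lets you whitelist a specific assembly by its SHA-512 hash with sys.sp_add_trusted_assembly. A hedged sketch, assuming an already-cataloged assembly named MyLegacyAssembly:

    -- Compute the SHA-512 hash of a cataloged assembly and add it to the
    -- instance-wide trusted assembly list (MyLegacyAssembly is a placeholder).
    DECLARE @hash varbinary(64);

    SELECT @hash = HASHBYTES('SHA2_512', af.content)
    FROM sys.assembly_files AS af
    JOIN sys.assemblies AS a ON a.assembly_id = af.assembly_id
    WHERE a.name = 'MyLegacyAssembly';

    EXEC sys.sp_add_trusted_assembly
        @hash = @hash,
        @description = N'MyLegacyAssembly, whitelisted pending re-signing';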

But Now I Have to Do Work?

What does this mean for you, the SQL DBA or Developer? This is a breaking change (and breaking changes are a BIG DEAL at Microsoft—they don’t like to break things unless there are really good reasons). This means if you are using CLR, and you want to move to SQL Server 2017 (or really, let’s face it, these security risks are in lower versions of the code), you need to work out a signing system for your CLR assemblies and possibly some key management infrastructure. You need to evaluate your older CLR code to ensure that none of it is running under UNSAFE or EXTERNAL_ACCESS (unless you want to turn off “CLR Strict Security”, which you really don’t want to do). Also, if you want to run SQL Server 2017 on Linux, you will be limited to the SAFE permission set.
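
A quick way to start that evaluation is to inventory the user-defined assemblies in each database along with their declared permission sets, with something like this:

    -- List user-defined CLR assemblies and their permission sets
    -- in the current database; run it in each user database.
    SELECT a.name,
           a.permission_set_desc,
           a.clr_name,
           a.create_date
    FROM sys.assemblies AS a
    WHERE a.is_user_defined = 1
    ORDER BY a.permission_set_desc, a.name;

Anything that isn’t SAFE deserves a closer look before you plan the upgrade.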

We Only Have Third Party Apps…

Good luck with that. This is a big change, and we all know lots of smaller independent software vendors (ISVs) have just gotten around to supporting SQL Server 2014. In most of my work with ISVs, I haven’t seen a ton of CLR in vendor apps; that being said, I’m sure it exists in many of them. The angle I would take as a DBA toward my ISV(s) is the “audit hammer”: if you are not familiar with this approach to managing your vendors, you can mention that your internal audit process is now failing unsigned assemblies that can’t run under CLR Strict Security. This is probably accurate anyway—CLR has been an audit flag for years. I would also recommend a comprehensive testing process to ensure your CLR functions as you would expect.

