I was traveling to New York from my home in Philadelphia last weekend. We were running late for the train, and for the first time ever, I had booked an award ticket on Amtrak. For reasons unbeknownst to me, you cannot make changes to an award ticket in their app (I didn’t try the website). Additionally, when you call the standard Amtrak line, the customer service reps can’t change an award ticket unless you have defined a PIN. This PIN is defined by telling an awards customer service rep what you want your PIN to be. (Because that’s really secure.) While crappy IT exacerbates all of this, it really comes down to bad business processes.
The bad application design came into play when the awards rep tried to change my ticket and asked, “Do you have it open in our app? I can’t make changes to it if you have it open.” My jaw kind of dropped when this happened, but I went ahead and closed the app. Then the representative was able to make the change. We had to repeat the process when the representative booked us into the wrong class of service. (The rep was excellent and even called me back directly.)
But let’s talk about the way most mobile apps work. There is a series of front-end screens that are local to your device, and most of the interaction is handled through a series of REST API calls. The data should be cached locally on your device after the API call, so it can still be read in the event your device is offline. If you are a database professional, you are used to the concept of ACID (Atomicity, Consistency, Isolation, Durability), which is how we ensure things like bank balances in our databases can remain trusted. In general, a tenet of most database applications is that readers should never block writers–if a process needs to write to a record, someone reading that record should not affect the operation. There are exceptions to this rule, but these rules are generally enforced by the RDBMS in conjunction with the isolation level of the database.
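As a quick sketch of readers not blocking writers, here’s what that looks like in T-SQL under SQL Server’s row-versioning behavior (the table and column names are hypothetical, and this assumes READ COMMITTED SNAPSHOT is enabled on the database):

```sql
-- Session 1: a reader under snapshot-based isolation reads a
-- row-versioned copy of the data and does not hold locks that
-- block writers.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT ReservationId, TravelClass
FROM dbo.Reservations
WHERE ReservationId = 12345;

-- Session 2: this update proceeds even while Session 1's read is in
-- flight, because the reader sees the last committed version of the
-- row instead of taking a shared lock on it.
UPDATE dbo.Reservations
SET TravelClass = 'Business'
WHERE ReservationId = 12345;
```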
Another tenet of good database development is that you should do what you are doing and get the hell out of the database. Whether you are reading or writing, you should get the data you need and then return either the data or the status of the transaction to your app. Otherwise, you are going to keep your transaction open and impact other transactions, with a generally unpredictable set of timings. That’s the whole point of using the aforementioned REST API calls–they do a thing, return your data (or confirm that you updated some data), and then get the hell out.
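To make the “get in and get out” pattern concrete, here’s a minimal sketch (hypothetical table name) contrasting a short transaction with the anti-pattern:

```sql
-- Good: the transaction spans only the write itself, then commits
-- immediately. Locks are held for milliseconds.
BEGIN TRANSACTION;
UPDATE dbo.Reservations
SET TravelClass = 'Business'
WHERE ReservationId = 12345;
COMMIT TRANSACTION;

-- Anti-pattern: BEGIN TRANSACTION when the user opens a screen, and
-- COMMIT only when they close it. The locks are held for however long
-- the user keeps the screen open, blocking anyone else who needs
-- those rows -- which is what the Amtrak behavior looks like from
-- the outside.
```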
What exactly is Amtrak’s app doing? Obviously I don’t have any backend knowledge, but based on that comment from the CS rep, opening your reservation in the mobile app opens a database transaction that doesn’t close. I can’t fathom why anyone would ever design an app this way, and I can’t think of a tool that would easily let you design an app like this. At some point, someone made a really bad design choice.
I’m not sure when this feature was introduced, but I just saw it for the first time recently (and I asked for it when Azure Database for MySQL was in private preview): Azure Database for MySQL can show you the slow queries that are running against the MySQL database. In fact, it’ll log more than just long-running queries; queries that don’t use an index can be logged as well.
If you open the properties of a database in the Azure portal, there are a couple of different places where you can configure these settings. The first is the “Server parameters” blade, where you’ll see all the various parameters you can set for the MySQL deployment. The second is the “Server Logs” option towards the bottom (it’s in the Monitoring section), where you’ll see the log files that are being created (if there are any). At the top of this page, you’ll see a blue bar that says, “Click here to enable logs and configure log parameters.” Clicking that shows a shorter list of parameters that only includes the settings related to the log file (shown in the screenshot).
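The parameters in question are the standard MySQL logging variables. You can check their current values from any MySQL client, though on Azure Database for MySQL you change them through the “Server parameters” blade (or the Azure CLI) rather than with SET GLOBAL:

```sql
-- Check the current slow-query logging settings from a MySQL client.
SHOW VARIABLES LIKE 'slow_query_log';                -- ON / OFF
SHOW VARIABLES LIKE 'long_query_time';               -- threshold in seconds
SHOW VARIABLES LIKE 'log_queries_not_using_indexes'; -- log unindexed queries
```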
Once you enable the settings you need, wait a few minutes for there to be load on the system, then go into the “Server Logs” section of the Azure portal, and you’ll see a log file that you can download. Just open it in Notepad (or your favorite text editor), and you’ll see the various log records that you enabled.
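The downloaded file uses the standard MySQL slow query log format, so each entry looks roughly like this (the values below are illustrative, not from my actual log):

```
# Time: 2019-06-06T18:00:00.000000Z
# User@Host: wordpress[wordpress] @  [10.0.0.4]
# Query_time: 3.210000  Lock_time: 0.000120 Rows_sent: 10  Rows_examined: 52000
SET timestamp=1559844000;
SELECT * FROM wp_posts WHERE post_status = 'publish' ORDER BY post_date DESC;
```

The Query_time and Rows_examined values are usually the first place to look when deciding what to tune.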
While I was playing around with this, I turned on the slow queries option, and I got a bunch of records in the log about queries that WordPress needs to do some tuning on. Since I’m guessing WordPress won’t be doing any database tuning, I’ll have to do it myself. But now that I have some data to work off of, I at least have a starting point.
A couple of weeks ago, DCAC moved our website from being hosted in a single data center to being a globally distributed web application with multiple Azure WebApps, one hosted in the US and one in Europe. As part of having Traffic Manager configured and enabled for our site, we get some free reporting on the performance of our WebApp included as part of the package. From the performance chart on the left, we can see where in the world our readers are coming from and, when the system is able to capture it, the performance those users are getting (all the colors in the chart).
For some users, we aren’t able to get performance data back, and those users are shown in white. For users shown in a color (blue/green is good, red is bad), the system was able to get performance data.
By using this basic reporting, we can see some pretty valuable information.
We can see where our website viewers are coming from
We can see what performance our viewers are getting
Showing us where in the world the folks reading our website are coming from gives us more insight into what kinds of things we should continue to write about. It also gives us insight into where we might need to deploy additional WebApps so that users in those areas get better performance.
Showing us the performance that users are getting around the world lets us see roughly what kind of performance improvement users would see if we deployed another copy of our website to their region. As we can see from the graph, users in India, South East Asia, Australia, and Africa are seeing response times of 200ms or higher when browsing our website. Based on this, it might make sense to deploy another copy of the website into Singapore (which would improve performance in South East Asia, India, and Australia) and another copy into the South African data center to improve performance for users in Africa.
Creating additional WebApps in these regions isn’t a given; 200ms isn’t a guarantee that there’s a problem. In our case, we can talk to people in those various regions of the world and have them browse to the website to see whether there’s a problem viewing it or not. The data that’s available from Azure is just one piece of the puzzle in determining whether there’s a performance problem, because the information available in the portal isn’t the whole picture.
Data classifications in Azure SQL DW entered public preview in March 2019. They allow you to label columns in your data warehouse with their information type and sensitivity level. There are built-in classifications, but you can also add custom classifications. This could be an important feature for auditing your storage and use of sensitive data as well as compliance with data regulations such as GDPR. You can export a report of all labeled columns, and you can see who is querying sensitive columns in your audit logs. The Azure Portal will even recommend classifications based upon your column names and data types. You can add the recommended classifications with a simple click of a button.
You can add data classifications in the Azure Portal or via T-SQL or PowerShell. Data classifications are database objects.
-- Add a classification to a column (the table and column names here
-- are hypothetical examples):
ADD SENSITIVITY CLASSIFICATION TO dbo.DimCustomer.EmailAddress
WITH (LABEL='Confidential', INFORMATION_TYPE='Contact Info');
You can see the classifications you have defined by querying the sys.sensitivity_classifications catalog view:
select
    sys.all_objects.name as [TableName],
    sys.all_columns.name as [ColumnName],
    sys.sensitivity_classifications.label,
    sys.sensitivity_classifications.information_type
from sys.sensitivity_classifications
left join sys.all_objects on sys.sensitivity_classifications.major_id = sys.all_objects.object_id
left join sys.all_columns on sys.sensitivity_classifications.major_id = sys.all_columns.object_id
    and sys.sensitivity_classifications.minor_id = sys.all_columns.column_id
Be Careful When Loading With CTAS and Rename
One issue that is specific to using data classifications in Azure SQL DW is that it is possible to inadvertently drop your classifications when you are loading your tables using the recommended T-SQL load pattern. Typically, when using T-SQL to load a dimensional model in Azure SQL DW, we perform the following steps:
Create an upsert table via CTAS with the results of a union of new data from a staging table with existing data from the dimension table
Rename the dimension table to something like Dimension_OLD
Rename the upsert table to Dimension
Drop the Dimension_OLD table
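The four steps above might look roughly like this in T-SQL (the table names, columns, and distribution choice are hypothetical):

```sql
-- 1. Build the upsert table from new staging rows plus existing rows.
CREATE TABLE dbo.DimCustomer_Upsert
WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT CustomerKey, CustomerName, EmailAddress
FROM stg.Customer
UNION ALL
SELECT CustomerKey, CustomerName, EmailAddress
FROM dbo.DimCustomer d
WHERE NOT EXISTS (SELECT 1 FROM stg.Customer s
                  WHERE s.CustomerKey = d.CustomerKey);

-- 2. and 3. Swap the tables.
RENAME OBJECT dbo.DimCustomer TO DimCustomer_OLD;
RENAME OBJECT dbo.DimCustomer_Upsert TO DimCustomer;

-- 4. Drop the old table -- and with it, any sensitivity
--    classifications that were defined on its columns.
DROP TABLE dbo.DimCustomer_OLD;
```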
In the animation above, you’ll first see the load process as described, and then it will replay with sensitivity labels added to the dimension table. You’ll see that they are dropped when we drop the old dimension table. This makes sense because sensitivity classifications are objects related to that table. We would expect an index to be dropped when we drop the related table. This works the same way.
Check out my SQL notebook for a demonstration of the issue as well as my workaround that I describe below. If you spin up an Azure SQL Data Warehouse with the sample database, you can run this notebook from Azure Data Studio and see the results for yourself.
There are a few complicating factors:
There are currently no visual indicators of sensitivity classifications in SSMS or Azure Data Studio.
ETL developers may not have access to the data warehouse in the Azure Portal to see the sensitivity classifications there.
The entire process of adding and managing sensitivity classifications may be invisible to an ETL developer. A data modeler or business analyst might be the person adding and managing the sensitivity classifications. If the ETL developer isn’t aware classifications have been added, they won’t know to go and look for them in the sys.sensitivity_classifications view.
SSDT does not yet support sensitivity classifications. The only way I have found to add them into the database project is as a post-deployment script with the build property set to none.
The good news is that you can add the sensitivity classifications back to your dimension table using T-SQL. The bad news is that the ETL developer still must remember to do it. My workaround for now is a stored procedure that will do the rename and drop of the tables plus copy the sensitivity classifications over. My hope is that it’s easier to remember to use it, since it will do the rename and drop for you as well.
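A sketch of what such a procedure might look like is below. The procedure and parameter names are hypothetical, and this assumes STRING_AGG is available on your instance; treat it as a starting point rather than a finished implementation:

```sql
CREATE PROCEDURE dbo.SwapTableKeepClassifications
    @SchemaName  sysname,
    @TableName   sysname,   -- current dimension table
    @UpsertTable sysname    -- freshly built CTAS table
AS
BEGIN
    DECLARE @AddClassifications nvarchar(max);

    -- 1. Script out the classifications on the current table. They
    --    target the same final table name the new table will have
    --    after the swap.
    SELECT @AddClassifications = STRING_AGG(CAST(
        'ADD SENSITIVITY CLASSIFICATION TO '
        + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@TableName) + '.'
        + QUOTENAME(c.name)
        + ' WITH (LABEL=''' + sc.label
        + ''', INFORMATION_TYPE=''' + sc.information_type + ''');'
        AS nvarchar(max)), CHAR(10))
    FROM sys.sensitivity_classifications sc
    JOIN sys.columns c
        ON sc.major_id = c.object_id
        AND sc.minor_id = c.column_id
    WHERE sc.major_id = OBJECT_ID(QUOTENAME(@SchemaName) + '.' + QUOTENAME(@TableName));

    -- 2. Swap the tables.
    EXEC ('RENAME OBJECT ' + @SchemaName + '.' + @TableName + ' TO ' + @TableName + '_OLD;');
    EXEC ('RENAME OBJECT ' + @SchemaName + '.' + @UpsertTable + ' TO ' + @TableName + ';');

    -- 3. Re-apply the classifications to the new table, then drop the old one.
    IF @AddClassifications IS NOT NULL
        EXEC (@AddClassifications);
    EXEC ('DROP TABLE ' + @SchemaName + '.' + @TableName + '_OLD;');
END;
```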
Eventually, the tools will be updated to provide more visibility to data sensitivity classifications, but we still need to make sure they don’t get dropped.
For now, my recommendation is that if you are going to go in and add a lot of sensitivity classifications, you create a user-defined restore point immediately afterward so that you know you have them in a backup somewhere. Azure SQL DW doesn’t do point-in-time restores the way Azure SQL DB does; it takes automatic restore points every 8 hours or so. So if someone went to the trouble of adding the sensitivity classifications and they were then dropped by the data load process, there is no guarantee that you could use a backup to get them back.
Vote for My Enhancement Idea
If you would like Microsoft to add something to the product to keep sensitivity classifications from being dropped, or at least make it easier to add them back, please vote for my idea.
Not an Issue with Other Data Load Methods
Please note that if you are using other tools or methods to load your tables where you don’t swap them out, you won’t have the issue of dropping your sensitivity classifications. But I wanted to bring up this issue because I can see people spending a lot of time adding them and then suddenly losing them, and I want everyone to avoid that frustration.
Give Data Classifications a Try
I think data classifications are a good addition to SQL DW. Anything that helps us efficiently catalog and manage our sensitive data is good. I have added them in my demo environment and hope to use them in a client environment soon.
Have you tried out data classifications in SQL DW or DB? What do you think so far? If not, what is keeping you from using them?
Watch our recorded webcast, presented by Denny on June 6th, 2019 at 11am Pacific / 2pm Eastern, where we review many of the performance enhancements in SQL Server 2019, such as Intelligent Query Processing, and why your scalar functions might not suck anymore.
See All Our Live and Recorded Webcasts
As Microsoft MVPs and Partners, as well as VMware experts, we are summoned by companies all over the world to fine-tune and problem-solve the most difficult architecture, infrastructure, and network challenges.
And sometimes we’re asked to share what we did, at events like Microsoft’s PASS Summit 2015.