When Loading Data, Should I Drop Indexes or Not?

I just ran a few simple tests in Azure SQL DB to see how each would perform.

I have a target table with an identity column that is the clustered primary key, and two other indexes that are the same except for the field order. (Whether having both is useful is a question for another day.) Here’s the DDL for the target table:

	Field1 [numeric](12, 0) NULL,
	Field2 [smallint] NULL,
	Field3 [int] NULL,
	Field4 [smallint] NULL,
	Field5 [nvarchar](1) NULL,
	Field6 [nvarchar](2) NULL,
	Field7 [numeric](4, 0) NULL,
	Field8 [numeric](2, 0) NULL,
	Field9 [nvarchar](2) NULL,
	Field10 [nvarchar](8) NULL,
	Field11 [datetime2](7) NULL,
	Field12 [nvarchar](8) NULL,
	Field13 [datetime2](7) NULL,
	[UpdateType] [nchar](1) NOT NULL,
	[RowCreated] [datetime2](7) NOT NULL,


Test 1: Truncate the Target, Drop the Indexes, Insert the Data, Recreate the Indexes.

Test 2: Drop the Indexes, Truncate the Target, Insert the Data, Recreate the Indexes

Test 3: Just Truncate the Target and Insert the Data

Test 4: Truncate the Target, Drop the non-clustered Indexes (leaving the the clustered index on the identity column), Insert the Data, Recreate the non-clustered Indexes.

Here are the results. All timings are in milliseconds. These were run on a PRS1 instance of Azure SQL Database.

Test 1:
Trunc then
Drop Idxs
Test 2:
Drop before
Test 3:
No Drop/
Test 4:
Trunc but don’t
drop clustered
Truncate4 2 04
Drop PK8 4  n/a  n/a 
Drop Index 15 23,630 n/a 2
Drop Index 26 2  n/a 2
Insert 1.84 M rows83,033 82,315 161,706 83,205
Create PK20,454 21,205  n/a  n/a 
Create Index 112,149 12,264  n/a 12,265
Create Index 211,142 11,313  n/a 11,247
Total Time (ms)126,801 150,735 161,706 106,725
Total Time (mins)2.11 2.51 2.70 1.78
Delta (ms)0 23,934 34,905 (20,076)

Test 4 was the clear winner as it avoided the cost of recreating the clustered index. Which makes sense as the clustered index was being filled in order by the identity column as rows were added. Test 1 came in second, so if your clustered index is not on an identity column, or you have no clustered index, you are still better off dropping and recreating the indexes.

Conclusion: When inserting larger data sets into an empty table, drop the indexes before inserting the data, unless the index is clustered on an identity column.

Azure Data Factory Publishing Error

I am using Azure Data Factory v2. It is bound to an Azure DevOps GIT repository. I made some changes in my branch, including deleting some obsolete items. Then I merged my changes back to the ‘master’ branch.

Now, I’m trying to publish my changes from the ‘master’ branch to the Azure Data Factory. When I do, it says it’s deployed 33 of 33 changes, and then it fails, saying:

“Publishing Error

The document cannot be deleted since it is referenced by <some Pipeline>.”

I searched high and low looking for some evidence that the offending Pipeline existed anywhere in the GIT repo. It did not.

Then, I discovered that you can change from the Azure DevOps GIT version of the Data Factory to the actual Data Factory version by selecting the latter from the dropdown in the top left corner of the Data Factory editor:

This saved the day. I was able to locate and delete the offending Pipeline(s) directly from the actual Data Factory. Then, when I switched back to Azure DevOps GIT mode, it allowed me to publish the changes from the ‘master’ branch. Woohoo!

Creating an Alexa Skill in C# – Step 1

Recently, I decided to make an Alexa skill that I could play on my boss’s Alexa.  At first, I was doing it as a gag, but I figured that wouldn’t work as he has to actively install my skill onto his Alexa.  Now it reads some stats from one of our Azure databases and publishes those in a conversation.  Here’s how I built it.

Step 1:  Creating the Alexa Skill Model

I don’t know any of the other languages that you can write a skill in, so I chose to write it in C#.  That means getting a bunch of bits and loading them into Visual Studio.  But first, you’ll need to start the process of creating a skill.  At the time of this writing, these are the steps I took.

  1. Start Here.
  2. Click the Start a Skill button.capture20180507114817423
  3. If you don’t have an Amazon Developer account, create one.
  4. Eventually, you’ll get to the Alexa Skills Developers Console where you can click the Create Skill button.  capture20180507115228773
  5. Give your skill a name.  I’m calling mine:  Simon’s Example Skill.
  6. On the Choose a Model screen, select Custom. capture20180507115624173
  7. Then click Create Skill.
  8. You should now be on the Build tab for your Skill.  Notice the tree view on the left and the checklist on the right.capture20180507115839651

Invocation Name

The Invocation Name is the phrase that an Alexa user will use to launch/run/start your Alexa Skill.  It could be “joe’s hot dog recipes” or some such.  It needs to be lower case, there are some restrictions on the characters you can use, and any abbreviations you use need periods to separate the letters so Alexa knows to read the letters and not pronounce the word.  Read the Invocation Name Requirements for more details.

Click the Invocation Name link in the tree view on the left or the first option in the checklist on the right.  Then give your skill an Invocation Name.  I named mine:  “simon’s example”.


The Intents are the various functions your skill can perform.  For example, stating today’s date, looking up some data, figuring something out, etc.  Let’s do all three.

First, my skill is going to provide today’s date, so I’m going to name my first Intent, “GetTodaysDateIntent”.  My skill is also going to look up some data in an Azure SQL Database, so I’m going to name my second Intent, “DataLookupIntent”.  Lastly, I want to figure something out, like the average temperature in a major US city.


The Utterances are the phrases an Alexa user might say to trigger your Intent (function).  You should put in several Utterances using synonyms and different phrasing so Alexa has a better chance of triggering your Intent instead of responding with “I don’t know how to do that.”

For the GetTodaysDateIntent, I added the following utterances:



Within an Utterance, you can have Slots (or placeholders), that represent the multiple options for that slot.  For example, if my table has the count of animals per household per neighborhood per county in it, I might want to create slots for Animal Type, Household Size, Neighborhood Name, and County Name.  You can do this by typing a left brace { in the Utterance box.


Here are three of my sample utterances for the DataLookupIntent:


Once you have created a slot, you need to populate it with options.  You do this in the bottom half of the Utterance screen.


You can easily select one of the pre-defined Slot Types in the drop-down.  In my case, Amazon has a list of Animals, so I’ll pick AMAZON.Animal in the first slot.

I need to manually add a few Counties for the second slot though.  And at this time, you don’t want to click Edit Dialog (though it’s tempting).  Instead, you want to define your own Slot Type by clicking the Add (Plus) button next to Slot Type in the tree view on the left:


For example, here is the custom Slot Type for County Name:


Notice the synonyms column.  This is important if there are synonyms, for example, a 1 person household and a single person household are synonymous.  So be certain to add any synonyms you can think of.  Here is my custom Slot Type for Household Size, notice the synonyms off to the right:


Now that you’ve defined some custom Slot Types, you can click on the Slot names under the Intent in the tree view on the left and select the newly created Slot Type for each Slot.


For the GetAverageTemperatureIntent, I added one Utterance:


And configured the {City} slot as follows:


Finally, you can Save your Model and Build it by clicking the buttons at the top of the screen:


Hopefully, the model will build and we are ready to move on to Step 2.  If the model doesn’t build, check the bottom right of the screen for a list of the errors:


Fix the errors until the model builds:


Then go to Step 2.





Connecting to IBM DB2 zOS from Azure Data Factory v2

Connecting to IBM DB2 zOS from Azure Data Factory v1 was a matter of setting up the Azure Data Gateway on an on-prem server that had the IBM DB2 Client installed; creating an ODBC connection to DB2 (I called it DB2Test).  Then, in the Data Factory v1 Copy Wizard, Select the ODBC source, pick the Gateway, and enter the phrase:  DSN=DB2Test into the Connection String.  This worked for us.

Azure Data Factory v2

First, the Azure Data Gateway is now called “Hosted Integration Runtime”.  So download and install the IR client on your on-prem gateway machine.  On my machine, it auto-configured to use the existing Data Factory Gateway configuration, which is NOT what I wanted.  After uninstalling and reinstalling the IR client a couple of times, it stopped auto-configuring and asked me for a key.  To get the key, I had our Azure Dev configuration guy run the following PowerShell:

Import-Module AzureRM
$dataFactoryName = "myDataFactoryv2NoSSIS"
$resourceGroupName = "myResourceGroup"
$selfHostedIntegrationRuntimeName = "mySelfHostedIntegrationRuntime"
Set-AzureRmDataFactoryV2IntegrationRuntime -ResourceGroupName $resouceGroupName -DataFactoryName $dataFactoryName -Name $selfHostedIntegrationRuntimeName -Type SelfHosted -Description "selfhosted IR description"
Get-AzureRmDataFactoryV2IntegrationRuntimeKey -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $selfHostedIntegrationRuntime

I then pasted the Key into the Integration Runtime Configuration screen, and it connected properly to myDataFactoryv2NoSSIS.  Tada:


Next, is to test the connection to DB2.  I went to the Diagnostics tab, entered the DSN and credentials, just like I did for Data Factory V1:

Failed to connect to the database. Error message: ERROR [HY010] [IBM][CLI Driver] CLI0125E Function sequence error. SQLSTATE=HY010

Dang! Much googling later, I found this obscure note.

I added the phrase “Autocommit=Off” to the DSN in the connection string, and voila, the connection worked.  So my final diagnostic looked like this: