SQL Saturday St. Louis – February 2020

I will be speaking at SQL Saturday St. Louis on Saturday, February 8th at 8:00 am and 10:20 am. The topics are:

8:00 am Moving Data to the Cloud (with Azure Data Factory)

You need to move data. A lot of data. To the cloud. You've got data in a variety of on- and off-site data sources. There are several ways to move it, and some can be implemented quite easily using Azure Data Factory. Learn how to use variables and looping in your Data Factory pipelines. Use the Integration Runtime to pull directly from on-site sources. See how to upload files to blob storage and import them. Learn how to trigger Data Factory activities. And learn how to keep all those connection strings and passwords secret in Azure Key Vault. After this session, you will have tools that you can readily implement in your own data migrations.

10:20 am Why Learn Python? A Microsoft DB/ETL/BI Developer’s Answer

You're a Microsoft developer. C#, MSSQL, SSIS/SSRS, SSMS, and Azure are your tools of choice. Why would you want to learn Python? In this session, I will show you several take-home utilities that use Python. The first hunts through a folder structure full of SSIS packages, looking for the one(s) that load a specified table. The second executes the data sources in an SSRS report to identify performance problems and document them in Excel. The third geocodes the City/Country names from a SQL table, retrieving the Lat/Lng so you can use the data in maps. Familiarity with Python is not necessary to use the utilities, but we're not going to do Hello World either. This is real Python for Microsoft database, ETL, and BI developers. This all-demo session shows you how to use Python with the .NET CLR, XML, ODBC, Excel, SQL Server, and Web API calls.

Azure Data Factory Publishing Error

I am using Azure Data Factory v2. It is bound to an Azure DevOps Git repository. I made some changes in my branch, including deleting some obsolete items, and then merged my changes back to the 'master' branch.

Now, I’m trying to publish my changes from the ‘master’ branch to the Azure Data Factory. When I do, it says it’s deployed 33 of 33 changes, and then it fails, saying:

“Publishing Error

The document cannot be deleted since it is referenced by <some Pipeline>.”

I searched high and low for any evidence that the offending Pipeline existed anywhere in the Git repo. It did not.
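If you want to do that kind of search from the command line, here is a rough sketch that scans a local clone of the repo for the offending name (MyDeletedDataset and the repo path are placeholders; an ADF-backed repo stores each resource as a JSON file):

# Search every ADF JSON file in the local repo clone for a reference to the deleted item
Get-ChildItem -Path C:\Repos\MyDataFactory -Filter *.json -Recurse |
    Select-String -Pattern 'MyDeletedDataset' |
    Select-Object Path, LineNumber, Line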

Then, I discovered that you can switch from the Azure DevOps Git version of the Data Factory to the live Data Factory version by selecting the latter from the dropdown in the top left corner of the Data Factory editor.

This saved the day. I was able to locate and delete the offending Pipeline(s) directly in the live Data Factory. Then, when I switched back to Azure DevOps Git mode, it allowed me to publish the changes from the 'master' branch. Woohoo!

Connecting to IBM DB2 z/OS from Azure Data Factory v2

Connecting to IBM DB2 z/OS from Azure Data Factory v1 was a matter of setting up the Azure Data Gateway on an on-prem server that had the IBM DB2 client installed and creating an ODBC connection to DB2 (I called it DB2Test).  Then, in the Data Factory v1 Copy Wizard, select the ODBC source, pick the Gateway, and enter DSN=DB2Test as the Connection String.  This worked for us.
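As a sanity check, you can exercise that same DSN from the gateway machine before wiring it into the wizard. A minimal PowerShell sketch (DB2Test is the DSN from above; the user and password are placeholders):

# Open and close an ODBC connection against the DB2Test DSN
$conn = New-Object System.Data.Odbc.OdbcConnection("DSN=DB2Test;Uid=myUser;Pwd=myPassword")
$conn.Open()
$conn.State    # should report 'Open'
$conn.Close()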

Azure Data Factory v2

First, the Azure Data Gateway is now called the "Self-Hosted Integration Runtime".  So download and install the IR client on your on-prem gateway machine.  On my machine, it auto-configured to use the existing Data Factory Gateway configuration, which is NOT what I wanted.  After uninstalling and reinstalling the IR client a couple of times, it stopped auto-configuring and asked me for a key.  To get the key, I had our Azure dev configuration guy run the following PowerShell:

# Requires the AzureRM PowerShell module
Import-Module AzureRM

$dataFactoryName = "myDataFactoryv2NoSSIS"
$resourceGroupName = "myResourceGroup"
$selfHostedIntegrationRuntimeName = "mySelfHostedIntegrationRuntime"

# Sign in to Azure
Login-AzureRmAccount

# Create (or update) the self-hosted Integration Runtime in the Data Factory
Set-AzureRmDataFactoryV2IntegrationRuntime -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $selfHostedIntegrationRuntimeName -Type SelfHosted -Description "selfhosted IR description"

# Retrieve the authentication key to paste into the on-prem IR configuration screen
Get-AzureRmDataFactoryV2IntegrationRuntimeKey -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $selfHostedIntegrationRuntimeName

I then pasted the Key into the Integration Runtime Configuration screen, and it connected properly to myDataFactoryv2NoSSIS.  Tada!

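If you'd rather confirm the registration from PowerShell than from the portal, the same module can report the runtime's status (a sketch reusing the variables defined above):

# Shows the runtime and its registered node(s) once the key has been accepted
Get-AzureRmDataFactoryV2IntegrationRuntime -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $selfHostedIntegrationRuntimeName -Status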

Next is to test the connection to DB2.  I went to the Diagnostics tab and entered the DSN and credentials, just like I did for Data Factory v1:

Failed to connect to the database. Error message: ERROR [HY010] [IBM][CLI Driver] CLI0125E Function sequence error. SQLSTATE=HY010

Dang! Much googling later, I found this obscure note.

I added the phrase "Autocommit=Off" to the DSN in the connection string, and voila, the connection worked.  So my final diagnostic connection string was DSN=DB2Test;Autocommit=Off.

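For anyone who wants to reproduce that final diagnostic outside the Data Factory UI, here is a rough PowerShell equivalent (DB2Test and the credentials are placeholders, as before):

# ODBC connection with autocommit disabled, per the fix above
$conn = New-Object System.Data.Odbc.OdbcConnection("DSN=DB2Test;Autocommit=Off;Uid=myUser;Pwd=myPassword")
$conn.Open()    # previously threw CLI0125E; succeeds with Autocommit=Off
$cmd = $conn.CreateCommand()
$cmd.CommandText = "SELECT 1 FROM SYSIBM.SYSDUMMY1"   # SYSDUMMY1 is DB2's one-row dummy table
$cmd.ExecuteScalar()
$conn.Close()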

YAY!!