The Common ETL Framework Part 2

Auditing/Logging

Each package is responsible for logging at two different levels:  Audit and Debug.

The Audit level of logging needs to record the following for each package execution:

  1. Start Date and Time
  2. Executing Machine name
  3. Executing as Username
  4. Records Affected (if appropriate) – Can include Records Inserted, Updated, Deleted, Archived, Errored, etc.
  5. Error data if an error occurs – Includes SQL Error, Call Stack, etc.
  6. Presence of Bad Data, if any, with specifics
  7. End Date and Time

The Debug level of logging records everything at the Audit level, plus Control Flow Task-level logging and the specifics of any Bad Data encountered.
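
As a minimal sketch of how the Audit-level items above could be persisted, here is one possible execution-log table.  The table and column names are illustrative assumptions, not the Framework's published schema:

CREATE TABLE Audit.PackageExecutionLog (
    PackageExecutionLogID  INT IDENTITY(1,1) PRIMARY KEY,
    PackageName            NVARCHAR(200)  NOT NULL,
    StartDateTime          DATETIME2      NOT NULL,
    EndDateTime            DATETIME2      NULL,
    MachineName            NVARCHAR(128)  NOT NULL,   -- executing machine
    ExecutedByUserName     NVARCHAR(128)  NOT NULL,   -- executing account
    RowsInserted           INT            NULL,
    RowsUpdated            INT            NULL,
    RowsDeleted            INT            NULL,
    RowsArchived           INT            NULL,
    RowsErrored            INT            NULL,
    HadBadData             BIT            NOT NULL DEFAULT 0,
    ErrorMessage           NVARCHAR(MAX)  NULL        -- SQL error, call stack, etc.
);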

In SSIS, this level of logging can be handled by adding Package and Task level event handlers.  The BIML code to create them looks like this.  There are several placeholders in this code that will be expanded upon in a later post.

Package Level Event Handlers:

[Image: Package-level event handlers]

Main Control Flow Level Task Event Handlers:

[Image: Task-level event handlers]

Error Handling

There is a Package Level Error Handler in the example above.  We need to capture errors at the package and task level and log them to an Audit.PackageExecutionError table.  Note: Bad Data that is redirected to an Audit.BadDataLog table does not raise an error.  The Framework needs to handle both conditions and report on either of them in the notification portion of the Framework.  The Bad Data data flow is shown in the diagram below.

[Image: Data flow error handlers]
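
Here is a hedged sketch of the two audit tables mentioned above.  Only the table names come from the Framework description; the columns are my assumptions:

CREATE TABLE Audit.PackageExecutionError (
    PackageExecutionErrorID INT IDENTITY(1,1) PRIMARY KEY,
    PackageExecutionLogID   INT            NOT NULL,  -- ties the error back to the package execution
    TaskName                NVARCHAR(200)  NULL,
    ErrorDateTime           DATETIME2      NOT NULL DEFAULT SYSDATETIME(),
    ErrorNumber             INT            NULL,
    ErrorMessage            NVARCHAR(MAX)  NULL,
    CallStack               NVARCHAR(MAX)  NULL
);

CREATE TABLE Audit.BadDataLog (
    BadDataLogID            INT IDENTITY(1,1) PRIMARY KEY,
    PackageExecutionLogID   INT            NOT NULL,
    SourceName              NVARCHAR(200)  NULL,      -- which data flow source produced the row
    BadDataReason           NVARCHAR(400)  NULL,
    BadRowData              NVARCHAR(MAX)  NULL       -- the offending row, serialized for review
);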

Notification/Routing

Depending on the Success, Failure/Error, Unusual Behavior, or Bad Data condition of the Package or Task, there are different people that need to be alerted.  The first pass of implementing the Routing will look at SMTP Email as the notification routing mechanism.  A later pass might look at HTTP or Web Service notifications.

All notification contact info will be in the Audit schema and will be maintainable through simple table updates.  Email addresses and notification recipients must not be hard coded into the packages, neither by a developer nor by BIML.  Notification recipients must be determined dynamically at run-time.
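
As one possible shape for that contact data (the table and column names below are illustrative assumptions, not the Framework's actual objects), a package could look up its recipients at run time with a simple query:

CREATE TABLE Audit.NotificationContact (
    NotificationContactID INT IDENTITY(1,1) PRIMARY KEY,
    PackageName           NVARCHAR(200) NOT NULL,
    EmailAddress          NVARCHAR(320) NOT NULL,
    SendSuccessNotice     BIT NOT NULL DEFAULT 0,
    SendErrorNotice       BIT NOT NULL DEFAULT 0,
    SendBadDataNotice     BIT NOT NULL DEFAULT 0,
    SuccessFrequency      VARCHAR(20)  NULL   -- 'Daily', 'Weekly', 'Bi-weekly', 'Monthly', or NULL for Not at All
);

-- At run time, the package asks for the recipients that match the condition it hit, e.g.:
DECLARE @PackageName NVARCHAR(200) = N'LoadCustomerData';   -- hypothetical package name

SELECT EmailAddress
FROM   Audit.NotificationContact
WHERE  PackageName = @PackageName
  AND  SendErrorNotice = 1;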

In the Success condition, an optional statistical email to the specified notification recipients would suffice.  This email would show the Success of the Package and include Package Name, Start Time, End Time and Elapsed Time, Data Source, Target DB, and the Affected Row Counts.  Notification recipients should be configurable to receive notifications Daily, Weekly, Bi-weekly, Monthly or Not at All.

In the Failure/Error condition, a detailed error message needs to be sent to the notification recipients that are indicated as SendErrorNotice in their contact profile.  Every package must name at least one SendErrorNotice contact.  These contacts will receive an email that shows that there was an error, the Package Name, Start Time, End Time and Elapsed Time, Data Source, Target DB, Schema, and Table Name, Affected Row Counts, Error Message(s), Stack Trace (if available), Bad Data Condition (if there is one), and as much other information as we can harvest from SSIS.  The difference between a Failure and an Error is that a Failure is expected in certain circumstances, but an Error is not expected.

In the Unusual Behavior condition, a message needs to be sent to the SendErrorNotice recipients, as in the Failure condition above.  However, the Unusual Behavior notification will document the Package Name, Start Time, End Time and Elapsed Time, Data Source, Target DB, Schema, and Table Name, Affected Row Counts, and Bad Data Condition (if there is one), as well as documenting the Unusual Behavior that triggered this condition.  For example, if a package that normally processes 100,000 rows suddenly processes 500,000 rows, that is Unusual Behavior.  Likewise, a package that typically takes 3 minutes to run taking 20 minutes would be Unusual Behavior.
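
One hedged way to detect Unusual Behavior is to compare the finished run against its recent history in the execution log.  This sketch assumes the Audit.PackageExecutionLog columns introduced earlier and an arbitrary 3x threshold:

-- Values from the run that just finished (hypothetical numbers for illustration)
DECLARE @PackageName    NVARCHAR(200) = N'LoadCustomerData';
DECLARE @RowsInserted   INT = 500000;
DECLARE @ElapsedSeconds INT = 1200;

WITH RecentRuns AS (
    SELECT AVG(CAST(RowsInserted AS FLOAT))                                   AS AvgRows,
           AVG(CAST(DATEDIFF(SECOND, StartDateTime, EndDateTime) AS FLOAT))  AS AvgSeconds
    FROM   Audit.PackageExecutionLog
    WHERE  PackageName   = @PackageName
      AND  EndDateTime   IS NOT NULL
      AND  StartDateTime > DATEADD(DAY, -30, SYSDATETIME())
)
SELECT CASE WHEN @RowsInserted   > 3 * AvgRows
              OR @ElapsedSeconds > 3 * AvgSeconds
            THEN 1 ELSE 0 END AS IsUnusualBehavior
FROM   RecentRuns;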

In the Bad Data condition, the package has not Failed, Errored, or experienced Unusual Behavior.  It has experienced rows being routed to the Bad Data path(s) in the Data Flow.  Details of the Bad Data need to be sent to the notification recipients that are indicated as SendBadDataNotice in their contact profile.  These notifications should include the Package Name, Start Time, End Time and Elapsed Time, Data Source, Target DB, Schema, and Table Name, Affected Row Counts, Bad Data Row Counts, and Bad Data Row Details.

Multi-threading

Multi-threading allows the package to run faster when there are tasks that can be run in parallel.  There are two multi-threading conditions that the Framework needs to handle:

  • Master Package Multi-threading of multiple child packages
  • Package Task execution within a package

Re-run ability

Packages should be re-runnable if the data has changed and we need to reload it.  For example, bad data was detected, logged and handled.  Now that the bad data has been fixed, we need to re-run the package for today’s data.  This implies that the package would know that its last run had bad data, that the package actions need to be reversed, and that the package can then reload the data for the day.  This is most often implemented as a clean-up step at the beginning of the ETL package that checks whether any of the data it is about to put into the database is already there and removes it if it is.  For example, suppose the package was loading data into the final tables when the database ran out of space and threw an error.  Once the space issue is fixed, the package would be re-run.  First, it would remove the data it loaded in the first run; then it would load the data normally, as if it had not been run before.
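
A minimal sketch of that clean-up step, assuming the target table carries a load date column (the table and column names here are mine, purely for illustration):

-- Hypothetical example: remove anything this package already loaded for the day...
DECLARE @LoadDate DATE = '20160502';   -- the business date the package is about to (re)load

DELETE FROM dbo.SalesFact
WHERE  LoadDate = @LoadDate;

-- ...then the normal load proceeds as if the package had never run.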

Restartability

Packages should be able to pick up from the task prior to the failure point and run properly (where this makes sense).  There’s no need to stage data we’ve already staged.  Typically, when a package fails, it has already done some processing.  Packages should be able to roll back the failed task and restart from that point.  This is an advanced ETL framework activity.  It can be very tricky to deal with restarting a failed package, especially if it has any complexity in it.
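
One hedged approach is to log task completions during the run and, on restart, ask the log where the failed run got to.  Everything named here (the Audit.TaskExecutionLog table and its columns) is an assumption for illustration, not an existing part of the Framework:

-- Find the last task that completed successfully in the failed execution
DECLARE @FailedExecutionID INT = 12345;   -- hypothetical PackageExecutionLogID of the failed run

SELECT TOP (1) TaskName
FROM   Audit.TaskExecutionLog
WHERE  PackageExecutionLogID = @FailedExecutionID
  AND  CompletedSuccessfully = 1
ORDER  BY EndDateTime DESC;

-- A precedence constraint expression on each task can then skip everything up to and
-- including that task, so the package resumes at the point of failure.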

The Common ETL Framework Part 1

Defining a Common ETL Framework

SQL Server Integration Services (SSIS) provides a mechanism to Extract data from some source, Transform it as necessary, and Load it into the tables in a target database (ETL).

Each SSIS package should implement a common framework for logging, executing, monitoring and error handling so that these details are abstracted from the SSIS developer and are “just part of the template” that they should use for SSIS package creation.

Package automation takes this one step further by automating the creation of packages based on metadata that is configurable at design time.  I have selected Business Intelligence Markup Language (BIML) to perform this automation.  I will demonstrate how to use BIML to configure and generate packages in future articles in this series.

What Does the Common ETL Framework Need to Do?

The Framework needs to provide the services and behaviors that are common to all ETL packages.  These include:

  1. Auditing/Logging.  Execution Start, Stop and Error auditing for packages and tasks as they are executed.  Auditing/logging verbosity should be configurable so that we can enhance the logging when we need to debug something.
  2. Error Handling.  Appropriate error handling. Reroute bad data on read and on write.  Break the process only when necessary.  Log data, connectivity and process problems.
  3. Notifications and Routing.  Intelligent routing so that the appropriate systems and system owners are notified of a failure without spamming the whole department.
  4. Multi-Threading.  Master packages need to manage the multi-threading and dependencies of their child packages, watching for errors and handling them appropriately (i.e. Stop or Continue execution of other child packages upon a failure).
  5. Restartability.  Packages should be able to pick up from the task prior to the failure point and run properly (where this makes sense); i.e., there is no need to stage data we’ve already staged.
  6. Re-run ability. Packages should be re-runnable if the data has changed and we need to reload it.  For example, bad data was detected, logged and handled.  Now the bad data has been fixed; we need to re-run the package for today’s data.  This implies that the package would know that its last run had bad data, that the package actions need to be reversed and then the package can reload the data for the day.
  7. Load Strategies.  The Framework should handle a variety of data Extract and Load strategies:
    1. Truncate and Replace loads. Where appropriate, table data can be truncated and reloaded.  This technique is common for staging tables and for smaller datasets with no dependencies.
    2. Incremental loads. Where appropriate, data can be extracted incrementally and/or loaded incrementally, rather than a truncate and reload.  This is often used to keep history.  Incremental loads come in several flavors, some use a hash, others a date flag, others a set of change flags.  Deletions require special handling in incremental loads.
    3. Cycle loads. Where appropriate, a dataset can be loaded with a Days Old flag set to -1; then a simple update can increment the Days Old flag by 1 to cycle the data by a day.  A delete query can then remove rows that are too old to be kept.  (A sketch of this pattern appears after this list.)
  8. Data Retention.  The Framework should also handle data retention during the ETL process.  There are three strategies for removing data from the primary dataset:  deletion, archiving, and flagging.  A typical retention plan will employ all three to varying degrees.
  9. Stop/Cancel Job.  The Framework should be able to interrupt a job, stopping it immediately, or canceling it at the next step.  This also implies that the job is already Restartable or Re-runnable.
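
The cycle load described in 7.3 above can be done with a handful of set-based statements.  This is only a sketch under assumed table and column names (dbo.OrderHistory, staging.Orders, DaysOld), not the Framework's actual objects:

-- 1. Load today's rows with the Days Old flag set to -1
INSERT INTO dbo.OrderHistory (OrderID, OrderAmount, DaysOld)
SELECT OrderID, OrderAmount, -1
FROM   staging.Orders;

-- 2. Cycle the whole table by a day; today's rows become day 0
UPDATE dbo.OrderHistory
SET    DaysOld = DaysOld + 1;

-- 3. Remove rows that are too old to keep (30 days is an assumed retention window)
DELETE FROM dbo.OrderHistory
WHERE  DaysOld > 30;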

Redirecting Web Traffic from HTTP to HTTPS

So I need to redirect the incoming traffic to my website so that it always uses https/SSL.

I am running IIS 7.5 on my Windows 7 machine.  I googled for “how do I make sure that all my web traffic uses https”.

The second link in the results was to my web host, GoDaddy.com.  They have a page that explains how to do it.  But what they didn’t tell me was how much effort it would take to get it to work.

First, I had to realize that the URL Rewrite Module was an IIS plug-in and not something that comes “in the box.”  I downloaded and installed the 64-bit version.  Once installed, I could see the URL Rewrite icon in IIS Manager.
[Image: URL Rewrite icon in IIS Manager]

Second, I added the code to my web.config file.  Here’s the code:

<configuration>
<system.webServer>
<rewrite>
    <rules>
	<rule name="HTTP to HTTPS redirect" stopProcessing="true"> 
	    <match url="(.*)" /> 
	    <conditions> 
		<add input="{HTTPS}" pattern="off" ignoreCase="true" />
	    </conditions> 
	    <action type="Redirect" redirectType="Permanent" 
                    url="https://{HTTP_HOST}/{R:1}" />
        </rule>   
    </rules>
</rewrite>
</system.webServer>
</configuration>

Four hours later, it works…

Here are the gotchas I ran into:

  1. Things get complicated if you try to put the redirect in a subsite.  The Rule code above must be in the web.config file in the root website.  Not a subsite.
    (I never did figure out how you might do that.  In my case, it doesn’t matter because I want my entire domain to be running in HTTPS mode.)
  2. You cannot go to the SSL Settings screen in IIS Manager and check the option to require SSL.  That option must remain unchecked to allow the HTTP traffic through so the rule can redirect it.
    [Image: SSL Settings]
  3. Your website must have bindings to both HTTP and HTTPS.
    1. In IIS Manager, select the root website on the left.
    2. Next, select the Bindings option under Edit Site in the Action bar on the right. [Image: Bindings]
    3. Make sure you have bindings for HTTP (typically port 80) and HTTPS (typically port 443).
  4. You do not want to mess with the HTTP Redirect option in IIS Manager.  That will not solve the original problem. The HTTP Redirect is used to forward all traffic from, say, a parked domain, to another domain entirely.
    [Image: HTTP Redirect]
  5. Did I mention that the Rule code above only works at the root level of the server?   Make sure it is in the web.config at the root level, not in a virtual directory or subsite below the root.

And that’s all I have to say about that.

Oh, one more thing.  Many thanks to Ruslan Yakushev (ruslany) at Microsoft for his excellent documentation and blog articles about the URL Rewrite Module.

Hadoop Follow Up – Hortonworks HDP Sandbox

The Hortonworks Hadoop Sandbox download got corrupted the first time.  It worked fine the second time.

Installation

I installed Oracle VirtualBox first.  Then, in the Oracle VM VirtualBox Manager, I selected the File | Import Appliance… option, selected the HDP_2.4_virtualbox_v3.ova file, and clicked Next and Import.
[Image: Importing the HDP appliance]

A few seconds later, the box was installed, so I started it up.  After  loading and starting a ton of stuff, it seemed to stop doing things and the screen looked like this:
[Image: HDP appliance screen]

Connecting to the VM

I dismissed the two messages at the top and tried a zillion things to figure out what to do next.  Nothing.  Then I read something in the Hello World section of the Hortonworks tutorial site about the box’s address and how to connect to the Welcome Screen.  No wonder I couldn’t do anything inside the VM itself: the interface is web-based and uses the URL http://127.0.0.1:8888/.  Entering that URL into my browser, I connected and saw this:
[Image: HDP Welcome Screen]

Then I ran into difficulty because the firewall at work won’t let me download the tutorial files.  Ack!

My First Foray into Hadoop

So I have a big dataset (1.7 billion rows) that I want to analyze.  I figured, “Hey, Hadoop is all over this Big Data thing, I wonder if I can do a Proof of Concept?”

Compiling Hadoop on Windows (Ugh!)

So, first, I tried to follow some instructions on how to get the Hadoop source onto Windows and compile it.  It turns out that Hadoop is Java-based and most Hadoop programmers are Java programmers, so a lot of the instructions assume a Java background.  And, good for me, the build engine is Maven, which I happen to know quite a bit about thanks to the weeks I spent at CompanionCabinet automating the build with Maven.

However, it turned out that Ant was having a problem running the sh command, and after several tries I went googling for an already compiled version of the Hadoop project.  Lo and behold, I found one on GitHub:  https://github.com/karthikj1/Hadoop-2.7.1-Windows-64-binaries.  In the middle of the top area of the page is a “1 Release” link.  Click there to download the binary:

[Image: Hadoop binary release on GitHub]

Installing all the bits

I based this on the wiki article here:  http://wiki.apache.org/hadoop/Hadoop2OnWindows.

I found the link to this:  Building.txt

Near the bottom of that file are some incomplete instructions on what to download, install, and do to compile your own version of Hadoop on Windows.

So I downloaded all these:

  1. Java Development Kit (JDK) 1.7.0_80, which includes Java Runtime Environment (JRE) 7.
    [Image: JDK download]
  2. Maven 3.3.9.
  3. Cygwin 64.
  4. CMake 3.5.2.
  5. zlib 128.
  6. protobuf 2.5.0.
  7. Windows 7 SDK.

Then I installed or unzipped the files.

  1. JDK 1.7 is an install.  I let it install to Program Files\Java.
  2. I copied the Maven file to the Java folder and unzipped it to a new folder (apache-maven-3.3.9).
  3. I installed Cygwin to the Program Files\Java\Cygwin folder.
  4. I installed CMake and accepted the defaults.
  5. I unzipped the zlib 128 files to Program Files\Java\zlib128-dll.
  6. I unzipped the protobuf files to Program Files\Java\protobuf-2.5.0.
  7. I tried to install the Windows 7 SDK, but it had issues, which I ignored since I wasn’t going to compile my own Hadoop after all.
  8. I unzipped the Hadoop files to \hadoop-2.7.1.

Then I did the following steps:

  1. JAVA_HOME must be set, and the path must not contain spaces. If the full path would contain spaces, then use the Windows short path instead.  In my case, this was:
    set JAVA_HOME=C:\Progra~1\Java\jdk1.7.0_80\
  2. I created a C:\tmp folder because I didn’t have one and, by convention, Hadoop uses it.
  3. I added the ZLIB_HOME environment variable and pointed it to C:\Program Files\Java\zlib128-dll\include.
  4. I added several items to the PATH variable:  C:\Program Files\Java\apache-maven-3.3.9\bin;C:\Program Files (x86)\CMake\bin;C:\Program Files\Java\zlib128-dll;C:\Program Files\Java\Cygwin\bin

With all that in place, I was ready to start Hadoop.

Starting Hadoop

Apparently I have to configure several files in the Hadoop\etc\hadoop folder first.

Section 3 on the wiki page describes in detail how to change the configuration files.

I combined that information with the steps found in this article to create the file system, create a directory, and put my txt file there.

What’s Next?

I am not sure what’s next.  Looks like I have some learning to do.

This article gives a nice technical overview of Hadoop.

And then I discovered Hortonworks.  Hortonworks Sandbox is an open-source VM with Hadoop and a bunch of tools already fully configured.  So I downloaded this onto a different machine and am trying it out right now.  I’m going to try the VirtualBox VM.  I used VMWare Player and VirtualBox some time ago and found VirtualBox a lot easier to work with.  It looks like the Hortonworks HDP Sandbox is going to take a while to download.  See you again on Monday.

In the meantime, I’m going to check out this tutorial on edureka.

Oops…I Broke the SQL Server

So this happened.  In an attempt to give my SQL Server Instance access to more memory, I set the Max Memory to zero, expecting that to mean infinite.  No luck: SQL Server Management Studio (SSMS) set the max memory to 16MB instead and broke the instance.  I could not do anything more in SSMS because the instance did not have enough memory.

[Image: Setting the Max Memory too low]

Note:  The screenshots in this article are from a SQL Server 2014 instance, which has apparently fixed this problem so that the Max Server Memory setting defaults to 128MB when you set it to zero and you can still connect with SSMS at 128MB.  In 2012 and prior versions, the setting defaults to 16MB, which is what causes all the trouble.

So I googled for: “accidentally set max server memory to 0”.  This turned up a ton of useful links, but since I had to piece my solution together from various posts, I have created this blog entry to hopefully help someone else get to the solution more quickly.

How to Increase SQL Server Max Memory in a Named Instance

  1. First, you will need to be an administrator on the SQL Server Instance you want to fix.
  2. Now, from the Start | Run menu in Win 7, or Start | Search in Win 10, look for CMD.
    [Image: Start Run CMD]   [Image: Start Search CMD]
  3. Now RIGHT-Click on cmd.exe or Command Prompt and select Run As Administrator.
  4. Repeat steps 2 and 3, so you have two command windows open. Like so:
    [Image: Two command windows]
    In the left window, we will start the instance we need to repair.  In the right window, we will connect to that instance with SQLCMD and fix the memory setting.
  5. In both windows, you need to change to the Binn directory of the SQL Instance that you want to repair.  In my case, this instance is on the D: drive, so I have to switch to that first.  Also, this instance is found in the D:\Program Files\Microsoft SQL Server\MSSQL12.DUMBO folder.  Lastly, the Binn folder is below the instance, at the Instance\MSSQL\Binn path.  So I enter these two commands:

    D: <enter>

    Then,

    cd D:\Program Files\Microsoft SQL Server\MSSQL12.DUMBO\MSSQL\Binn <enter>

    [Image: Change directory]

  6. Now that I am in the Binn folder, I can start the SQL Server Instance.  Note:  This assumes the instance is stopped.  Go to Start | Control Panel | Administrative Tools | Services and find the SQL Server (InstanceName) Service and make sure the Status column is blank.  If it says ‘Started’, then right-click it and Stop the service.
    [Image: Administrative Tools]
  7. So, back to the command window.  On the left side, we need to start the SQLSERVR service in single user mode, being sure to name the correct instance, like so:

    sqlservr -f -sDUMBO (where DUMBO is the name of the instance to fix)

    You should see SQL Server display a ton of messages.
    [Image: SQL Server running]

    If you get this error message:

    SQL Server installation is either corrupt or has been tampered with. Error getting instance id from name.

    Then check the instance name and try again.

  8. Now that SQL Server is running in the left window, go to the right window and start SQLCMD.  We need to use a trusted connection (-E parameter) and we need to specify the server so we can pick the right instance.  Like so:

    sqlcmd -E -S SHQBT0084\DUMBO   (where SHQBT0084\DUMBO is the server\instance to repair)

  9. The SQLCMD prompt 1> should appear:
    [Image: SQLCMD connected]
  10. Now enter the following SQL Server commands, pressing <enter> at the end of each line:

    1> sp_configure 'show advanced options', 1;
    2> go
    1> reconfigure;
    2> go
    1> sp_configure 'max server memory', 64000;
    2> go
    1> reconfigure;
    2> go

    Your screen should look like this:
    [Image: After the SQL commands]

  11. Assuming that there were no errors when you ran the reconfigure commands, you have fixed the server.  (A quick query to verify the new setting is sketched after this list.)  Now we need to clean up a bit.  If you did get an error, let me know via comment below.
  12. At the 1> prompt, type exit and press <enter>.  You can now close the window on the right.
  13. On the left, press Ctrl-C to stop the instance and enter Y when prompted.  You can now close the left window.
  14. Finally, restart the service in the Administrative Tools | Services window by right-clicking it and selecting Start.
  15. The End.
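
As a quick sanity check once the service is back up (this query is a sketch I have added, not one of the original steps), you can confirm the new value from any query window:

-- value_in_use should now show 64000 (MB)
SELECT name, value_in_use
FROM   sys.configurations
WHERE  name = N'max server memory (MB)';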

Undoing Someone Else’s Changes in TFS

Do a google search for “how to check-in another user’s changes in TFS” and you will find a couple of pages of MSDN documentation and a whole lot of forum articles questioning how to do this.  Unfortunately, there is no UI option to do this, so you must resort to the command-line.

Here’s the command syntax for the UNDO command (see MSDN for more on this):

tf undo [/workspace:workspacename[;workspaceowner]]
[/recursive] itemspec [/noprompt] [/login:username,[password]]
[/collection:TeamProjectCollectionUrl]

Too often, the TFS commands don’t come with any explicit examples, so you’re left guessing about how to implement the syntax described above.  In this article, I show you how to use the command detailed above.

Changes You Can’t Access Anymore

There are two scenarios where you can’t access the changes anymore to Undo them normally:

  1. Someone else’s changes (perhaps they have moved on)
  2. Your changes on a different computer (perhaps your old laptop)

In either case, you can tell from Source Control Explorer in Visual Studio who the user is that holds the lock, but you can’t tell the Workspace they used.  So, first, we will need to figure out which Workspace contains the changes.

Identify the Workspace that Contains the Change

To identify the Workspace, we will use the TF STATUS command.  In this command, you can specify the Folder or File(s) with the pending changes using your local Workspace, and you can specify the user who holds the lock on the files.  TF STATUS then tells you the details of the pending change(s), including the Workspace that contains the change/lock. Here is an example using the LocalFolder\* format for itemspec to find all the files Joe Wren has a lock on:

tf status "C:\_myTFS\SCMFolders\MyVSSolutionFolder1\MyVSProjectFolder1\*"
 /recursive /format:detailed /user:DOMAIN\Joe.Wren

When you execute this command from the VS2013 Command Line (Start | Programs | Visual Studio 2013 | Visual Studio Tools | Developer Command Prompt for VS2013), you get a listing of all the files Joe Wren holds changes for, and it looks like this:

$/SCMFolders/MyVSSolutionFolder1/MyVSProjectFol
der1/Package1.dtsx;C1276
 User : Wren, Joe
 Date : Monday, May 02, 2016 2:55:53 PM
 Lock : none
 Change : edit
 Workspace : JWs-Laptop
 Local item : [JWs-Laptop] C:\_myTFS\SCMFolders\MyVSSolutionFolde
r1\MyVSProjectFolder1\Package1.dtsx
 File type : Windows-1252

$/SCMFolders/MyVSSolutionFolder1/MyVSProjectFol
der1/MyProject.dtproj;C1276
 User : Wren, Joe
 Date : Monday, May 02, 2016 2:56:41 PM
 Lock : none
 Change : edit
 Workspace : JWs-Laptop
 Local item : [JWs-Laptop] C:\_myTFS\SCMFolders\MyVSSolutionFolde
r1\MyVSProjectFolder1\MyProject.dtproj
 File type : utf-8

$/SCMFolders/MyVSSolutionFolder1/MyVSProjectFol
der1/MyProject.dtproj.vspscc;C1269
 User : Wren, Joe
 Date : Monday, May 02, 2016 2:56:41 PM
 Lock : none
 Change : edit
 Workspace : JWs-Laptop
 Local item : [JWs-Laptop] C:\_myTFS\SCMFolders\MyVSSolutionFolde
r1\MyVSProjectFolder1\MyProject.dtproj.vspscc
 File type : utf-8

3 change(s)

Notice the Workspace: JWs-Laptop line.  That's the info we need.

Now Undo the Changes to that Workspace

Now that we know the Workspace (JWs-Laptop), we can undo the changes to that workspace by running the TF UNDO command (see syntax above), specifying the TFS Source Control Path for itemspec, the Workspace name and user, and the TFS Collection URL, like so:

tf undo "$/SCMFolders/MyVSSolutionFolder1/MyVSProjectFolder1/" 
/workspace:JWs-Laptop;DOMAIN\Joe.Wren /recursive 
/collection:"http://TFS-Server:8080/tfs/My%20Collection"

And that’s it.  Using these two steps you can undo any changes in TFS.*

*Note:  As long as you have the Undo other users’ changes permission (which you get by being a TFS Admin or a Project Admin).

Creating User Stories

The 5 W’s of an Issue

When an issue gets raised in a production environment, a project planning meeting, or a requirements-gathering meeting, we ask the Who, What, When, Where, and Why of the issue to figure out how to solve it.

Typically, the Where is implied as “In the System” and the When is either “Yesterday”, “ASAP”, or “We’ll figure that out once we know the size, scope and schedule for the issue/project.”

That leaves us with the Who, What, and Why. These three questions are at the heart of a User Story.

User Stories

A User Story is a simple statement that answers the questions Who, What and Why of any given issue.

For example:

“As a night owl, I need coffee in the morning so that I can function properly.”

“As a fire station janitor, I need to get a text message whenever the truck rolls out so that I can go over to clean the station when the firefighters are not there.”

“As an accounting user, I need the system to calculate the tax for an order based on the City, County, State and Country selected in the “Bill To” address so that the tax amount is correct for each customer and to reduce billing errors.”

“As a security admin, I need to be able to lock people out of the system so that we can do an update at 1 AM Sunday morning.”

“As a manager, I need to see the Sales by Region on a map so that I can visualize the sales by region and determine which regions are most in need of sales support staff.”

In each of these examples, you can see that the User Story takes the form of a single statement that follows this template:

In short: “Who needs What and Why.”

More specifically:  “As a type of user, the system should have some feature/do some action so that some benefit will be received.”

What Not to Do

User Stories should not be ambiguous.  For example, “As a user, I need the system to report on sales, so that I can analyze sales,” is not specific at all.

Nor should they be technical: “As a Data Entry operator, when I click the Add button it should create a new CUST_TBL row and default the ADDRESS_1 field to the ADDRESS_ID of the ADDR_TBL record that has DEF_ADDRESS set to 1 so that the CUST_TBL defaults correctly.”

Nor should they omit any of the three parts of the statement.  “The system should apply shipping charges to each order so that the shipping charges are correct,” is not as useful as:  “As a customer, I need the system to show me the shipping charges that apply to my order so that I can know what they will be before I submit the order.”

It is especially tempting to leave off the third part, the “Why?” However, it is a fundamental component of the User Story.  Notice the very different implications of “As a security admin, I need to be able to lock people out of the system so that we can prevent ex-employees from logging in.” and “As a security admin, I need to be able to lock people out of the system so that we can do an update at 1 AM Sunday morning.”

Summary

User Stories answer the Who, What, and Why of an issue.

User Stories take the form: “As a type of user, the system should have some feature/do some action so that some benefit will be received.”

The 3 Things a Project Needs to Accomplish – Highest Level Requirements Document for an Agile Project

Often, in an agile software development project, there can be a “fly by the seat of your pants” feel to the development effort as stories are created, prioritized and scheduled into Sprints without a formal Requirements Phase or a 10-pound Requirements Book – especially if you are working in an almost-Agile or mostly-Agilish environment.  One way to handle the lack of a Requirements Book is to create a “3 Things My Agile Project Needs to Accomplish” document.

The “3 Things My Agile Project Needs to Accomplish” document is lightweight and should typically be a single page.  It simply lists the 3-5 things your project needs to accomplish, with 3-5 bulleted items below each one that break the “Thing” down into high-level, one-sentence descriptions of functionality that will accomplish it.

Here’s an example template:

My Agile Project

Project Purpose in 1-2 sentences.

1.    First (Most Important) Thing the Project Needs to Accomplish

  • High level functionality that will accomplish the first thing
  • More High level Functionality
  • More High level Functionality
  • Perhaps More High level Functionality
  • Perhaps More High level Functionality (Max 5 items. If you need more, then you probably need another thing to accomplish or you aren’t thinking big enough.)

2.    Second Thing the Project Needs to Accomplish

  • High level functionality that will accomplish the second thing
  • More High level Functionality
  • More High level Functionality

3.    Third (Least Important) Thing the Project Needs to Accomplish

  • High level functionality that will accomplish the third thing
  • More High level Functionality
  • More High level Functionality

Instructions

You may need up to 5 things the project needs to accomplish.  Any more than that and either you aren’t thinking big enough, or the project has a very broad scope and should likely be broken down into smaller projects.

The bulleted items represent business objectives.  At this level, you should definitely be thinking in terms of “What does the solution need to do to accomplish one of the 3 things?”  “What?” implies using business objectives and business nomenclature.  We should not be concerned with “How the solution needs to work or how the solution needs to be built.”  Also, the bulleted items are not User Stories, or even Features, in and of themselves.  They are bullets of business functionality that describe “What?” is needed.

On index cards or in an ALM tool like MS TFS, each bulleted item can be translated into one or more (usually not more than 3-5) Features.  (If you need more than 5 features, you should probably go back to the 3 things and add another bullet, or even add an additional thing.)  These features should be written using business nomenclature and are still “What needs to be done?”, rather than “How?”

Each Feature can then be elaborated into User Stories and Test Cases.  The process of making this happen usually occurs in a meeting with the developers, testers, SME’s and project/team leader (e.g. ScrumMaster or PM).

Traditionally, the User Story should be in the format:

Casual:  As a <type of user>, I want <some goal> so that <some reason>.

More Formal:  As a <concerned party>, I want <goal or business or technical feature or function (remember What, not How)>, so that <reason/business purpose/justification>.

Using notecards, the User Story is written on the front of the card, and 3-5 Test Cases that will prove that the feature or function works are on the back.  The combination of the User Story and Test Cases should be sufficient to qualify as “Requirements”.  It should be possible for the team to elaborate the Tasks needed and estimate the Story Points (for the Stories) and Hours (for the Tasks).

For example, here is a User Story:

As an Analyst, I want a Widget Month Pricing table that includes one record for each Widget for each month that that Widget was on sale, including a blank Market Price field, so that I can store the Market Price generated by the Pricing Algorithm for each Widget/Month combination.

And here are some Test Cases:

  • Does the table show only relevant records based on Month and Widget?
  • Can the Pricing Algorithm, and only the Pricing Algorithm, write to the Market Price field?
  • Does the table show the right Widget/Month records for all Widgets?
  • Does the hourly refresh of the table run in under 90 seconds?

The User Story and Test Cases above give me the info I need to be able to estimate the story (in Story Points), task it out and start working on it.

A note about the Test Cases and Non-functional Requirements (NFR’s).  NFR’s, like security requirements and performance requirements, are not well suited to User Stories.  But they work very well as Test Cases.  As shown above, the User Story made no mention of the hourly refresh, but this requirement came out in the discussion of the story and was captured as a Test Case.  Also note the inclusion of the performance requirement for the hourly refresh.  Both the assumption that there would be an hourly refresh and the performance requirement that it not take more than 90 seconds are NFR’s that should be captured while the Story is being elaborated.

Orks with Dreads? Why not?

A quick and easy way to make dreadlocks for your orks, or for anyone else for that matter, is to use your Hobby Drill in the thickest part of an old sprue.  Twirling the hobby drill carefully, you can get a ½ to ¾ inch long dreadlock (or longer I suppose, if the sprue is thick enough).

Superglue the dreadlocks to your orks’ heads with Superglue Thin.  Do not use plastic glue as it will dissolve the dreadlocks you just worked so hard to make.

And voila, your bald orks now have dreads.

Enjoy your hobby!

Current bought to painted ratio for 2016:  ∞:5