Hadoop Follow Up – Hortonworks HDP Sandbox

The Hortonworks Hadoop Sandbox download got corrupted the first time.  It worked fine the second time.


I installed Oracle VirtualBox first.  Then, in the Oracle VM VirtualBox Manager, I selected the File | Import Appliance… option, selected the HDP_2.4_virtualbox_v3.ova file and clicked Next and Import.
Importing the HDP Appliance

A few seconds later, the box was installed, so I started it up.  After loading and starting a ton of stuff, it seemed to stop doing things and the screen looked like this:
HDP Appliance Screen

Connecting to the VM

I dismissed the two messages at the top and tried a zillion things to figure out what to do next.  Nothing.  Then I read something in the Hello World section of the Hortonworks tutorial site about the box’s address and how to connect to the Welcome Screen.  No wonder I couldn’t do anything inside the VM itself.  The interface is web-based and uses the URL:  Entering that URL into my browser, I connected and saw this:
HDP Welcome Screen

Then I ran into difficulty because the firewall at work won’t let me download the tutorial files.  Ack!

My First Foray into Hadoop

So I have a big dataset (1.7 billion rows) that I want to analyze.  I figured, “Hey, Hadoop is all over this Big Data thing, I wonder if I can do a Proof of Concept?”

Compiling Hadoop on Windows (Ugh!)

So, first, I tried to follow some instructions on how to get the Hadoop source into Windows and compile it.  It turns out that Hadoop is Java-based and most Hadoop programmers are Java programmers.  So a lot of the instructions are in Java.  And, good for me, the build engine is Maven, which I happen to know quite a bit about thanks to the weeks at CompanionCabinet where I automated the build using Maven.

However, it turned out that Ant was having a problem running the SH command, and after several tries I went googling for an already-compiled version of the Hadoop project.  Lo and behold, I found one on GitHub:  https://github.com/karthikj1/Hadoop-2.7.1-Windows-64-binaries.  In the middle of the top area of the page is a “1 Release” link.  Click there to download the binary:

Hadoop Binary

Installing all the bits

Starting from the wiki article here:  http://wiki.apache.org/hadoop/Hadoop2OnWindows, I found the link to this:  Building.txt

Near the bottom of that file are some incomplete instructions on what to download, install and do to compile your own version of Hadoop in Windows.

So I downloaded all these:

  1. Java Development Kit (JDK) 1.7.0_80, which includes Java Runtime Environment (JRE) 7.
    JDK Download
  2. Maven 3.3.9.
  3. Cygwin 64.
  4. CMake 3.5.2.
  5. zlib 1.2.8 (the zlib128 download).
  6. protobuf 2.5.0.
  7. Windows 7 SDK.

Then I installed or unzipped the files.

  1. JDK 1.7 is an install.  I let it install to Program Files\Java.
  2. I copied the Maven file to the Java folder and unzipped it to a new folder (apache-maven-3.3.9).
  3. I installed Cygwin to the Program Files\Java\Cygwin folder.
  4. I installed CMake and accepted the defaults.
  5. I unzipped the zlib 128 files to Program Files\Java\zlib128-dll.
  6. I unzipped the protobuf files to Program Files\Java\protobuf-2.5.0.
  7. I tried to install the Windows 7 SDK but it had issues, which I ignored and proceeded on since I wasn’t going to compile my own Hadoop after all.
  8. I unzipped the Hadoop files to \hadoop-2.7.1.

Then I did the following steps:

  1. JAVA_HOME must be set, and the path must not contain spaces. If the full path would contain spaces, then use the Windows short path instead.  In my case, this was:
    set JAVA_HOME=C:\Progra~1\Java\jdk1.7.0_80\
  2. I created a C:\tmp folder because I didn’t have one and, by convention, Hadoop uses it.
  3. I added the ZLIB_HOME environment variable and pointed it to C:\Program Files\Java\zlib128-dll\include.
  4. I added several items to the PATH variable:  C:\Program Files\Java\apache-maven-3.3.9\bin;C:\Program Files (x86)\CMake\bin;C:\Program Files\Java\zlib128-dll;C:\Program Files\Java\Cygwin\bin
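
These settings are easy to get subtly wrong, so a quick sanity check can save some head-scratching.  Here is a minimal Python sketch, not part of Hadoop or its tooling.  The function name find_env_problems and its path_exists parameter are hypothetical names made up for illustration.  It flags a JAVA_HOME that is missing or contains spaces, and lists any PATH entries that do not exist on disk:

```python
import os


def find_env_problems(env, path_exists=os.path.isdir):
    """Return a list of human-readable problems found in an environment mapping.

    env is any mapping of variable names to values, for example os.environ.
    path_exists is injectable so the check can be tried without touching the disk.
    """
    problems = []

    java_home = env.get("JAVA_HOME", "")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif " " in java_home:
        # The path must not contain spaces; use the Windows short path instead,
        # e.g. C:\Progra~1\Java\... rather than C:\Program Files\Java\...
        problems.append("JAVA_HOME contains spaces: " + java_home)

    # Windows separates PATH entries with semicolons.
    for entry in env.get("PATH", "").split(";"):
        if entry and not path_exists(entry):
            problems.append("PATH entry not found: " + entry)

    return problems
```

Calling find_env_problems(os.environ) from a new command window (so it picks up the updated variables) returns an empty list when nothing looks off.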

With all that in place, I was ready to start Hadoop.

Starting Hadoop

Apparently I have to configure several files in the etc\hadoop folder of the Hadoop install first.

Section 3 on the wiki page describes in detail how to change the configuration files.

I combined that information with the steps found in this article to create the file system, create a directory and put my txt file there.

What’s Next?

I am not sure what’s next.  Looks like I have some learning to do.

This article gives a nice technical overview of Hadoop.

And then I discovered Hortonworks.  Hortonworks Sandbox is an open-source VM with Hadoop and a bunch of tools already fully configured.  So I am downloading it onto a different machine to try it out.  I’m going to try the VirtualBox VM.  I used VMware Player and VirtualBox some time ago and found VirtualBox a lot easier to work with.  It looks like the Hortonworks HDP Sandbox is going to take a while to download.  See you again on Monday.

In the meantime, I’m going to check out this tutorial on edureka.


Oops…I Broke the SQL Server

So this happened.  In an attempt to give my SQL Server Instance access to more memory, I set the Max Memory to zero, expecting that to mean infinite.  No luck:  SQL Server Management Studio (SSMS) set the max memory to 16MB instead and broke the instance.  I could not do anything more in SSMS because the instance did not have enough memory.

Setting the Max Memory too low

Note:  The screenshots in this article are from a SQL Server 2014 instance, which has apparently fixed this problem so that the Max Server Memory setting defaults to 128MB when you set it to zero and you can still connect with SSMS at 128MB.  In 2012 and prior versions, the setting defaults to 16MB, which is what causes all the trouble.

So I googled for: “accidentally set max server memory to 0”.  This turned up a ton of useful links, but since I had to piece my solution together from various posts, I have created this blog entry to hopefully help someone else get to the solution more quickly.

How to Increase SQL Server Max Memory in a Named Instance

  1. First, you will need to be an administrator on the SQL Server Instance you want to fix.
  2. Now, from the Start | Run menu in Win 7, or Start | Search in Win 10, look for CMD.
    Start Run CMD   Start Search CMD
  3. Now RIGHT-Click on cmd.exe or Command Prompt and select Run As Administrator.
  4. Repeat steps 2 and 3, so you have two command windows open. Like so:
    Two Command Windows
    In the left window, we will start the instance we need to repair.  In the right window, we will connect to that instance with SQLCMD and fix the memory setting.
  5. In both windows, you need to change to the Binn directory of the SQL Instance that you want to repair.  In my case, this instance is on the D: drive so I have to switch to that first.  Also, this instance is found in the D:\Program Files\Microsoft SQL Server\MSSQL12.DUMBO folder.  Lastly, the Binn folder is below the instance at the Instance\MSSQL\Binn path.  So I enter these two commands:

    D: <enter>


    cd D:\Program Files\Microsoft SQL Server\MSSQL12.DUMBO\MSSQL\Binn <enter>

    Change Directory

  6. Now that I am in the Binn folder, I can start the SQL Server Instance.  Note:  This assumes the instance is stopped.  Go to Start | Control Panel | Administrative Tools | Services and find the SQL Server (InstanceName) Service and make sure the Status column is blank.  If it says ‘Started’, then right-click it and Stop the service.
    Administrative Tools
  7. So, back to the command window.  On the left side, we need to start the SQLSERVR service with minimal configuration (the -f parameter), which also puts it in single-user mode, being sure to name the correct instance (the -s parameter), like so:

    sqlservr -f -sDUMBO (where DUMBO is the name of the instance to fix)

    You should see SQL Server display a ton of messages.
    SQL Server Running

    If you get this error message:

    SQL Server installation is either corrupt or has been tampered with. Error getting instance id from name.

    Then check the instance name and try again.

  8. Now that SQL Server is running in the left window, go to the right window and start SQLCMD.  We need to use a trusted connection (-E parameter) and we need to specify the server so we can pick the right instance.  Like so:

    sqlcmd -E -S SHQBT0084\DUMBO   (where SHQBT0084\DUMBO is the server\instance to repair; the -S must be a capital letter, since lowercase -s sets the column separator instead)

  9. The SQLCMD prompt 1> should appear:
    SQLCMD Connected
  10. Now enter the following SQL Server commands, pressing <enter> at the end of each line:

    1> sp_configure 'show advanced options', 1;
    2> go
    1> reconfigure;
    2> go
    1> sp_configure 'max server memory', 64000;
    2> go
    1> reconfigure;
    2> go

    Your screen should look like this:
    After the SQL Commands

  11. Assuming that there were no errors when you ran the reconfigure commands, you have fixed the server.  If you did get an error, let me know via a comment below.  Now we need to clean up a bit.
  12. At the 1> prompt, type exit and press <enter>.  You can now close the window on the right.
  13. On the left, press Ctrl-C to stop the instance and enter Y when prompted.  You can now close the left window.
  14. Finally, restart the service in the Administrative Tools | Services window by right-clicking it and selecting Start.
  15. The End.
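
If you would rather not type the commands from step 10 at the prompt, you can generate them as a script file.  This is a minimal Python sketch under a couple of assumptions:  the function name build_memory_fix_script is made up for illustration, and the 128MB floor is just a guard I chose for this sketch (borrowed from the SQL Server 2014 note above) so the script cannot repeat the original mistake.

```python
def build_memory_fix_script(max_memory_mb):
    """Return the step 10 commands as T-SQL text, one batch per statement."""
    if max_memory_mb < 128:
        # Assumed floor for this sketch: refuse values that would starve the instance again.
        raise ValueError("max_memory_mb should be at least 128, got %d" % max_memory_mb)

    statements = [
        "EXEC sp_configure 'show advanced options', 1;",
        "RECONFIGURE;",
        "EXEC sp_configure 'max server memory', %d;" % max_memory_mb,
        "RECONFIGURE;",
    ]
    # GO is the batch separator that SQLCMD recognizes, same as typing "go" at the prompt.
    return "".join(statement + "\nGO\n" for statement in statements)
```

Save the returned text to a file and hand it to SQLCMD with the -i (input file) parameter in the right-hand window, instead of typing at the 1> prompt.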






Undoing Someone Else’s Changes in TFS

Do a Google search for “how to check-in another user’s changes in TFS” and you will find a couple of pages of MSDN documentation and a whole lot of forum posts asking how to do this.  Unfortunately, there is no UI option for it, so you must resort to the command line.

Here’s the command syntax for the UNDO command (see MSDN for more on this):

tf undo [/workspace:workspacename[;workspaceowner]]
[/recursive] itemspec [/noprompt] [/login:username,[password]]

Too often, the TFS commands don’t come with any explicit examples, so you’re left guessing about how to implement the syntax described above.  In this article, I show you how to use the command detailed above.

Changes You Can’t Access Anymore

There are two scenarios where you can’t access the changes anymore to Undo them normally:

  1. Someone else’s changes (perhaps they have moved on)
  2. Your changes on a different computer (perhaps your old laptop)

In either case, you can tell from Source Control Explorer in Visual Studio who the user is that holds the lock, but you can’t tell the Workspace they used.  So, first, we will need to figure out which Workspace contains the changes.

Identify the Workspace that Contains the Change

To identify the Workspace, we will use the TF STATUS command.  In this command, you can specify the Folder or File(s) with the pending changes using your local Workspace, and you can specify the user who holds the lock on the files.  TF STATUS then tells you the details of the pending change(s), including the Workspace that contains the change/lock. Here is an example using the LocalFolder\* format for itemspec to find all the files Joe Wren has a lock on:

tf status "C:\_myTFS\SCMFolders\MyVSSolutionFolder1\MyVSProjectFolder1\*"
 /recursive /format:detailed /user:DOMAIN\Joe.Wren

When you execute this command from the VS2013 Command Line (Start | Programs | Visual Studio 2013 | Visual Studio Tools | Developer Command Prompt for VS2013), you get a listing of all the files Joe Wren holds changes for, and it looks like this:

 User : Wren, Joe
 Date : Monday, May 02, 2016 2:55:53 PM
 Lock : none
 Change : edit
 Workspace : JWs-Laptop
 Local item : [JWs-Laptop] C:\_myTFS\SCMFolders\MyVSSolutionFolde
 File type : Windows-1252

 User : Wren, Joe
 Date : Monday, May 02, 2016 2:56:41 PM
 Lock : none
 Change : edit
 Workspace : JWs-Laptop
 Local item : [JWs-Laptop] C:\_myTFS\SCMFolders\MyVSSolutionFolde
 File type : utf-8

 User : Wren, Joe
 Date : Monday, May 02, 2016 2:56:41 PM
 Lock : none
 Change : edit
 Workspace : JWs-Laptop
 Local item : [JWs-Laptop] C:\_myTFS\SCMFolders\MyVSSolutionFolde
 File type : utf-8

3 change(s)

Notice the Workspace : JWs-Laptop line. That's the info we need.
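
When the listing runs to hundreds of files, picking the Workspace out by eye gets tedious.  Here is a minimal Python sketch that pulls the distinct Workspace names out of the detailed listing.  It assumes the output uses the "Label : value" layout shown above, and the function name workspaces_from_status is made up for illustration.

```python
def workspaces_from_status(status_text):
    """Collect the distinct Workspace names from a detailed tf status listing."""
    workspaces = []
    for line in status_text.splitlines():
        # Split on the first colon only, so times like 2:55:53 PM stay in the value.
        label, separator, value = line.partition(":")
        if separator and label.strip() == "Workspace":
            name = value.strip()
            if name and name not in workspaces:
                workspaces.append(name)
    return workspaces


sample = """\
 User : Wren, Joe
 Change : edit
 Workspace : JWs-Laptop

 User : Wren, Joe
 Change : edit
 Workspace : JWs-Laptop
"""
print(workspaces_from_status(sample))  # prints ['JWs-Laptop']
```

To use it on real output, redirect the TF STATUS command to a text file and pass the file's contents to the function.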

Now Undo the Changes to that Workspace

Now that we know the Workspace (JWs-Laptop), we can undo the changes to that workspace by running the TF UNDO command (see syntax above), specifying the TFS Source Control Path for itemspec, the Workspace name and user, and the TFS Collection URL, like so:

tf undo "$/SCMFolders/MyVSSolutionFolder1/MyVSProjectFolder1/" 
/workspace:JWs-Laptop;DOMAIN\Joe.Wren /recursive 

And that’s it.  Using these two steps you can undo any changes in TFS.*

*Note:  As long as you have the Undo Others Changes permission (which you get by being a TFS Admin or a Project Admin).
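
If you need to do this for several people, it can help to build the command from its parts.  This is a minimal Python sketch that only assembles the arguments used in the example above.  The function name build_tf_undo_args is made up for illustration, and it does not run anything by itself.

```python
def build_tf_undo_args(server_path, workspace, owner, recursive=True):
    """Build the argument list for the tf undo example shown above."""
    # The workspace and its owner travel together as /workspace:name;owner.
    args = ["tf", "undo", server_path, "/workspace:%s;%s" % (workspace, owner)]
    if recursive:
        args.append("/recursive")
    return args


print(build_tf_undo_args(
    "$/SCMFolders/MyVSSolutionFolder1/MyVSProjectFolder1/",
    "JWs-Laptop",
    r"DOMAIN\Joe.Wren",
))
```

Keeping the arguments as a list means you can pass them to subprocess.run from a script started in the Developer Command Prompt without worrying about quoting the $ and ; characters.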


Creating User Stories

The 5 W’s of an Issue

When an issue gets raised in a production environment, a project planning meeting, or a requirements-gathering meeting, we ask the Who, What, When, Where, and Why of the issue to figure out how to solve it.

Typically, the Where is implied as “In the System” and the When is either “Yesterday”, “ASAP”, or “We’ll figure that out once we know the size, scope and schedule for the issue/project.”

That leaves us with the Who, What, and Why. These three questions are at the heart of a User Story.

User Stories

A User Story is a simple statement that answers the questions Who, What and Why of any given issue.

For example:

“As a night owl, I need coffee in the morning so that I can function properly.”

“As a fire station janitor, I need to get a text message whenever the truck rolls out so that I can go over to clean the station when the firefighters are not there.”

“As an accounting user, I need the system to calculate the tax for an order based on the City, County, State and Country selected in the “Bill To” address so that the tax amount is correct for each customer and to reduce billing errors.”

“As a security admin, I need to be able to lock people out of the system so that we can do an update at 1 AM Sunday morning.”

“As a manager, I need to see the Sales by Region on a map so that I can visualize the sales by region and determine which regions are most in need of sales support staff.”

In each of these examples, you can see that the User Story takes the form of a single statement that follows this template:

In short: “Who needs What and Why.”

More specifically:  “As a type of user, the system should have some feature/do some action so that some benefit will be received.”
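
If you track stories in a tool or a spreadsheet export, the template is simple enough to check mechanically.  Here is a minimal Python sketch of the Who, What and Why template, using the "As ..., I need ... so that ..." wording from the examples above.  The class name UserStory and its field names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class UserStory:
    who: str   # the type of user, e.g. "a night owl"
    what: str  # the feature or action needed
    why: str   # the benefit that will be received

    def render(self):
        """Return the story as a single statement, refusing to drop any part."""
        missing = [name for name in ("who", "what", "why") if not getattr(self, name).strip()]
        if missing:
            raise ValueError("User Story is missing: " + ", ".join(missing))
        return "As %s, I need %s so that %s." % (self.who, self.what, self.why)


story = UserStory("a night owl", "coffee in the morning", "I can function properly")
print(story.render())
```

The check only catches a missing part.  Whether a story is too vague or too technical still takes a human reader.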

What Not to Do

User Stories should not be ambiguous.  For example, “As a user, I need the system to report on sales, so that I can analyze sales,” is not specific at all.

Nor should they be technical: “As a Data Entry operator, when I click the Add button it should create a new CUST_TBL row and default the ADDRESS_1 field to the ADDRESS_ID of the ADDR_TBL record that has DEF_ADDRESS set to 1 so that the CUST_TBL defaults correctly.”

Nor should they omit any of the three parts of the statement.  “The system should apply shipping charges to each order so that the shipping charges are correct,” is not as useful as:  “As a customer, I need the system to show me the shipping charges that apply to my order so that I can know what they will be before I submit the order.”

It is especially tempting to leave off the third part, the “Why?” However, it is a fundamental component of the User Story.  Notice the very different implications of “As a security admin, I need to be able to lock people out of the system so that we can prevent ex-employees from logging in.” and “As a security admin, I need to be able to lock people out of the system so that we can do an update at 1 AM Sunday morning.”


User Stories answer the Who, What, and Why of an issue.

User Stories take the form: “As a type of user, the system should have some feature/do some action so that some benefit will be received.”