Monday, June 9, 2014

More Little Lessons in Hadoop

Previously, I had posted on some common issues encountered when installing and running Hadoop for the first time.
Now, in the joyful experience of attempting a multi-node setup on a simple virtual cluster, many more errors, unexpected consequences, and curios of server-related struggle have been forthcoming.
I hope that in relaying these attempts, failures, and (admittedly sparse) triumphs through this medium, I can allay some of the angst experienced by others in the future.

Good luck, all Hadooper-troopers.

Can't yum/apt-get? 

Depending on OS version and pre-existing settings (especially when working on a stripped virtual image), your installer service may not be able to find certain packages.

For a general inability to install, try the configurations here for setting up the nginx repo. 

A particular problem I ran across was installing the repository-management package on CentOS, either "python-software-properties" or "software-properties-common" depending on version. Both are Debian/Ubuntu (apt) packages, so yum will not find them.
Turns out, you don't need it, contrary to the results I found on several Google searches.

Specifically, if you're following Michael Noll's tutorial, and you can't install the software properties, try doing a yum search for Java, and you'll probably get results.

However, if for some reason you don't, here are some links that may be of use:

Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

This may be due to an incorrect permissions setting, which you can solve via these directions. However, especially if you've just sourced eucarc to get going, you may want to make sure you're in your home directory...yes, I did make that mistake; it's quite frustrating until it suddenly becomes embarrassing.
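
As a rough sketch of the permissions fix, assuming the usual OpenSSH expectation that the .ssh directory and authorized_keys are accessible only by their owner (the function name fix_ssh_perms is my own, not part of any tool):

```shell
# Tighten permissions on an SSH directory so that sshd will accept the keys in it.
# Uses ~/.ssh by default; pass a different directory as the first argument.
fix_ssh_perms() {
    ssh_dir="${1:-$HOME/.ssh}"
    [ -d "$ssh_dir" ] || { echo "no such directory: $ssh_dir" >&2; return 1; }
    # Only the owner may read, write, or enter the directory
    chmod 700 "$ssh_dir"
    # Only the owner may read or write the list of authorized keys
    [ -f "$ssh_dir/authorized_keys" ] && chmod 600 "$ssh_dir/authorized_keys"
    return 0
}

# Typical use, run on the machine you are trying to log in to:
#   fix_ssh_perms
```

If the error persists after this, check which directory you are in and which key file your ssh command is actually picking up.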

For more general help on SSH through Ubuntu:

/usr/lib/jvm/java.... Not a directory

Most likely a mistyped or mixed-up Java directory, especially if you've switched Java versions recently or are setting up a new user.
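
Before editing anything, it can help to see which Java directories really exist. The sketch below assumes your JVMs live under /usr/lib/jvm (the path from the error message); check_java_home is a made-up helper name, and the bin/java test is just my own sanity check:

```shell
# Report whether a candidate JAVA_HOME is a real directory containing bin/java.
# Falls back to the current $JAVA_HOME when no argument is given.
check_java_home() {
    candidate="${1:-${JAVA_HOME:-}}"
    if [ -z "$candidate" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    if [ ! -d "$candidate" ]; then
        echo "not a directory: $candidate"
        return 1
    fi
    if [ ! -x "$candidate/bin/java" ]; then
        echo "no bin/java under: $candidate"
        return 1
    fi
    echo "ok: $candidate"
}

# List the Java installations that are actually present
ls -d /usr/lib/jvm/*/ 2>/dev/null || echo "nothing found under /usr/lib/jvm"
```

Run check_java_home with the path from your configuration to see which of the three checks it trips over.
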
Check your file, and modify the path as necessary (probably ending with the jre directory). on hadoop namenode -format

Tuesday, June 3, 2014

Lessons in Software Startups: Tackling the Problem

(This is a continuation of a series on software-related startups. You can read the first part here.)

Scribbled in my notebook -- and barely legible -- are the words "have quick answers for project."

Though I don't remember it, I must have written this not even halfway through the first Software Ventures class. That day, we learned we'd all be put into teams, tasked with designing and creating a working product for our own potential startup by semester's end. It didn't take long for everyone in that class to realize what lay ahead: a fast-paced and unorthodox learning experience, seeking to mimic the brutal but exhilarating environment of software startups. By the time we got out, fourteen long weeks later, we had gained a firm mental picture of how a startup works, with all its ups, downs, potentials, and pitfalls.

And it really is necessary to expect all outcomes, positive and negative -- both occur regularly in the world of startups. Better to discover this now, and experience the pain when the most it can hurt is your GPA, than to learn it at a real company, where a failure could cost you your livelihood. It's easy to fear the startup process, or to glamorize it, depending on your inclination, but neither of these approaches holds true on its own.

One startup's young CEO, speaking at our class, confessed that during the early months of his venture, he more than once lapsed into tears from the sheer stress of it all. Yet now his company is heavily funded. Another speaker told the captivating story of abandoning his successful career, secure life, and comforts of home, to pursue his passion and breathe new life into his homeland -- and now his enterprise is highly successful.
In the world of startups, there is no avoiding the highs and the lows. But if you strap down tight, and if you have the right knowledge and right mindset, you can weather the storm and come through victorious: fear and overconfidence are your only true enemies here.

By the end of our first few classes, we were already "strapped down". We had seen what a startup can do to change the world, and what it can do to a person along the way. For our own fledgling projects, we knew what lay ahead of us, and we were getting geared up for our initial presentations.

This first presentation, the "pitch", is how you tell potential investors what you're all about, and convince them of your worth and potential. Constructing this pitch should also be an opportunity for your team to clarify its goals, risks, and opportunities. Hopefully, you'll have a pretty good idea by now what you hope to create, but it's important to pin down a few key points very precisely:

The problem
The first and most intuitive part of your pitch is the very reason you have gathered colleagues, resources, and hope: the problem you think you can solve. Now ossify your goals and solution concepts into a tangible product or service. Be direct, exact, concrete. If you can't state the problem in a couple of sentences, you likely either don't yet have a firm grasp on what you're hoping to solve, or may be tackling something that isn't a problem in the first place.

Who are we?
It's easy to forget or ignore this point: "Who are we? We know who we are. Next step." But in some ways, the team is even more important than the product. Anyone can come up with a great idea, but it's how you pull it off that counts -- and that is reliant almost entirely on the team.
Establish an identity early. What are your strengths and weaknesses? Do your team members complement each other well? Do you need more members, or maybe fewer? And crucially, will you all stick to your goals? Forgetting this can lead to internal anarchy later, unmet deadlines, unfinished projects...failure.
By contrast, a good team can persist well beyond the scope of the project. One of the most interesting (and harrowing) elements of a startup is how often the organization must "pivot" to stay alive. A pivot is a figurative turning-on-one's-heel: recognizing that something, something big, isn't working, and that a dramatic reassessment of goals, product features, or client base is needed. During these critical moments, a tight-knit and well-defined team can pull through.
In fact, the teacher's assistant for our Software Ventures course had this very experience. Along with his teammate, he created a software service and submitted it to a prominent startup accelerator. The project was declined on account of impracticability, but the sponsor loved the team, and was eager to see more from them. Such a good team were they that in an astoundingly short time, they had a new idea and were up, running, and funded.

Money makes the world go 'round, and chances are you're looking to make some, too. Whether you're in it for the profit or simply want to make your idea a reality, the investors and venture capitalists driving your startup will want to see results. This means having a concept of monetization from the very start. Even if it's not implemented until later, your monetization scheme will drive the form and function of your product.

Target market
You're selling to somebody. Be clear on just who that somebody is. Even major corporations spanning broad swaths of industry began with a niche -- Amazon sold books to online clientele; IBM traces its roots to the Tabulating Machine Company, which sold early data processing equipment (IBM's full story is more complex -- read more here).
As mentioned previously, work hard to fine-tune your product (or your understanding of the problem) to fit a specific vertical, a unique market with particular needs. Choose a vertical that is too broad, and your startup, with limited funds, exposure, and resources, will be unable to cope; too narrow, and you may struggle to gain an audience, which is already a challenge for every new startup.
Defining your target market well ensures a clear path for your product.

When it comes to conceiving your software idea, the future is just as important as the present. Start small, but plan big. Once you're up and running, you'll need to scale your business model, not only adding new resources, juggling more transactions, and handling more information, but also securing a foothold in the areas you intend to expand into. The best companies start out small but have a firm action plan for many years in the future.
Such growth and scale may seem unlikely, even impossible, when your company consists of you and three friends with your laptops and one coffee machine in a rented basement, but if you're successful (and you can be!), you'll have more challenges of size to cope with than you ever thought possible. Preparing early means you'll be able to stand your ground when the administrative decisions start hitting hard and hitting often.

The well-known venture capitalist Paul Graham has said "you can use growth like a compass to make almost every decision you face." For a startup, each new challenge is the most critical one yet, and the way to resolve it is to pick whichever choice will bring you growth. Life is all about growth, and a startup is a young, energetic creature struggling to grow as fast as you will feed it.

When we finally gave our first main presentation in class, each team was assaulted by the incisive questions asked by professor and classmates alike. We'd had so little time to think over our plans -- an intentional construct of the course -- that every new question exposed a gaping hole in our startup model, gaps we didn't even know were there, but which we rushed quickly to fill over the following days. Not one team failed to learn, the hard way, just what their plan was missing. It was rough, but it allowed us to rapidly assess the issues within our project and take countermeasures. When we arrived in class the following week, we were stronger, and ready to face the next challenge.

Thanks for reading, and please stay tuned for the next segment. If you have any questions or comments, feel free to contact me (martin at ...). Now get out there and create something!

Monday, June 2, 2014

Little Lessons in Hadoop

Hadoop is notoriously under-documented, as I recently discovered. I am using Hadoop in my summer research position, and have launched myself into the wonderful and aggravating world of servers and open-source map-reduce programs. And one of the fun aspects of releasing open-source software, I suppose, is that no one can complain if you leave it largely undocumented.

However, this does make installing and running Hadoop a rather harrowing experience for the uninitiated. But hands-on learning is the best way! And there are some pretty good, if often incomplete or outdated, tutorials out there, including this and this.

Those, along with a few dozen web searches, and hours of pain, struggle, and frustration, led me to the successful operation of Hadoop on the standard WordCount trial code.

I record my efforts, failures, and discoveries now for my own benefit as well as for any who might be struggling with the same.

Working with Hadoop

The "No such file or directory" error.
When Hadoop is set up, and you attempt to start the instance using one of its start scripts, you may get the error noted above. It is likely that either the HADOOP_HOME variable is not set for the user Hadoop is running under, or mkdir failed to create the log directory due to permissions errors.
To check for the first of these cases, type "echo $HADOOP_HOME" to see if the variable is set. If you see nothing but a blank line, or the directory it prints does not exist, you'll need to set this variable to the true Hadoop installation directory (like "/home/<user>/hadoop" or wherever you placed it). You can set it with the export command.
If HADOOP_HOME prints correctly, you will need to chmod the permissions on the Hadoop directory. Instructions on using chmod can be found here. Remember the -R flag to include subdirectories.
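
Both fixes can be sketched together. This assumes the directory belongs to the user you are logged in as (otherwise the chmod will need sudo); ensure_hadoop_home is a hypothetical helper name, and the u+rwX mode is my own choice of "enough to create logs":

```shell
# Point HADOOP_HOME at an installation directory and make sure the owning
# user can read, write, and traverse everything beneath it.
ensure_hadoop_home() {
    dir="$1"
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    export HADOOP_HOME="$dir"
    # -R includes subdirectories; the capital X adds execute permission only
    # to directories and to files that are already executable
    chmod -R u+rwX "$dir"
}

# Prints an empty pair of quotes if the variable is not set for this user
echo "HADOOP_HOME is currently: '${HADOOP_HOME:-}'"

# Typical use (substitute wherever you placed Hadoop):
#   ensure_hadoop_home "/home/<user>/hadoop"
```

An export made this way lasts only for the current shell session, so you will probably want the same export line in the Hadoop user's shell startup file as well.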

Contrary to what several tutorials indicate, you will likely not need to have your HADOOP_OPTS variable set -- in fact, it can be empty.
On the other hand, the HADOOP_CLASSPATH should contain the location of the hadoop/lib directory, e.g. "<user>/hadoop/lib" (use the export command for this as well).
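
For what it's worth, here is roughly how I would set it, appending rather than overwriting in case the variable already holds something. The fallback location $HOME/hadoop is only an assumption for illustration:

```shell
# Use HADOOP_HOME if it is set; otherwise fall back to an assumed location
hadoop_home="${HADOOP_HOME:-$HOME/hadoop}"

# Append the lib directory, adding a ":" separator only when
# HADOOP_CLASSPATH already has a value
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$hadoop_home/lib"

echo "$HADOOP_CLASSPATH"
```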

Other Small but Important Items

  • Don't forget your 'sudo'. If you're operating on files from a different user's directory (like if you're using a Hadoop-specific user but saving files on the standard user), you'll need to sudo most of your commands.
  • Likewise, chmod all the important directories before you get started.
  • The PATH environment variable must include Hadoop's "bin" folder, e.g. "/home/<user>/hadoop/bin". You can add this with the export command (don't forget to use the ":" separator so that you append to the existing locations rather than overwrite them).
  • When creating new directories, for input or output files, etc., use the -p flag to create any missing parent directories along the way. For instance, if your <user>/Documents directory is empty, you can create <user>/Documents/hadoop-output/wordcount-results in one step using mkdir with the -p flag.
  • When running a program such as WordCount, you will need to work with files in HDFS; if you're not sure how it is laid out, you can use Hadoop's ls command to look around, just as you would with ls on the command line: "hadoop fs -ls <directory>".
  • Attempting to test the Hadoop setup, I had difficulty ascertaining the location of the WordCount example -- every tutorial seemed to show it in a different place. As of Hadoop 2.3.0, the jar with this example is in "<main Hadoop directory>/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar".
  • To save some typing, of which you will be doing plenty, consider using aliases for the more common commands. For instance, you might use "h-start" as an alias for "<main Hadoop directory>/bin/". You can learn about aliases here.
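
Several of the items above fit into a few lines of shell. Treat this as a sketch: the fallback install location, the temporary stand-in for <user>/Documents, and the alias name hls are all assumptions of mine, so substitute your own.

```shell
# Use HADOOP_HOME if it is set; otherwise fall back to an assumed location
hadoop_home="${HADOOP_HOME:-$HOME/hadoop}"

# Append Hadoop's bin folder to PATH; "$PATH:" keeps the existing entries
export PATH="$PATH:$hadoop_home/bin"

# Create nested output directories in one step. -p creates missing parents
# and does not complain if they already exist. A temporary directory stands
# in for <user>/Documents here.
out_dir="${TMPDIR:-/tmp}/Documents/hadoop-output/wordcount-results"
mkdir -p "$out_dir"

# Look around HDFS, but only if the hadoop command can be found
if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -ls / || echo "hadoop is installed, but HDFS could not be listed"
fi

# A shorthand for a frequently typed command
alias hls='hadoop fs -ls'

# Running the bundled example takes the jar, the program name, and the HDFS
# input and output directories:
#   hadoop jar "$hadoop_home/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar" wordcount <input> <output>
```

Like exports, aliases vanish when the shell exits, so the ones you find yourself relying on belong in your shell startup file.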

Good luck with your Hadooping! I will add more hints and tips as I encounter them.