Monday, June 9, 2014

More Little Lessons in Hadoop

Previously, I had posted on some common issues encountered when installing and running Hadoop for the first time.
Now, in the joyful experience of attempting a multi-node setup on a simple virtual cluster, many more errors, unexpected consequences, and curios of server-related struggle have been forthcoming.
I hope that in relaying these attempts, failures, and (admittedly sparse) triumphs through this medium, I can allay some of the angst experienced by others in the future.

Good luck, all Hadooper-troopers.

Can't yum/apt-get? 

Depending on OS version and pre-existing settings (especially when working on a stripped virtual image), your installer service may not be able to find certain packages.

For a general inability to install, try the configurations here for setting up the nginx repo. 

A particular problem I ran across was installing the repository-management package on CentOS, either "python-software-properties" or "software-properties-common" depending on version.
Turns out, you don't need it, contrary to the results I found in several Google searches.

Specifically, if you're following Michael Noll's tutorial, and you can't install the software properties, try doing a yum search for Java, and you'll probably get results.

However, if for some reason you don't, here are some links that may be of use:

Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

This may be due to an incorrect permissions setting, which you can solve via these directions. However, especially if you've just sourced eucarc to get going, you may want to make sure you're in your home directory...yes, I did make that mistake; it's quite frustrating until it suddenly becomes embarrassing.
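If permissions are the culprit, the usual problem is that your ~/.ssh directory or authorized_keys file can be written by other users. With its default StrictModes setting, OpenSSH's sshd then refuses to use the key. The commonly recommended modes are 700 for ~/.ssh and 600 for authorized_keys. Here is a small Python sketch, assuming a standard OpenSSH layout on a POSIX system, that flags and tightens those modes. It does the same thing as running chmod 700 ~/.ssh and chmod 600 ~/.ssh/authorized_keys by hand. The function names are my own, for illustration only.

```python
import os
import stat
from pathlib import Path


def too_permissive(path: Path) -> bool:
    """True if group or others can write to `path` (what sshd's StrictModes objects to)."""
    mode = path.stat().st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


def fix_ssh_permissions(home: Path) -> None:
    """Apply the commonly recommended modes: 700 on ~/.ssh, 600 on authorized_keys."""
    ssh_dir = home / ".ssh"
    os.chmod(ssh_dir, 0o700)
    keys = ssh_dir / "authorized_keys"
    if keys.exists():
        os.chmod(keys, 0o600)


if __name__ == "__main__":
    # Run as the user you are trying to log in as (and from the right home directory!)
    ssh_dir = Path.home() / ".ssh"
    for p in (ssh_dir, ssh_dir / "authorized_keys"):
        if p.exists():
            print(p, "too permissive" if too_permissive(p) else "ok")
```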

For more general help on SSH through Ubuntu:

/usr/lib/jvm/java.... Not a directory (on hadoop namenode -format)

Most likely a mistyped or mixed-up Java directory, especially if you've switched Java versions recently or are setting up a new user.
Check the file where you set your Java path, and modify the path as necessary (it probably ends with the jre directory).
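A quick way to see what's wrong with a candidate path is to check that it exists, that it is a directory, and that it contains bin/java (true of both JDK and JRE directories). Here is a minimal Python sketch, assuming the usual /usr/lib/jvm location on Linux. check_java_home and installed_jvms are hypothetical helper names.

```python
import os
from pathlib import Path


def check_java_home(java_home: str) -> str:
    """Return 'ok', or a short description of what is wrong with a Java path."""
    p = Path(java_home)
    if not p.exists():
        return "does not exist (typo, or a different Java version installed?)"
    if not p.is_dir():
        return "exists but is not a directory"
    if not (p / "bin" / "java").exists():
        return "is a directory, but has no bin/java inside it"
    return "ok"


def installed_jvms(jvm_root: str = "/usr/lib/jvm") -> list:
    """List what is actually installed, to compare against a mistyped path."""
    root = Path(jvm_root)
    return sorted(str(p) for p in root.iterdir()) if root.is_dir() else []


if __name__ == "__main__":
    candidate = os.environ.get("JAVA_HOME", "")
    if candidate:
        print(candidate, "->", check_java_home(candidate))
    else:
        print("JAVA_HOME is not set")
    for jvm in installed_jvms():
        print("installed:", jvm)
```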

Tuesday, June 3, 2014

Lessons in Software Startups: Tackling the Problem

(This is a continuation of a series on software-related startups. You can read the first part here).

Scribbled in my notebook -- and barely legible -- are the words "have quick answers for project."

Though I don't remember it, I must have written this not even halfway through the first Software Ventures class. That day, we learned we'd all be put into teams, tasked with designing and creating a working product for our own potential startup by semester's end. It didn't take long for everyone in that class to realize what lay ahead: a fast-paced and unorthodox learning experience, seeking to mimic the brutal but exhilarating environment of software startups. By the time we got out, fourteen long weeks later, we had gained a firm mental picture of how a startup works, with all its ups, downs, potentials, and pitfalls.

And it really is necessary to expect all outcomes, positive and negative -- both occur regularly in the world of startups. Better to discover this now, and feel the pain when the most it can affect is your GPA, than at a real company, where a failure could cost you your livelihood. It's easy to fear the startup process, or to glamorize it, depending on your inclination, but neither attitude holds true on its own.

One startup's young CEO, speaking at our class, confessed that during the early months of his venture, he more than once lapsed into tears from the sheer stress of it all. Yet now his company is heavily funded. Another speaker told the captivating story of abandoning his successful career, secure life, and comforts of home, to pursue his passion and breathe new life into his homeland -- and now his enterprise is highly successful.
In the world of startups, there is no avoiding the highs and the lows. But if you strap down tight, and if you have the right knowledge and right mindset, you can weather the storm and come through victorious: fear and overconfidence are your only true enemies here.

By the end of our first few classes, we were already "strapped down". We had seen what a startup can do to change the world, and what it can do to a person along the way. For our own fledgling projects, we knew what lay ahead of us, and we were getting geared up for our initial presentations.

This first presentation, the "pitch", is how you tell potential investors what you're all about, and convince them of your worth and potential. Constructing this pitch should also be an opportunity for your team to clarify its goals, risks, and opportunities. Hopefully, you'll have a pretty good idea by now what you hope to create, but it's important to pin down a few key points very precisely:

The problem
The first and most intuitive part of your pitch is the very reason you have gathered colleagues, resources, and hope: the problem you think you can solve. Now ossify your goals and solution concepts into a tangible product or service. Be direct, exact, concrete. If you can't state the problem in a couple of sentences, you likely either don't yet have a firm grasp on what you're hoping to solve, or may be tackling something that isn't a problem in the first place.

Who are we?
It's easy to forget or ignore this point: "Who are we? We know who we are. Next step." But in some ways, the team is even more important than the product. Anyone can come up with a great idea, but it's how you pull it off that counts -- and that is reliant almost entirely on the team.
Establish an identity early. What are your strengths and weaknesses? Do your team members complement each other well? Do we need more members, or maybe fewer? And crucially, will we all stick to our goals? Forgetting this can lead to internal anarchy later, unmet deadlines, unfinished projects...failure.
By contrast, a good team can persist well beyond the scope of the project. One of the most interesting (and harrowing) elements of a startup is how often the organization must "pivot" to stay alive. A pivot is a figurative turning-on-one's-heel, recognizing that something, something big, isn't working, and that a dramatic reassessment of goals, product features, or client base is needed. During these critical moments, a tight-knit and well-defined team can pull through.
In fact, the teaching assistant for our Software Ventures course had this very experience. Along with his teammate, he created a software service and submitted it to a prominent startup accelerator. The project was declined as impractical, but the sponsor loved the team, and was eager to see more from them. Such a good team were they that in an astoundingly short time, they had a new idea and were up, running, and funded.

Money makes the world go 'round, and chances are you're looking to make some, too. Whether you're in it for the profit or simply want to make your idea a reality, the investors and venture capitalists driving your startup will want to see results. This means having a concept of monetization from the very start. Even if it's not implemented until later, your monetization scheme will drive the form and function of your product.

Target market
You're selling to somebody. Be clear on just whom that somebody is. Even major corporations spanning broad swaths of industry began with a niche -- Amazon sold books to online clientele; IBM started life as the Tabulating Machine Company, selling early data processing equipment (IBM's full story is more complex -- read more here).
As mentioned previously, work hard to fine-tune your product (or your understanding of the problem) to fit a specific vertical, a unique market with particular needs. Choose a vertical that is too broad, and your startup, with limited funds, exposure, and resources, will be unable to cope; too narrow, and you may struggle to gain an audience, which is already hard enough for every new startup.
Defining your target market well ensures a clear path for your product.

When it comes to conceiving your software idea, the future is just as important as the present. Start small, but plan big. Once you're up and running, you'll need to scale your business model, not only adding new resources, juggling more transactions, and handling more information, but also securing a foothold in the areas you intend to expand into. The best companies start out small but have a firm action plan for many years in the future.
Such growth and scale may seem unlikely, even impossible, when your company consists of you and three friends with your laptops and one coffee machine in a rented basement, but if you're successful (and you can be!), you'll have more challenges of size to cope with than you ever thought possible. Preparing early means you'll be able to stand your ground when the administrative decisions start hitting hard and hitting often.

The well-known venture capitalist Paul Graham has said "you can use growth like a compass to make almost every decision you face." For a startup, each new challenge is the most critical one yet, and its resolution should be driven by whichever choice will bring you growth. Life is all about growth, and a startup is a young, energetic creature struggling to grow as fast as you will feed it.

When we finally gave our first main presentation in class, each team was assaulted by the incisive questions asked by professor and classmates alike. We'd had so little time to think over our plans -- an intentional construct of the course -- that every new question exposed a gaping hole in our startup model, gaps we didn't even know were there, but which we rushed to fill over the following days. Not one team failed to learn, the hard way, just what their plan was missing. It was rough, but it allowed us to rapidly assess the issues within our project and take countermeasures. When we arrived in class the following week, we were stronger, and ready to face the next challenge.

Thanks for reading, and please stay tuned for the next segment. If you have any questions or comments, feel free to contact me (martin at Now get out there and create something!

Monday, June 2, 2014

Little Lessons in Hadoop

Hadoop is notoriously under-documented, as I recently discovered. I am using Hadoop in my summer research position, and have launched myself into the wonderful and aggravating world of servers and open-source map-reduce programs. And one of the fun aspects of releasing open-source software, I suppose, is that no one can complain if you leave it largely undocumented.

However, this does make installing and running Hadoop a rather harrowing experience for the uninitiated. But hands-on learning is the best way! And there are some pretty good, if often incomplete or outdated, tutorials out there, including this and this.

Those, along with a few dozen web searches, and hours of pain, struggle, and frustration, led me to the successful operation of Hadoop on the standard WordCount trial code.

I record my efforts, failures, and discoveries now for my own benefit as well as for any who might be struggling with the same.

Working with Hadoop

The "No such file or directory" error.
When Hadoop is set up and you attempt to start the instance using one of its start scripts, you may get the error noted above. It is likely that either your HADOOP_HOME variable is not set for the user Hadoop is running under, or mkdir failed to create the log directory because of a permissions error.
To check for the first of these cases, type "echo $HADOOP_HOME" to see whether the variable is set. If you see nothing but a blank line, or get an error telling you that the directory cannot be found, you'll need to set the variable to the true Hadoop installation directory (like "/home/<user>/hadoop" or wherever you placed it). You can do this with the export command.
If HADOOP_HOME prints correctly, you will need to chmod the permissions on the Hadoop directory. Instructions on using chmod can be found here. Remember the -R flag to include subdirectories.
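Both checks can be rolled into one small script. This is a minimal Python sketch, assuming that Hadoop needs to write (its logs, for example) inside the installation directory, so that directory must be writable by the user Hadoop runs as. diagnose_hadoop_home is a hypothetical name.

```python
import os


def diagnose_hadoop_home(env=None) -> str:
    """Check the two likely causes of "No such file or directory" on startup."""
    env = os.environ if env is None else env
    home = env.get("HADOOP_HOME", "").strip()
    if not home:
        # Same situation as `echo $HADOOP_HOME` printing a blank line
        return "HADOOP_HOME is not set: export it to your Hadoop installation directory"
    if not os.path.isdir(home):
        return f"HADOOP_HOME points to {home!r}, which is not a directory"
    if not os.access(home, os.W_OK):
        # Hadoop could not create directories/files here, e.g. its log directory
        return f"{home!r} is not writable by this user: try chmod -R on it"
    return "ok"


if __name__ == "__main__":
    print(diagnose_hadoop_home())
```

Run it as the same user Hadoop runs under; running it as yourself (or as root) may give a misleading "ok".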

Contrary to what several tutorials indicate, you will likely not need to have your HADOOP_OPTS variable set -- in fact, it can be empty.
On the other hand, the HADOOP_CLASSPATH should contain the location of the hadoop/lib directory, e.g. "<user>/hadoop/lib" (use the export command for this as well).
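In the shell, these changes amount to appending to PATH and HADOOP_CLASSPATH rather than replacing them (export PATH=$PATH:...). Here is a Python sketch that builds the same environment, for example to pass to subprocess.run(cmd, env=...). The install location /home/hduser/hadoop used below is hypothetical, and HADOOP_OPTS is deliberately left alone.

```python
import os


def append_path(existing: str, new_entry: str) -> str:
    """Append to a ':'-separated list without clobbering (or duplicating) entries."""
    parts = [p for p in existing.split(os.pathsep) if p]
    if new_entry not in parts:
        parts.append(new_entry)
    return os.pathsep.join(parts)


def hadoop_env(hadoop_home: str, base=None) -> dict:
    """Copy of the environment with HADOOP_HOME set and bin/lib appended."""
    env = dict(os.environ if base is None else base)
    env["HADOOP_HOME"] = hadoop_home
    env["HADOOP_CLASSPATH"] = append_path(env.get("HADOOP_CLASSPATH", ""),
                                          os.path.join(hadoop_home, "lib"))
    env["PATH"] = append_path(env.get("PATH", ""), os.path.join(hadoop_home, "bin"))
    return env  # HADOOP_OPTS is not touched; it can stay empty


if __name__ == "__main__":
    env = hadoop_env("/home/hduser/hadoop")  # hypothetical install location
    print(env["PATH"])
    print(env["HADOOP_CLASSPATH"])
```

Because append_path skips entries that are already present, calling it twice (say, from a login script that gets sourced again) does not keep growing PATH.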

Other Small but Important Items

  • Don't forget your 'sudo'. If you're operating on files from a different user's directory (like if you're using a Hadoop-specific user but saving files on the standard user), you'll need to sudo most of your commands.
  • Likewise, chmod all the important directories before you get started.
  • The PATH environment variable must include the Hadoop "bin" folder, e.g. "/home/<user>/hadoop/bin". You can add this with the export command (don't forget to use the ":" separator so you append to the existing locations instead of overwriting them).
  • When creating new directories for input or output files, etc., use the -p flag so that mkdir creates any missing parent directories along the way (and doesn't complain if the directory already exists). For instance, if your <user>/Documents directory is empty, you can create <user>/Documents/hadoop-output/wordcount-results in one step using mkdir with the -p flag.
  • When running a program such as WordCount, you will need to work with HDFS; if you're not sure how it is laid out, you can use Hadoop's ls command to look around just as you would with the ordinary command-line ls: "hadoop fs -ls <directory>".
  • Attempting to test the Hadoop setup, I had difficulty ascertaining the location of the WordCount example -- every tutorial seemed to show it in a different place. As of Hadoop 2.3.0, the jar with this example is in "<main Hadoop directory>/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar".
  • To save some typing, of which you will be doing plenty, consider using aliases for the more common commands. For instance, you might use "h-start" as an alias for "<main Hadoop directory>/bin/". You can learn about aliases here.
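Several of these tips can also be scripted. This Python sketch assumes the 2.3.0 jar layout described above and a hypothetical ~/hadoop install. os.makedirs(..., exist_ok=True) behaves like mkdir -p, and the helpers only build the argument lists for "hadoop fs -ls" and the WordCount example, which you could then hand to subprocess.run(cmd, check=True). The function names are my own.

```python
import os


def make_dirs(path: str) -> None:
    """Python's counterpart of `mkdir -p`: create missing parents, no error if present."""
    os.makedirs(path, exist_ok=True)


def hadoop_bin(hadoop_home: str) -> str:
    return os.path.join(hadoop_home, "bin", "hadoop")


def hdfs_ls(hadoop_home: str, directory: str) -> list:
    """Argument list for `hadoop fs -ls <directory>`."""
    return [hadoop_bin(hadoop_home), "fs", "-ls", directory]


def wordcount(hadoop_home: str, hdfs_input: str, hdfs_output: str,
              version: str = "2.3.0") -> list:
    """Argument list for the bundled WordCount example (jar location as of 2.3.0)."""
    jar = os.path.join(hadoop_home, "share", "hadoop", "mapreduce",
                       f"hadoop-mapreduce-examples-{version}.jar")
    return [hadoop_bin(hadoop_home), "jar", jar, "wordcount", hdfs_input, hdfs_output]


if __name__ == "__main__":
    home = os.path.expanduser("~/hadoop")  # hypothetical install location
    print(" ".join(hdfs_ls(home, "/")))
    print(" ".join(wordcount(home, "input", "output")))
```

One more gotcha: MapReduce refuses to write into an output directory that already exists, so give each WordCount run a fresh output path (or remove the old one first).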

Good luck with your Hadooping! I will add more hints and tips as I encounter them.

Saturday, May 17, 2014

Lessons in Software Startups: Initiation

"Don't sit in the same spot twice."

This, the first line in my notebook. This, the first lesson -- suggestion? command? -- of our exclusive new class, "Software Ventures." This, a strange new experience in a university where lectures are de rigueur, comprehension often requires several trips to the TA, and "hands-on learning" is most commonly found in the oft-scorned classrooms of the students of arts and humanities.

So why are we sitting here, nary a notebook in sight, having a relaxed conversation with our new professor (rare!), and starting off the semester by discussing the dynamics of our seating arrangement?

Because software startups, we will find out first-hand, are a dynamic and unstable environment at best, and any aspiring acolyte of this exhilarating career must be prepared to think up a new idea, new approach, or entirely new set of goals at the drop of a hat.

This is the first taste we students got of our fifteen-week journey into understanding, conceptualizing, and creating our own unique software ventures. Get ready to have a blast (and suffer).

The love-hate relationship I quickly developed with this class notwithstanding, the one clear takeaway from the first minutes of our class was that the lessons would be hugely valuable. That very first day, we hit upon so many crucial points:

Make your business model scalable and repeatable.
Startups are small. Corporations are not. And if you plan to be anything beyond mildly successful with your startup (and really, there is no such thing as "mildly successful" in this field -- it's live or die), then you should be prepared to grow. If your idea isn't flexible enough to expand into a broader and larger environment, your business will burn out before it even begins to saturate the market.
It may seem completely pointless, but from the very beginning, focusing on the future will help you understand the growth of your own business.
Your company should operate as easily with ten employees as with ten thousand -- you never know when a surge in business will require hundreds of new hires to cope with new work, or when a fall in the market will necessitate multiple layoffs. New customers should be added cleanly and accommodated, be they the only new clients this month or merely the first of hundreds in one day. Knowing how to grow is as crucial as knowing your business itself.

"Product Market Fit"
Broad thinking is fantastic. If you envision a future filled with flying cars and regular traffic to Mars, more power to you, but your pitch "Software Interfaces for Piloting Intergalactic Vessels" probably won't raise much capital, no matter how impassioned your presentation.
Think hard about your product. How does it fit into a known segment of the market? Answering this will make sure you're on target, at least when you start out.

Bootstrap to build something cheap, and build it fast.
The old advice "lift yourself up by your own bootstraps" may seem as unhelpful as someone telling you to "man up and get back to work" when you're struggling with depression, but when you're developing a startup, that's exactly what you need to do. Work hard, and work fast; the sooner (and the cheaper) you get a working product, the better you'll be able to get your idea off the ground -- and maybe even have resources left over to, oh, say, run your company.

Find out what your user wants before you start.
I get it, you're smart. You've got a great idea for a new piece of software. Everyone's going to use it. Right? Maybe not. Your targeted users have an uncanny way of despising the very features you think are beautiful. It may be you know people well, and your idea is exactly what people want. But you don't know that until you confirm it, and as usual, it's better to find out now than after you've sunk thousands of dollars from your investors into your false start.

Find your "Vertical".
Just like the broad-minded approach to your product can get you in trouble, trying to sell to everyone all at once can kill you. Look to find a specific "vertical", a small part of the market which your product fits into well, like shipping specialized items to gourmet restaurants if you've developed a radical new way to efficiently organize delivery networks. Don't lose the big picture, but start small.

These are just some of the big ideas we picked up after the first wonderful, exhausting, frustrating three-hour session of "Software Ventures". It was a great way to kick off the semester, and everyone there -- our professor included -- knew that this was going to be a growing experience like none we'd had before.

Thanks for reading, and please stay tuned for the next segment on lessons learned in software startups. If you have any questions or comments, feel free to contact me (martin at And remember -- don't sit in the same spot twice!

Wednesday, May 14, 2014

Confirmed: "Computers are Fast" (!)

After reading this post by Julia Evans, which considers CPU speeds somewhat more deeply than its title implies ("Computers are Fast"), a fragment from a recent conversation with one of my computer science professors came to mind.

Simply, and somewhat paraphrased: "almost all processes are rapidly becoming I/O-bound."

Not so long ago, in OS Design class, one homework and several questions on exams tasked us to carefully identify whether a process would be I/O-bound or CPU-bound based on its actions and properties. Would "I/O-bound" have consistently been the correct answer?
Not according to the professor of that class, at least, since I remember a few answers to the contrary. And I'd be willing to wager that there remain enough computationally-intensive tasks that OSs must take CPU-bound costs into consideration when scheduling processes, at least in some areas of work.
But might gains in speed, parallelism, and optimization eventually sway the balance?
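One rough, empirical way to tell the two apart is to compare the CPU time a task consumes with the wall-clock time it takes. A ratio near 1 means the processor was busy the whole time (CPU-bound), while a ratio near 0 means the process spent most of its time waiting. In this Python sketch, time.sleep stands in for blocking I/O, which is an assumption for illustration. This is only a crude heuristic, not how an OS scheduler actually classifies processes.

```python
import time


def cpu_to_wall_ratio(task) -> float:
    """Run `task` and return (CPU time used by this process) / (elapsed wall time)."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    task()
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    return cpu / wall if wall > 0 else 0.0


def busy_work(seconds: float = 0.2) -> int:
    """Spin the CPU for roughly `seconds`."""
    end = time.perf_counter() + seconds
    count = 0
    while time.perf_counter() < end:
        count += 1
    return count


def waiting_work(seconds: float = 0.2) -> None:
    """Stand-in for blocking I/O: the process waits while the CPU sits idle."""
    time.sleep(seconds)


if __name__ == "__main__":
    print(f"busy loop: {cpu_to_wall_ratio(busy_work):.2f}")     # near 1 suggests CPU-bound
    print(f"sleeping:  {cpu_to_wall_ratio(waiting_work):.2f}")  # near 0 suggests waiting
```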

My guess is yes -- but only for personal-computing tasks. For example, I've never personally run a highly complex physics particle simulator on a time-slotted supercomputer, but I'd bet most of that isn't too memory-heavy, especially compared to the insane number of calculations required (interesting note about reducing calculation cost).
And imagine how that plays out for a harder problem, like prime factorization (well, I guess it isn't actually proven to require superpolynomial time at this point). The time required to compute can be enormous, but the space complexity doesn't need to be too bad (they're just integers, after all).
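The simplest method, trial division, shows the imbalance nicely. This Python sketch is not how serious factoring is done (real tools use far better algorithms), but it makes the point. In the worst case the loop runs about sqrt(n) times, roughly 2^(b/2) steps for a b-bit number, yet the working state is just a handful of integers.

```python
def factorize(n: int) -> list:
    """Prime factors of n (with multiplicity) by trial division.

    Time: up to about sqrt(n) iterations in the worst case (n prime, or a
    product of two large primes). Extra space: a few integers plus the output.
    """
    if n < 2:
        return []
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1 if d == 2 else 2  # after 2, only odd candidates
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


if __name__ == "__main__":
    print(factorize(360))           # [2, 2, 2, 3, 3, 5]
    print(factorize(600851475143))  # [71, 839, 1471, 6857]
```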

I'm curious to see how things turn out over the next few years. It's an exciting time to be computing! -- and when isn't it?

Monday, May 12, 2014

A Remarkable Exercise in Programming Prowess

A remarkable exercise: reducing a programming language (Ruby, in this case) to its bare minimum while preserving its expressiveness and power, and demonstrating how that functionality can be derived from lambda functions.
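To get a feel for the technique, here is its core trick in Python, which also has first-class lambdas: a number is "apply f n times" (Church numerals) and a boolean is "pick one of two arguments". This is a small sketch of the general idea, Church encoding, not a port of the article's Ruby. to_int and to_bool exist only to convert results back to native values for display.

```python
# Church numerals: the number n is "apply f to x, n times".
ZERO = lambda f: lambda x: x
ONE = lambda f: lambda x: f(x)
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))
ADD = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
MULT = lambda m: lambda n: lambda f: m(n(f))

# Church booleans: choose between two alternatives.
TRUE = lambda a: lambda b: a
FALSE = lambda a: lambda b: b
IF = lambda cond: lambda a: lambda b: cond(a)(b)
IS_ZERO = lambda n: n(lambda _: FALSE)(TRUE)

TWO = SUCC(ONE)
THREE = ADD(ONE)(TWO)


# Conversions back to ordinary Python values, used only for display/testing.
def to_int(n) -> int:
    return n(lambda k: k + 1)(0)


def to_bool(b) -> bool:
    return b(True)(False)


if __name__ == "__main__":
    print(to_int(MULT(TWO)(THREE)))  # 6
    print(to_bool(IS_ZERO(ZERO)))    # True
```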

Check it out here, at Programming with Nothing.

Could we see the same thing done in other languages? I imagine it is easier in some and much more difficult in others. For instance, Java is an intuitive language, but can its lambdas, added recently, stand up to the test? Most Java developers rely so heavily on imported libraries (understandably) that it's hard to imagine getting much done without them.

Has this already been done? Or are repetitions of this experiment just dying to be performed on new languages? It's exciting....

Friday, April 18, 2014