Run the HDInsight samples
A set of samples is provided to help you get started with Windows Azure HDInsight. These samples are available on every HDInsight cluster that you create. Running them will familiarize you with the Windows Azure PowerShell cmdlets for HDInsight.
MapReduce programs can also be run programmatically from an application using the Microsoft .NET API for HDInsight. For more information on using the HDInsight APIs for job submission, see Submit Hadoop Jobs Programmatically.
Additional documentation is available on the web for Hadoop-related technologies, such as Java-based MapReduce programming and streaming, and for the cmdlets used in PowerShell scripting. For more information on these resources, see the Resources for HDInsight section at the end of the Introduction to Windows Azure HDInsight topic.
What these samples are
These samples are intended to get you up to speed quickly on how to deploy Hadoop jobs and to give you an extensible test bed for working with the concepts and scripting procedures used by the service. They provide examples of common tasks such as creating and importing data sets of various sizes, running jobs and composing jobs sequentially, and examining the results of your jobs. The data sets can be varied in size, which lets you observe the effect that data set size has on job performance.
HDInsight ships with the following samples.
- The Pi Estimator Sample: This tutorial shows how to run a MapReduce program with HDInsight that uses a statistical (quasi-Monte Carlo) method to estimate the value of Pi.
- The WordCount Sample: This tutorial shows how to use an HDInsight cluster to run a MapReduce program that counts word occurrences in a text file.
- The 10-GB GraySort Sample: This tutorial shows how to run a general-purpose GraySort on a 10 GB file using HDInsight. There are three jobs to run: TeraGen to generate the data, TeraSort to sort the data, and TeraValidate to confirm that the data has been sorted properly.
- The C# Streaming Sample: This tutorial shows how to use C# to write a MapReduce program that uses the Hadoop streaming interface.
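To make the ideas behind the first two samples listed above more concrete, here is a minimal single-machine sketch in Python. It is not the code that ships with HDInsight, and it does not use Hadoop at all. The function names (`halton`, `estimate_pi`, `map_words`, `reduce_counts`) are hypothetical and chosen only for illustration. The Pi estimate assumes a Halton sequence (bases 2 and 3) as the quasi-random point source and a quarter circle as the target region, which is a simplification. The word count imitates the map and reduce phases with plain functions.

```python
from collections import Counter
from itertools import chain


def halton(index, base):
    """Return element number `index` (1-based) of the Halton sequence for `base`."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def estimate_pi(num_points):
    """Estimate Pi from the share of quasi-random points inside a quarter circle.

    Assumption: points come from a 2-D Halton sequence (bases 2 and 3).
    The quarter circle of radius 1 covers Pi/4 of the unit square.
    """
    inside = 0
    for i in range(1, num_points + 1):
        x = halton(i, 2)
        y = halton(i, 3)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_points


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(pairs):
    """Reduce phase: sum the counts emitted for each distinct word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


if __name__ == "__main__":
    print("Pi is roughly", estimate_pi(100000))

    lines = ["the quick brown fox", "the lazy dog"]
    pairs = chain.from_iterable(map_words(line) for line in lines)
    print(reduce_counts(pairs))
```

On a cluster, the map step would run in parallel over splits of the input and the reduce step would combine the partial results. The sketch keeps both steps in one process so that the logic is easy to follow.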
How to run the samples
The samples can be run using Windows Azure PowerShell. Instructions are provided on the page for each of the samples listed above.
From this article and the articles on each of the samples, you learned how to run the samples included with the HDInsight clusters using Windows Azure PowerShell. For tutorials on using Pig, Hive, and MapReduce with HDInsight, see the following topics: