Connect Excel to HDInsight with Power Query
One key feature of Microsoft's big data solution is the integration of Microsoft Business Intelligence (BI) components with HDInsight Hadoop clusters. A primary example of this integration is the ability to connect Excel to the Windows Azure storage account containing the data associated with your HDInsight cluster by using Microsoft Power Query for Excel. This article walks you through how to set up and use Power Query from Excel to query data associated with an HDInsight cluster.
Before you begin this article, you must have the following:
- A HDInsight cluster. To configure one, see Get started with Windows Azure HDInsight.
- A computer that is running Windows 8, Windows 7, Windows Server 2012, or Windows Server 2008 R2.
- Office 2013 Professional Plus, Office 365 Pro Plus, Excel 2013 Standalone, or Office 2010 Professional Plus.
In this article
Install Microsoft Power Query for Excel
Power Query can be used to import data from a variety of sources into Microsoft Excel, where it can power Business Intelligence (BI) tools like PowerPivot and Power View. In particular, Power Query can import data that has been output or that has been generated by a Hadoop job running on an HDInsight cluster.
Download Microsoft Power Query for Excel from the Microsoft Download Center and install it.
Import HDInsight data into Excel
The Power Query add-in for Excel makes it easy to import data from your HDInsight cluster into Excel where business intelligence tools such as PowerPivot and Power Map may be used to inspect, analyze, and present the data.
To import data from an HDInsight cluster
Create a new blank workbook.
Click the Power Query menu, click From Other Sources, and then click From Windows Azure HDInsight.
Note: If you don't see the Power Query menu, go to File > Options > Add-Ins, and select COM Addins from the drop-down Manager box at the bottom of the page. Select the Go... button and verify that the box for the Microsoft Office Power Query for Excel Add-In has been checked.
Enter the Account Name of the Windows Azure Blob storage account associated with your cluster, and then click OK.
Enter the Account Key for the Blob storage account, and then click Save. (You need to do this only the first time you access this store.)
In the Navigator pane on the left of the Query Editor, double-click the Blob storage container name. By default the container name is the same name as the cluster name.
Locate HiveSampleData.txt in the Name column (the folder path is ../hive/warehouse/hivesampletable/), and then click Binary on the left of HiveSampleData.txt.
If you want, you can rename the column names. When you are ready, click Apply & Close.
In this article you learned how to use Power Query to retrieve data from HDInsight into Excel. Similarly, you can retrieve data from HDInsight into SQL Azure. It is also possible to upload data into HDInsight. To learn more, see the following articles: