Connect Excel to Windows Azure HDInsight with Power Query
One key feature of Microsoft’s Big Data Solution is the integration of Microsoft Business Intelligence (BI) components with HDInsight Hadoop clusters. A primary example of this integration is the ability to connect Excel to the Windows Azure storage account that stores the data associated with your HDInsight cluster by using Microsoft Power Query for Excel. This article walks you through how to set up and use Power Query from Excel to query data associated with an HDInsight cluster.
Before you begin this article, you must have the following:
- A Windows Azure subscription. For more information about obtaining a subscription, see Purchase Options, Member Offers, or Free Trial.
- A HDInsight cluster. To configure one, see Getting Started with Windows Azure HDInsight Service.
- A computer that is running Windows 8, Windows 7, Windows Server 2012, or Windows Server 2008 R2.
- Office 2013 Professional Plus, Office 365 Pro Plus, Excel 2013 Standalone, or Office 2010 Professional Plus.
In this article
Install Microsoft Power Query for Excel
Power Query can be used to import data from a variety of sources into Microsoft Excel, where it can be used to power Business Intelligence (BI) tools like PowerPivot and PowerView. In particular, Power Query can import data that has been output or that has been generated by a Hadoop Job running on an HDInsight cluster.
To install Microsoft Power Query for Excel
Import HDInsight data into Excel
The Power Query add-in for Excel makes it easy to import data from your HDInsight Service cluster into Excel where Business Intelligence tools such as Power Pivot and Power Map may be used to inspect, analyze, and present the data.
To import data from an HDInsight cluster
Create a new blank workbook.
Click the Power Query menu, click From Other Sources, and then click From Windows Azure HDInsight.
Note: If the Power Query menu does not show up, Go to File -> Options -> Add-Ins and select COM Addins from the drop-down Manager box at the bottom of the page. Select the Go... button and verify that the box for the Microsoft Office Power Query for Excel Add-In has been checked.
Enter the Account Name of the Azure Blob Storage Account associated with your cluster, and then click OK.
Enter the Account Key for the Azure Blob Storage Account, and then click Save. (You only need to do this the first time you access this store.)
In the Navigator pane on the left of the Query Editor, click the Blob storage container name. By default the container name is the same name as the cluster name.
Locate HiveSampleData.txt in the Name column, and then click Binary in the same row.
Right-click Column1, point to Split Column, and then click By Delimiter.
Select Tab in Select or enter delimiter, and At each occurrence of the delimiter, and then click OK.
Click Done. The query then imports the data file into Excel.
In this article you learned how to use Power Query to retrieve data from the HDInsight Service into Excel. Similarly, you can retrieve data from the HDInsight Service into SQL Azure. It is also possible to upload data into an HDInsight Service. To learn more, see the following articles: