Additional Learning Resources
Resources for Learning about the Apache Hadoop-based Service for Windows Azure and the Hadoop and Windows Azure Ecosystems
Microsoft: Hadoop on Windows Azure
Microsoft: Windows and SQL Database
Microsoft: Business Intelligence
- Apache Hadoop - a software library providing a framework for the distributed processing of large data sets across clusters of computers.
- HDFS - the Hadoop Distributed File System, the primary storage system used by Hadoop applications.
- Pig - a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
- Mahout - a machine learning library with algorithms for clustering, classification, and batch-based collaborative filtering, implemented on top of Apache Hadoop using the map/reduce paradigm.
- Hive - data warehouse software built on top of Apache Hadoop that facilitates querying and managing large datasets residing in distributed storage.
- Pegasus - a peta-scale graph mining system that runs in a parallel, distributed manner on top of Hadoop and provides algorithms for important graph mining tasks such as Degree, PageRank, Random Walk with Restart (RWR), Radius, and Connected Components.
- Sqoop - a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
- Flume - a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data to HDFS.
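Several of the projects above build on the map/reduce paradigm. As a rough illustration only, here is a minimal single-process sketch of that paradigm in plain Python, counting words. It does not use Hadoop or any of its APIs; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen just for this example. On a real cluster, the framework would run the map and reduce steps in parallel across machines and perform the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle/sort."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

The map and reduce functions only see one line or one key at a time, which is the property that lets a framework distribute them across many machines.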
Community Sites and Blogs: