Additional Learning Resources

 

Resources for Learning about Apache Hadoop-based Service for Windows Azure and the Hadoop and Windows Azure Ecosystems

Microsoft: Hadoop on Windows Azure

Microsoft: Windows and SQL Database

Microsoft: Business Intelligence

Apache Hadoop:

  • Apache Hadoop - a software library that provides a framework for the distributed processing of large data sets across clusters of computers.
  • HDFS - the Hadoop Distributed File System, the primary storage system used by Hadoop applications.

Hadoop Ecosystem:

  • Pig - a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
  • Mahout - a machine learning library with algorithms for clustering, classification, and batch-based collaborative filtering, implemented on top of Apache Hadoop using the map/reduce paradigm.
  • Hive - data warehouse software built on top of Apache Hadoop that facilitates querying and managing large datasets residing in distributed storage.
  • Pegasus - a peta-scale graph mining system that runs in a parallel, distributed manner on top of Hadoop and provides algorithms for important graph mining tasks such as Degree, PageRank, Random Walk with Restart (RWR), Radius, and Connected Components.
  • Sqoop - a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
  • Flume - a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data to HDFS.

Videos

Community Sites and Blogs:

