AASMT Training Courses

Location

Community Services & Continuing Education - Cairo

Objectives

  • Hadoop is a framework for storing and processing big data, that is, data sets too large and complex to process with traditional methods. Hadoop handles such data by dividing it into smaller blocks that can be processed in parallel.
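
The idea of dividing data into blocks and processing them in parallel can be sketched in plain Python. This is only an illustration of the concept, not Hadoop code: the function names (split_into_blocks, map_block, reduce_counts, word_count), the block size, and the use of a thread pool are all assumptions made for the example, and a real Hadoop job would distribute the blocks across the machines of a cluster rather than across threads.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_blocks(lines, block_size):
    """Divide the input into fixed-size blocks (hypothetical helper)."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block):
    """Map step: count the words of one block, independently of the others."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


def word_count(lines, block_size=2, workers=4):
    """Count words by processing blocks in parallel, then merging the results."""
    blocks = split_into_blocks(lines, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_block, blocks))
    return reduce(reduce_counts, partial_counts, Counter())


if __name__ == "__main__":
    sample = ["big data", "big clusters", "data blocks"]
    print(word_count(sample))
```

Each block is counted without knowledge of the others, which is what makes the work parallelisable; only the final merge needs to see all partial results.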

Outcomes

Studying the components that collectively form the Hadoop ecosystem:

  • HDFS: Hadoop Distributed File System
  • YARN: Yet Another Resource Negotiator
  • MapReduce: Programming-based data processing
  • Spark: In-memory data processing
  • Pig, Hive: Query-based processing of data services
  • HBase: NoSQL database
  • Mahout, Spark MLlib: Machine learning algorithm libraries
  • Solr, Lucene: Searching and indexing
  • ZooKeeper: Managing the cluster
  • Oozie: Job scheduling

Course Contents

Overview: Apache Hadoop is an open-source framework intended to make working with big data easier. For those not acquainted with this technology, one question arises: what is big data? Big data is the term given to data sets that cannot be processed efficiently with traditional methods such as an RDBMS. Hadoop has established itself in industries and companies that work with large, sensitive data sets requiring efficient handling. Hadoop is a framework that enables the processing of large data sets distributed across clusters of machines. Being a framework, Hadoop is made up of several modules that are supported by a large ecosystem of technologies.

Introduction: The Hadoop ecosystem is a platform, or suite, that provides various services for solving big data problems. It includes Apache projects as well as various commercial tools and solutions. There are four major elements of Hadoop, i.e. HDFS, MapReduce, YARN, and Hadoop Common Ut