
Big Data & Hadoop

Our Big Data with Hadoop training course is designed to show software developers, DBAs, business intelligence analysts, software architects and other stakeholders how to use key open source technologies to derive significant value from extremely large data sets. We will show you how to overcome the challenges of managing and analysing Big Data with tools and techniques such as Apache Hadoop, NoSQL databases and cloud computing services. The course features extensive hands-on exercises reflecting real-world scenarios, and you are encouraged to take these away to kick-start your own Big Data efforts. It also provides industry accreditation for business analysts, data warehouse specialists and other professionals with similar backgrounds, helping them make the transition into Big Data.

Course Modules

Introduction to Big Data

  • The four dimensions of Big Data: volume, velocity, variety, veracity
  • Introducing the Storage, MapReduce and Query Stack
  • Establishing the business importance of Big Data
  • Addressing the challenge of extracting useful data
  • Integrating Big Data with traditional data

Analysing Your Data Characteristics

  • Selecting data sources for analysis
  • Eliminating redundant data
  • Establishing the role of NoSQL

Overview of Big Data Stores

  • Data models: key-value, graph, document, column-family
  • Hadoop Distributed File System
  • HBase
  • Hive
  • Cassandra
  • Hypertable
  • Amazon S3
  • Bigtable
  • DynamoDB
  • MongoDB
  • Redis
  • Riak
  • Neo4j

Selecting Big Data Stores

  • Choosing the correct data stores based on your data characteristics
  • Moving code to data
  • Implementing polyglot data store solutions
  • Aligning business goals to the appropriate data store

Integrating Disparate Data Stores

  • Mapping data to the programming framework
  • Connecting and extracting data from storage
  • Transforming data for processing
  • Subdividing data in preparation for Hadoop MapReduce

Employing Hadoop MapReduce

  • Creating the components of Hadoop MapReduce jobs
  • Distributing data processing across server farms
  • Executing Hadoop MapReduce jobs
  • Monitoring the progress of job flows
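To make the map, shuffle and reduce stages of a MapReduce job concrete, here is a minimal single-process sketch in Python of the classic word-count job. It illustrates the MapReduce programming model only and is not Hadoop's Java API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`) are hypothetical, and everything runs sequentially in one process rather than being distributed across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical name): emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group emitted values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Keys are sorted here to mimic the sorted input a reducer receives.
    return dict(sorted(grouped.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer (hypothetical name): combine all values for one key into a single result."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce sequentially in a single process (no distribution)."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data with Hadoop", "Hadoop MapReduce processes big data"]
    # prints {'big': 2, 'data': 2, 'hadoop': 2, 'mapreduce': 1, 'processes': 1, 'with': 1}
    print(run_job(sample))
```

In a real Hadoop job the framework performs the shuffle itself, partitioning mapper output across reducers running on different nodes; the developer would normally write only the mapper and reducer logic.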

The Building Blocks of Hadoop MapReduce

  • Distinguishing Hadoop daemons
  • Investigating the Hadoop Distributed File System
  • Selecting appropriate execution modes: local, pseudo-distributed and fully distributed

Handling Streaming Data

  • Comparing real-time processing models
  • Leveraging Storm to extract live events
  • Lightning-fast processing with Spark and Shark

Abstracting Hadoop MapReduce Jobs with Pig

  • Communicating with Hadoop in Pig Latin
  • Executing commands using the Grunt Shell
  • Streamlining high-level processing

Performing Ad Hoc Big Data Querying with Hive

  • Persisting table metadata in the Hive Metastore
  • Performing queries with HiveQL
  • Investigating Hive file formats

Creating Business Value from Extracted Data

  • Mining data with Mahout
  • Visualising processed results with reporting tools
  • Querying in real time with Impala

Defining a Big Data Strategy for Your Organisation

  • Establishing your Big Data needs
  • Meeting business goals with timely data
  • Evaluating commercial Big Data tools
  • Managing organisational expectations

Enabling Analytic Innovation

  • Focusing on business importance
  • Framing the problem
  • Selecting the correct tools
  • Achieving timely results

Implementing a Big Data Solution

  • Selecting suitable vendors and hosting options
  • Balancing costs against business value
  • Keeping ahead of the curve

Course duration: 120 hours

Prerequisites

Delegates should have an understanding of enterprise application development, business systems integration, and/or database design, querying and reporting. At a minimum, delegates need a working knowledge of the Microsoft Windows platform and basic database concepts.

Key Benefits

By the end of this course, you will have learnt:

  • Big Data patterns and anti-patterns
  • Hadoop, HDFS and MapReduce, with examples
  • NoSQL databases, with demonstrations in Cassandra, HBase and others
  • Building data warehouses with Hive
  • Integration with SQL databases
  • Parallel programming with Pig
  • Machine learning and pattern matching with Apache Mahout
  • Using Amazon Web Services