Description
In this course, participants learn about cloud-based big data solutions such as Amazon Elastic MapReduce (EMR), Amazon Redshift, Amazon Kinesis, and the rest of the AWS big data platform. They learn how to use Amazon EMR to process data with the broad ecosystem of Apache Hadoop tools, such as Hive and Hue. The course also teaches participants how to create big data environments; work with Amazon DynamoDB, Amazon Redshift, and Amazon Kinesis; and apply best practices to design those environments for security and cost-effectiveness. Participants should have experience with the AWS environment and a basic understanding of data warehousing.
Details
Length
3 days
PDU
21
Delivery Method
In Person | Live Virtual
Materials
- Access to the Solarity LMS, including a recording of the training
- Course materials and other resources as provided by the trainer
- Course files and lab exercises
Objectives
- Data ingestion, transfer, and compression;
- AWS data storage options;
- Using DynamoDB with Amazon EMR;
- Using Kinesis for near real-time Big Data processing;
- Understanding Apache Hadoop and Amazon EMR;
- Using Amazon Elastic MapReduce;
- The Hadoop Ecosystem;
- Using Hive for advertising analytics;
- Using Streaming for Life Sciences analytics;
- Using Hue with Amazon EMR;
- Running Pig Scripts with Hue on Amazon EMR;
- Running Spark and Spark SQL interactively on Amazon EMR;
- Using Spark and Spark SQL for in-memory analytics;
- Managing Amazon EMR costs;
- Securing your Amazon EMR deployments;
- Data warehouses and columnar datastores;
- Understanding Amazon Redshift;
- Optimizing your Amazon Redshift environment;
- The Big Data ecosystem on AWS;
- Visualizing and orchestrating Big Data; and
- Using TIBCO Spotfire to visualize Big Data.