Amazon EMR

  • Amazon EMR gives us a real hardware cluster on which to run Spark jobs.
  • EMR is a billable service, especially when working with clusters.
  • We deploy our Spark jobs on a cluster manager.
  • Packaging the Spark JAR for EMR
  • Running Spark Jobs on EMR
    • SSH into the cluster and download the Spark JAR file from S3.
    • To run a Spark job on a cluster, use the “spark-submit” script.
    • Copy the master public DNS of the EMR cluster into a browser, followed by port 18080.
      • This opens the Spark History Server.
        • It shows a list of the Spark applications that have completed.
      • We also get diagnostics for a job, such as
        • Its duration
        • The tasks of the job
      • The Executors page shows information about the cluster
        • Its nodes
        • Information about the nodes
        • Performance of the nodes
        • Remove the resources.
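
The steps above can be sketched as a small Python helper that only builds the command lines and the History Server URL, without running anything. This is a minimal sketch, not a definitive setup: the bucket, JAR name, main class, and DNS name are hypothetical placeholders, and it assumes the AWS CLI is available on the node and that the job is submitted to YARN.

```python
import shlex


def s3_download_command(s3_uri, local_path):
    """Build the AWS CLI command that copies one object from S3 to local disk."""
    return ["aws", "s3", "cp", s3_uri, local_path]


def spark_submit_command(jar_path, main_class, num_executors=2, executor_cores=4):
    """Build a spark-submit command for a packaged JAR.

    Assumption: the job is submitted to YARN. The defaults mirror the
    2 executors x 4 cores example used in these notes.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--class", main_class,
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        jar_path,
    ]


def history_server_url(master_public_dns):
    """Build the Spark History Server URL (port 18080) for the master node."""
    return f"http://{master_public_dns}:18080"


if __name__ == "__main__":
    # All names below are hypothetical placeholders, not values from the notes.
    download = s3_download_command("s3://my-bucket/jars/my-job.jar", "my-job.jar")
    submit = spark_submit_command("my-job.jar", "com.example.MyJob")
    print(shlex.join(download))
    print(shlex.join(submit))
    print(history_server_url("ec2-0-0-0-0.compute-1.amazonaws.com"))
```

Printing the commands first makes it easy to review them before pasting them into the SSH session on the cluster.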
  • While the job runs, the execution output shows the following information
    • Stage
    • Number of tasks to be performed in the stage.
      • A task is a unit of code executed against one partition.
    • [Stage 0 :  (0+8) / 46]
      • Means the stage has 46 tasks, one per partition, so we have 46 partitions.
      • The first number inside the parentheses is the number of tasks completed, and the second is the number of tasks currently running.
      • Above, 0 tasks have been completed and 8 are running for now.
      • The number of running tasks depends on the CPU cores and the executors available. For example, with 2 executors running and 4 cores each, 8 tasks run at once.
    • [(40+6)/46]
      • Means 40 tasks are completed and the last 6 of the 46 are running.
      • Here, only six of the eight cores are busy, because only six tasks remain.
  • Terminate the cluster after testing to avoid ongoing costs.
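
The progress lines discussed above can be read mechanically. The following is a minimal sketch that parses a line such as [Stage 0 :  (0+8) / 46] and works out how many tasks are still waiting and how many can run at once. The function and class names are made up for illustration, and the parallelism calculation assumes each task uses one core, which is Spark's default.

```python
import re
from typing import NamedTuple


class StageProgress(NamedTuple):
    stage: int
    completed: int  # first number inside the parentheses
    running: int    # second number inside the parentheses
    total: int      # number after the slash (one task per partition)

    @property
    def pending(self):
        """Tasks that have not started yet."""
        return self.total - self.completed - self.running


# Tolerates extra spaces and progress-bar characters between ':' and '('.
_PROGRESS = re.compile(
    r"\[Stage\s*(\d+)\s*:[^(]*\(\s*(\d+)\s*\+\s*(\d+)\s*\)\s*/\s*(\d+)\s*\]"
)


def parse_progress(line):
    """Parse one progress line into a StageProgress record."""
    match = _PROGRESS.search(line)
    if match is None:
        raise ValueError(f"not a stage progress line: {line!r}")
    return StageProgress(*(int(group) for group in match.groups()))


def max_parallel_tasks(num_executors, cores_per_executor):
    """Upper bound on concurrently running tasks, assuming one core per task."""
    return num_executors * cores_per_executor


if __name__ == "__main__":
    slots = max_parallel_tasks(num_executors=2, cores_per_executor=4)
    for line in ("[Stage 0 :  (0+8) / 46]", "[Stage 0 :  (40+6) / 46]"):
        p = parse_progress(line)
        print(f"completed={p.completed} running={p.running} "
              f"pending={p.pending} idle_cores={slots - p.running}")
```

With 2 executors of 4 cores each there are 8 slots, so the second line leaves two cores idle: only six tasks remain to be run.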
