
How to install Apache Spark Cluster Computing Framework on Debian 10



Apache Spark is a free and open-source cluster computing framework used for analytics, machine learning and graph processing on large data sets. Spark comes with 80+ high-level operators that allow you to build parallel apps and use them interactively from the Scala, Python, R and SQL shells. It is a lightning-fast, in-memory processing engine specifically designed for data science. It provides a rich set of features including speed, fault tolerance, real-time stream processing, in-memory computing, advanced analytics and many more.

In this tutorial, we will show you how to install Apache Spark on a Debian 10 server.

Prerequisites

  • A server running Debian 10 with 2 GB of RAM.
  • A root password is configured on your server.

Getting Started

Before you begin, it is recommended to update your server's packages to their latest versions. You can do so with the following commands:

  apt-get update -y 
apt-get upgrade -y

After your server is updated, restart it to apply the changes.
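If you are connected over SSH, you can trigger the restart directly from the shell:

  reboot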

Installing Java

Apache Spark runs on the Java Virtual Machine, so you need to install Java on your system. By default, the latest version of Java is available in the standard Debian 10 repository. You can install it with the following command:

  apt-get install default-jdk -y 

After installing Java, verify the installed version of Java with the following command:

  java --version 

You should get the following output:

  openjdk 11.0.5 2019-10-15
OpenJDK Runtime Environment (build 11.0.5+10-post-Debian-1deb10u1)
OpenJDK 64-Bit Server VM (build 11.0.5+10-post-Debian-1deb10u1, mixed mode, sharing)

Downloading Apache Spark

First, you will need to download the latest version of Apache Spark from its official website. At the time of writing, the latest version of Apache Spark is 3.0.0-preview2. You can download it to the /opt directory with the following commands:

  cd /opt
wget http://apachemirror.wuchna.com/spark/spark-3.0.0-preview2/spark-3.0.0-preview2-bin-hadoop2.7.tgz
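Optionally, before extracting, you can verify the integrity of the tarball. Apache publishes SHA-512 checksums alongside each release; the checksum URL below follows the usual Apache dist layout, so confirm it against the download page for your version:

  wget https://archive.apache.org/dist/spark/spark-3.0.0-preview2/spark-3.0.0-preview2-bin-hadoop2.7.tgz.sha512
sha512sum spark-3.0.0-preview2-bin-hadoop2.7.tgz

Compare the digest printed by sha512sum with the one in the .sha512 file; they should match.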

When the download is complete, extract the downloaded file with the following command:

  tar -xvzf spark-3.0.0-preview2-bin-hadoop2.7.tgz

Next, rename the extracted directory to spark as shown below:

  mv spark-3.0.0-preview2-bin-hadoop2.7 spark

Next, you will need to set up the environment variables for Spark. You can do this by editing the ~/.bashrc file:

  nano ~/.bashrc

Add the following lines to the end of the file:

  export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

Save and close the file when you are done. Then enable the environment with the following command:

  source ~ / .bashrc 
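To confirm that the new variables are in effect, you can optionally print SPARK_HOME and check that the Spark launch scripts are now on your PATH (this quick sanity check is not strictly required):

  echo $SPARK_HOME
which spark-shell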

Starting the Master Server

You can now start the Master server with the following command:

  start-master.sh

You should get the following output:

  starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-debian10.out

By default, Apache Spark listens on port 8080. You can verify it with the following command:

  netstat -ant | grep 8080 

Output:

  tcp6       0      0 :::8080                 :::*                    LISTEN
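Note that netstat is part of the net-tools package, which may not be installed on a minimal Debian 10 system. If it is missing, the ss tool that ships with Debian provides the same check:

  ss -tln | grep 8080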

Now open your browser and type the URL http://your-server-ip:8080. You should see the following page:

[Image: Apache Spark master web UI]

Note the Spark URL "spark://debian10:7077" from the above image. This will be used to start the Spark worker.
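By default, the master binds to the system hostname and the ports shown above. If you need different settings, start-master.sh accepts options for the bind address and ports; the values below are only examples:

  start-master.sh --host debian10 --port 7077 --webui-port 8080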

Starting the Spark Worker Process

You can now start the Spark worker with the following command:

  start-slave.sh spark://debian10:7077

You should get the following output:

  starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-debian10.out
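By default, the worker offers all of its cores and most of the machine's memory to the master. On a small 2 GB server you may want to cap this; start-slave.sh accepts --cores and --memory flags for that purpose (the limits below are only examples):

  start-slave.sh spark://debian10:7077 --cores 1 --memory 1g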

Accessing the Spark Shell

Spark Shell is an interactive environment that provides an easy way to learn the API and analyze data interactively. You can access the Spark shell with the following command:

  spark-shell 

You should see the following output:

  WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.0.0-preview2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
19/12/29 15:53:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://debian10:4040
Spark context available as 'sc' (master = local[*], app id = local-1577634806690).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.0-preview2
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.5)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

From here, you can learn how to get the most out of Apache Spark quickly and conveniently.
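After leaving the shell with :quit, you can also run a quick end-to-end test of the standalone cluster by submitting the bundled SparkPi example to the master with spark-submit. The examples jar ships with the distribution; the file name below matches the 3.0.0-preview2 build extracted earlier, so adjust it if your version differs:

  spark-submit --class org.apache.spark.examples.SparkPi --master spark://debian10:7077 /opt/spark/examples/jars/spark-examples_2.12-3.0.0-preview2.jar 100

Near the end of the job output, you should see a line such as "Pi is roughly 3.14...", confirming that the job ran on the cluster.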

To stop the Spark master and slave servers, run the following commands:

  stop-slave.sh 
stop-master.sh

You have now installed Apache Spark on your Debian 10 server. For more information, refer to the official Spark documentation at Spark Doc.

