Apache Spark is a free, open-source computational framework used for analytics, machine learning and graph processing on large data sets. Spark ships with 80+ high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python, R and SQL shells. It is a lightning-fast, in-memory processing engine designed for data science workloads, and it provides a rich set of features including speed, fault tolerance, real-time stream processing, in-memory computing and advanced analytics.
In this tutorial, we will show you how to install Apache Spark on a Debian 10 server.
Prerequisites
- A server running Debian 10 with 2 GB of RAM.
- A root password configured on your server.
Before you begin, it is recommended that you update your server with the latest version. You can update it with the following command:
apt-get update -y
apt-get upgrade -y
After your server is updated, restart it to implement the changes.
Installing Java
Apache Spark is written in Java, so you will need to install Java on your system. By default, the latest version of Java is available in the standard Debian 10 repository. You can install it with the following command:
apt-get install default-jdk -y
After installing Java, verify the installed version of Java with the following command:
java -version
You should get the following output:
openjdk 11.0.5 2019-10-15
OpenJDK Runtime Environment (build 11.0.5+10-post-Debian-1deb10u1)
OpenJDK 64-Bit Server VM (build 11.0.5+10-post-Debian-1deb10u1, mixed mode, sharing)
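If you want to confirm which JDK binary the default-jdk package actually selected, a quick check like the following can help. This is an optional sketch; the guard around command -v is only there for systems where Java is not yet on the PATH:

```shell
# Resolve the real path of the java binary chosen by default-jdk.
java_bin=$(command -v java || true)
if [ -n "$java_bin" ]; then
  readlink -f "$java_bin"   # e.g. /usr/lib/jvm/java-11-openjdk-amd64/bin/java
else
  echo "java not found on PATH"
fi
```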
Download Apache Spark
First, you will need to download the latest version of Apache Spark from its official website. At the time of writing, the latest version of Apache Spark is 3.0.0-preview2. You can download it to the /opt directory with the following commands:
cd /opt
wget http://apachemirror.wuchna.com/spark/spark-3.0.0-preview2/spark-3.0.0-preview2-bin-hadoop2.7.tgz
When the download is complete, extract the downloaded file with the following command:
tar -xvzf spark-3.0.0-preview2-bin-hadoop2.7.tgz
Next, rename the extracted directory to spark as shown below:
mv spark-3.0.0-preview2-bin-hadoop2.7 spark
Next, you will need to set the environment variables for Spark. You can do this by editing the ~/.bashrc file:
nano ~/.bashrc
Add the following lines at the end of the file:
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
Save and close the file when you are done. Then enable the environment with the following command:
source ~ / .bashrc
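To confirm the new variables are active in the current shell, a small check such as the one below can be used. Note that check_var is a hypothetical helper written for this check, not part of Spark:

```shell
# check_var: succeed only if the named environment variable is non-empty.
check_var() {
  eval "val=\$$1"
  [ -n "$val" ]
}

if check_var SPARK_HOME; then
  echo "SPARK_HOME is set to $SPARK_HOME"
else
  echo "SPARK_HOME is not set -- run: source ~/.bashrc"
fi
```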
Start Master Server
You can now start the Master server with the following command:
start-master.sh
You should get the following output:
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-debian10.out
By default, Apache Spark listens on port 8080. You can verify it with the following command:
netstat -ant | grep 8080
Output:
tcp6       0      0 :::8080                 :::*                    LISTEN
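On newer Debian releases, netstat is deprecated in favor of ss, so the same check can be written with it. The port_listening helper below is a small sketch written for this tutorial, not a standard tool:

```shell
# port_listening: succeed if something is listening on the given TCP port.
port_listening() {
  ss -tln 2>/dev/null | grep -q ":$1 "
}

if port_listening 8080; then
  echo "Spark master web UI is listening on port 8080"
else
  echo "nothing is listening on port 8080 yet"
fi
```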
Now, open your web browser and type the URL http://your-server-ip:8080. You should see the following page:
Note the Spark URL spark://debian10:7077 from the image above. This will be used to start the Spark worker.
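If you are scripting the worker startup, the master URL can also be pulled out of the master log instead of being copied from the web UI. The master_url function below is a hypothetical helper; the log path is the one reported by start-master.sh above:

```shell
# master_url: print the first spark:// URL found in the given log file.
master_url() {
  grep -o 'spark://[^ "]*' "$1" | head -n 1
}

log=$(ls /opt/spark/logs/spark-*Master*.out 2>/dev/null | head -n 1)
[ -n "$log" ] && master_url "$log" || echo "master log not found"
```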
Starting the Spark Worker Process
You can now start the Spark worker with the following command:
start-slave.sh spark://debian10:7077
You should get the following output:
starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-debian10.out
Access Spark Shell
Spark Shell is an interactive environment that provides a simple way to learn the API and analyze data interactively. You can access the Spark shell with the following command:
spark-shell
You should see the following output:
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.0.0-preview2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
19/12/29 15:53:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://debian10:4040
Spark context available as 'sc' (master = local[*], app id = local-1577634806690).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.0-preview2
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.5)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
From here you can learn how to get the best out of Apache Spark quickly and conveniently.
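Because spark-shell reads from standard input, you can also run a one-off expression non-interactively. The following is a sketch that assumes the master started earlier in this tutorial is still running:

```shell
# Pipe a single Scala expression into spark-shell instead of typing it.
# 1 to 100 sums to 5050, so this should print 5050.0 among the startup logs.
snippet='println(sc.parallelize(1 to 100).sum)'
if command -v spark-shell >/dev/null 2>&1; then
  echo "$snippet" | spark-shell --master spark://debian10:7077
else
  echo "spark-shell is not on the PATH -- run: source ~/.bashrc"
fi
```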
To stop the Spark master and worker processes, run the following commands:
stop-slave.sh
stop-master.sh
You have now installed Apache Spark on your Debian 10 server. For more information, please refer to the official Spark documentation.