How to Install Spark NLP


A step-by-step tutorial on how to make Spark NLP work on your local computer


Apache Spark is an open-source framework for fast and general-purpose data processing. It provides a unified engine that can run complex analytics, including Machine Learning, in a fast and distributed way.

Spark NLP is an Apache Spark module that provides advanced Natural Language Processing (NLP) capabilities to Spark applications. It can be used to build complex text processing pipelines, including tokenization, sentence splitting, part of speech tagging, parsing, and named entity recognition.

Although the documentation describing how to install Spark NLP is quite clear, you can still get stuck during installation. For this reason, in this article, I describe a step-by-step procedure to make Spark NLP work on your computer.

To install Spark NLP, you should install the following tools:

  • Python
  • Java
  • Scala
  • Apache Spark
  • PySpark
  • Spark NLP

1 Python

I assume you have already installed Python, as described in the technical requirements section. So, we can start installing the software from the second step, Java.

2 Java

Spark NLP is built on top of Apache Spark, which can be installed on any OS that supports Java 8. Check if you have Java 8 by running the following command in Terminal:

java -version

If Java is already installed, you should see the following output:

openjdk version "1.8.0_322"
OpenJDK Runtime Environment (build 1.8.0_322-bre_2022_02_28_15_01-b00)
OpenJDK 64-Bit Server VM (build 25.322-b00, mixed mode)
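If you want to check the Java version programmatically (for example, from a setup script), a small Python sketch can parse the version line. The `java_major_version` helper below is just an illustration, not part of any library:

```python
import re

def java_major_version(version_line: str) -> int:
    """Extract the major Java version from a `java -version` line.

    Java 8 reports itself as "1.8.0_xxx", while Java 9 and later
    report a plain major version such as "11.0.2".
    """
    match = re.search(r'version "(\d+)\.(\d+)', version_line)
    if match is None:
        raise ValueError(f"unrecognized version line: {version_line!r}")
    first, second = int(match.group(1)), int(match.group(2))
    # The legacy "1.x" scheme means the real major version is x.
    return second if first == 1 else first

print(java_major_version('openjdk version "1.8.0_322"'))  # 8
print(java_major_version('openjdk version "11.0.2"'))     # 11
```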

If Java 8 is not installed, you can download Java 8 from this link and follow the wizard.

On Ubuntu, you can install openjdk-8 through the package manager:

sudo apt-get install openjdk-8-jre

On macOS, you can install openjdk-8 through Homebrew:

brew install openjdk@8

If you have another version of Java installed, you can download Java 8, as previously described, and then set the JAVA_HOME environment variable to the path to the Java 8 directory.

3 Scala

Apache Spark requires Scala 2.12 or 2.13 to work properly. You can install Scala 2.12.15 following the procedure described here.

Once installed, you can verify that Scala works properly by running the following command:

scala -version

4 Apache Spark

You can download Apache Spark from its official Web site, available here. There are many versions of Apache Spark. Personally, I have installed version 3.1.2, which is available here.

Download the package, extract it, and place it wherever you want in your filesystem. Then, you need to add the path of the bin directory inside your Spark directory to the PATH environment variable. On Unix, you can export the PATH variable:

export PATH=$PATH:/path/to/spark/bin

Then, set the SPARK_HOME environment variable to the path of your Spark directory. On Unix, you can export the SPARK_HOME variable as follows:

export SPARK_HOME="/path/to/spark"
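The same configuration can also be set per process from Python, before PySpark is imported; here is a minimal sketch, where /path/to/spark is a placeholder you must replace with your actual Spark directory:

```python
import os

# Placeholder path -- replace with the directory where you extracted Spark.
spark_home = "/path/to/spark"

# Setting these before importing pyspark makes the variables visible
# to the current Python process and any child processes it launches.
os.environ["SPARK_HOME"] = spark_home
os.environ["PATH"] = os.environ.get("PATH", "") + os.pathsep + os.path.join(spark_home, "bin")

print(os.environ["SPARK_HOME"])  # /path/to/spark
```

Exports in your shell only last for the current session, so for a permanent setup you should also add the two export lines to your shell profile (for example, ~/.bashrc).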

To check if Apache Spark is installed properly, you can run the following command:

spark-shell

A shell should open:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
      /_/

Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_322)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

To exit the shell, type :quit or press Ctrl+D.

5 PySpark and Spark NLP

PySpark and Spark NLP are two Python libraries that you can install through pip:

pip install pyspark
pip install spark-nlp

Now Spark NLP should be ready on your computer!

Summary

Congratulations! You have just installed Spark NLP on your computer! Along the way, you have installed Java, Scala, Apache Spark, PySpark, and Spark NLP.

Now it is time to play with Spark NLP. There are many tutorials and example notebooks available on the Web to help you take the first steps.

You can also check this tutorial, which explains how to integrate Spark NLP with Comet, a platform used to monitor Machine Learning experiments.

If you have read this far, that is already a lot for today. Thanks! You can read my trending articles at this link.
