How to Install Spark NLP
A step-by-step tutorial on how to make Spark NLP work on your local computer
Apache Spark is an open-source framework for fast and general-purpose data processing. It provides a unified engine that can run complex analytics, including Machine Learning, in a fast and distributed way.
Spark NLP is an Apache Spark module that provides advanced Natural Language Processing (NLP) capabilities to Spark applications. It can be used to build complex text processing pipelines, including tokenization, sentence splitting, part-of-speech tagging, parsing, and named entity recognition.
Although the documentation describing how to install Spark NLP is quite clear, you can sometimes get stuck during the installation. For this reason, in this article, I describe a step-by-step procedure to make Spark NLP work on your computer.
To install Spark NLP, you should install the following tools:
- Java
- Scala
- Apache Spark
- PySpark
- Spark NLP
You have already installed Python following the procedure described in the technical requirements section, so we can start by installing Java.
Spark NLP is built on top of Apache Spark, which can be installed on any OS that supports Java 8. Check whether you have Java 8 by running the following command in a terminal:
java -version
If Java is already installed, you should see the following output:
openjdk version "1.8.0_322"
OpenJDK Runtime Environment (build 1.8.0_322-bre_2022_02_28_15_01-b00)
OpenJDK 64-Bit Server VM (build 25.322-b00, mixed mode)
If Java 8 is not installed, you can download Java 8 from this link and follow the wizard.
In Ubuntu, you can install openjdk-8 through the package manager:
sudo apt-get install openjdk-8-jre
In macOS, you can install it through Homebrew:
brew install openjdk@8
If you have another version of Java installed, you can download Java 8, as previously described, and then set the JAVA_HOME environment variable to the path of the Java 8 directory.
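For example, you can point JAVA_HOME at Java 8 for the current shell session. The snippet below is a minimal sketch: the path is a typical Ubuntu location for openjdk-8 and is only an assumption — adjust it to wherever Java 8 lives on your system.

```shell
# Point JAVA_HOME at a Java 8 installation.
# /usr/lib/jvm/java-8-openjdk-amd64 is an example path; replace it
# with the actual location of Java 8 on your machine.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
```

After this, java -version should report version 1.8.x in that shell.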
Once installed, you can verify that Scala works properly by running the following command:
scala -version
4 Apache Spark
You can download the Apache Spark package, extract it, and place it wherever you want in your filesystem. Then, you need to add the path to the bin directory contained in your Spark directory to the PATH environment variable, and export the SPARK_HOME environment variable with the path to your Spark directory. In Unix, you can export both variables as follows:
export SPARK_HOME=/path/to/spark
export PATH=$PATH:$SPARK_HOME/bin
To check whether Apache Spark is installed properly, you can run the following command:
spark-shell
A shell should open:
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
      /_/

Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_322)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
To exit the shell, you can type :quit or press Ctrl+C.
5 PySpark and Spark NLP
PySpark and Spark NLP are two Python libraries that you can install through pip:
pip install pyspark
pip install spark-nlp
Now Spark NLP should be ready on your computer. Congratulations! You have installed Java, Scala, Apache Spark, PySpark, and Spark NLP!
Now it is time to play with Spark NLP. There are many tutorials available on the Web. I suggest you start with the following notebooks:
You can also check this tutorial, which explains how to integrate Spark NLP with Comet, a platform used to monitor Machine Learning experiments.
If you have read this far, for me it is already a lot for today. Thanks! You can read my trending articles at this link.