Spark NLP: State of the Art Natural Language Processing

Spark NLP is a state-of-the-art Natural Language Processing library built on top of Apache Spark. It provides simple, performant, and accurate NLP annotations for machine learning pipelines that scale easily in a distributed environment. Spark NLP comes with 3700+ pretrained pipelines and models in more than 200 languages. It offers tasks such as Tokenization, Word Segmentation, Part-of-Speech Tagging, Word and Sentence Embeddings, Named Entity Recognition, Dependency Parsing, Spell Checking, Text Classification, Sentiment Analysis, Token Classification, Machine Translation (180+ languages), Summarization & Question Answering, and many more NLP tasks.

Spark NLP is the only open-source NLP library in production that offers state-of-the-art transformers such as BERT, ALBERT, ELECTRA, XLNet, DistilBERT, RoBERTa, XLM-RoBERTa, Longformer, ELMO, Universal Sentence Encoder, Google T5, and MarianMT not only to Python and R, but also to the JVM ecosystem (Java, Scala, and Kotlin) at scale by extending Apache Spark natively.

Project's website

Take a look at our official Spark NLP page, http://nlp.johnsnowlabs.com/, for user documentation and examples.

Community support

  • Slack For live discussion with the Spark NLP community and the team
  • GitHub Bug reports, feature requests, and contributions
  • Discussions Engage with other community members, share ideas, and show off how you use Spark NLP!
  • Medium Spark NLP articles
  • YouTube Spark NLP video tutorials

Features

  • Tokenization
  • Trainable Word Segmentation
  • Stop Words Removal
  • Token Normalizer
  • Document Normalizer
  • Stemmer
  • Lemmatizer
  • NGrams
  • Regex Matching
  • Text Matching
  • Chunking
  • Date Matcher
  • Sentence Detector
  • Deep Sentence Detector (Deep learning)
  • Dependency parsing (Labeled/unlabeled)
  • Part-of-speech tagging
  • Sentiment Detection (ML models)
  • Spell Checker (ML and DL models)
  • Word Embeddings (GloVe and Word2Vec)
  • BERT Embeddings (TF Hub & HuggingFace models)
  • DistilBERT Embeddings (HuggingFace models)
  • RoBERTa Embeddings (HuggingFace models)
  • XLM-RoBERTa Embeddings (HuggingFace models)
  • Longformer Embeddings (HuggingFace models)
  • ALBERT Embeddings (TF Hub & HuggingFace models)
  • XLNet Embeddings
  • ELMO Embeddings (TF Hub models)
  • Universal Sentence Encoder (TF Hub models)
  • BERT Sentence Embeddings (TF Hub & HuggingFace models)
  • RoBerta Sentence Embeddings (HuggingFace models)
  • XLM-RoBerta Sentence Embeddings (HuggingFace models)
  • Sentence Embeddings
  • Chunk Embeddings
  • Unsupervised keywords extraction
  • Language Detection & Identification (up to 375 languages)
  • Multi-class Sentiment analysis (Deep learning)
  • Multi-label Sentiment analysis (Deep learning)
  • Multi-class Text Classification (Deep learning)
  • BERT for Token Classification
  • DistilBERT for Token Classification
  • ALBERT for Token Classification
  • RoBERTa for Token Classification
  • XLM-RoBERTa for Token Classification
  • XLNet for Token Classification
  • Longformer for Token Classification
  • Neural Machine Translation (MarianMT)
  • Text-To-Text Transfer Transformer (Google T5)
  • Named entity recognition (Deep learning)
  • Easy TensorFlow integration
  • GPU Support
  • Full integration with Spark ML functions
  • 2000+ pre-trained models in 200+ languages!
  • 1700+ pre-trained pipelines in 200+ languages!
  • Multi-lingual NER models: Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, German, Hebrew, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, and Urdu.

Requirements

To use Spark NLP you need the following:

  • Java 8
  • Apache Spark 3.1.x (or 3.0.x, or 2.4.x, or 2.3.x)

NOTE: Java 11 is supported if you are using Spark NLP with Spark/PySpark 3.x and above.

GPU (optional):

Spark NLP 3.3.0 is built with TensorFlow 2.4.1 and requires the following if you need GPU support:

  • CUDA 11
  • cuDNN 8.0.2

Quick Start

This is a quick example of how to use a Spark NLP pre-trained pipeline in Python and PySpark:

$ java -version  # should be Java 8 (Oracle or OpenJDK)
$ conda create -n sparknlp python=3.7 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
$ pip install spark-nlp==3.3.0 pyspark

In Python console or Jupyter kernel:

# Import Spark NLP
from sparknlp.base import *
from sparknlp.annotator import *
from sparknlp.pretrained import PretrainedPipeline
import sparknlp

# Start SparkSession with Spark NLP
# start() function has 4 parameters: gpu, spark23, spark24, and memory
# sparknlp.start(gpu=True) will start the session with GPU support
# sparknlp.start(spark23=True) is when you have Apache Spark 2.3.x installed
# sparknlp.start(spark24=True) is when you have Apache Spark 2.4.x installed
# sparknlp.start(memory="16G") to change the default driver memory in SparkSession
spark = sparknlp.start()

# Download a pre-trained pipeline
pipeline = PretrainedPipeline('explain_document_dl', lang='en')

# Your testing dataset
text = """
The Mona Lisa is a 16th century oil painting created by Leonardo.
It's held at the Louvre in Paris.
"""

# Annotate your testing dataset
result = pipeline.annotate(text)

# What's in the pipeline
list(result.keys())
# Output: ['entities', 'stem', 'checked', 'lemma', 'document',
#          'pos', 'token', 'ner', 'embeddings', 'sentence']

# Check the results
result['entities']
# Output: ['Mona Lisa', 'Leonardo', 'Louvre', 'Paris']

For more examples, you can visit our dedicated repository showcasing all Spark NLP use cases!

Apache Spark Support

Spark NLP 3.3.0 has been built on top of Apache Spark 3.x while fully supporting Apache Spark 2.3.x and Apache Spark 2.4.x:

Spark NLP   Apache Spark 2.3.x   Apache Spark 2.4.x   Apache Spark 3.0.x   Apache Spark 3.1.x
3.3.x       YES                  YES                  YES                  YES
3.2.x       YES                  YES                  YES                  YES
3.1.x       YES                  YES                  YES                  YES
3.0.x       YES                  YES                  YES                  YES
2.7.x       YES                  YES                  NO                   NO
2.6.x       YES                  YES                  NO                   NO
2.5.x       YES                  YES                  NO                   NO
2.4.x       Partially            YES                  NO                   NO
1.8.x       Partially            YES                  NO                   NO
1.7.x       YES                  NO                   NO                   NO
1.6.x       YES                  NO                   NO                   NO
1.5.x       YES                  NO                   NO                   NO

NOTE: Starting with the 3.0.0 release, the default spark-nlp and spark-nlp-gpu packages are based on Scala 2.12 and Apache Spark 3.x.

NOTE: Starting with the 3.0.0 release, we support all major releases of Apache Spark: 2.3.x, 2.4.x, 3.0.x, and 3.1.x.

Find out more about versions from our release notes.

Databricks Support

Spark NLP 3.3.0 has been tested and is compatible with the following runtimes:

CPU:

  • 5.5 LTS
  • 5.5 LTS ML
  • 6.4
  • 6.4 ML
  • 7.3
  • 7.3 ML
  • 7.4
  • 7.4 ML
  • 7.5
  • 7.5 ML
  • 7.6
  • 7.6 ML
  • 8.0
  • 8.0 ML
  • 8.1
  • 8.1 ML
  • 8.2
  • 8.2 ML
  • 8.3
  • 8.3 ML
  • 8.4
  • 8.4 ML
  • 9.0
  • 9.0 ML
  • 9.1
  • 9.1 ML

GPU:

  • 8.1 ML & GPU
  • 8.2 ML & GPU
  • 8.3 ML & GPU
  • 8.4 ML & GPU
  • 9.0 ML & GPU
  • 9.1 ML & GPU

NOTE: Spark NLP 3.3.0 is based on TensorFlow 2.4.x, which is compatible with CUDA 11 and cuDNN 8.0.2. The only Databricks runtimes supporting CUDA 11 are the 8.x and 9.x ML runtimes with GPU.

EMR Support

Spark NLP 3.3.0 has been tested and is compatible with the following EMR releases:

  • emr-5.20.0
  • emr-5.21.0
  • emr-5.21.1
  • emr-5.22.0
  • emr-5.23.0
  • emr-5.24.0
  • emr-5.24.1
  • emr-5.25.0
  • emr-5.26.0
  • emr-5.27.0
  • emr-5.28.0
  • emr-5.29.0
  • emr-5.30.0
  • emr-5.30.1
  • emr-5.31.0
  • emr-5.32.0
  • emr-5.33.0
  • emr-6.1.0
  • emr-6.2.0
  • emr-6.3.0

Full list of Amazon EMR 5.x releases
Full list of Amazon EMR 6.x releases

NOTE: EMR 6.0.0 is not supported by Spark NLP 3.3.0.

Usage

Spark Packages

Command line (requires internet connection)

Spark NLP supports all major releases of Apache Spark: 2.3.x, 2.4.x, 3.0.x, and 3.1.x. That being said, you need to choose the right package for the right Apache Spark major release:

Apache Spark 3.x (3.0.x and 3.1.x - Scala 2.12)

# CPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.3.0
pyspark --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.3.0
spark-submit --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.3.0

The spark-nlp package has been published to the Maven Repository.

# GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.3.0
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.3.0
spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:3.3.0

The spark-nlp-gpu package has been published to the Maven Repository.

Apache Spark 2.4.x (Scala 2.11)

# CPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark24_2.11:3.3.0
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark24_2.11:3.3.0
spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-spark24_2.11:3.3.0

The spark-nlp-spark24 package has been published to the Maven Repository.

# GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark24_2.11:3.3.0
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark24_2.11:3.3.0
spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark24_2.11:3.3.0

The spark-nlp-gpu-spark24 package has been published to the Maven Repository.

Apache Spark 2.3.x (Scala 2.11)

# CPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:3.3.0
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:3.3.0
spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-spark23_2.11:3.3.0

The spark-nlp-spark23 package has been published to the Maven Repository.

# GPU
spark-shell --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:3.3.0
pyspark --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:3.3.0
spark-submit --packages com.johnsnowlabs.nlp:spark-nlp-gpu-spark23_2.11:3.3.0

The spark-nlp-gpu-spark23 package has been published to the Maven Repository.

NOTE: In case you are using large pretrained models like UniversalSentenceEncoder, you need to have the following set in your SparkSession:

spark-shell \
  --driver-memory 16g \
  --conf spark.kryoserializer.buffer.max=2000M \
  --packages com.johnsnowlabs.nlp:spark-nlp_2.12:3.3.0

Scala

Spark NLP supports Scala 2.11.x if you are using Apache Spark 2.3.x or 2.4.x, and Scala 2.12.x if you are using Apache Spark 3.0.x or 3.1.x. Our packages are deployed to Maven Central. To add any of our packages as a dependency in your application, use the following coordinates:

Maven

spark-nlp on Apache Spark 3.x:

<!-- https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp -->
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp_2.12</artifactId>
    <version>3.3.0</version>
</dependency>

spark-nlp-gpu:

<!-- https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu -->
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu_2.12</artifactId>
    <version>3.3.0</version>
</dependency>

spark-nlp on Apache Spark 2.4.x:

<!-- https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-spark24 -->
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-spark24_2.11</artifactId>
    <version>3.3.0</version>
</dependency>

spark-nlp-gpu:

<!-- https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu-spark24 -->
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu-spark24_2.11</artifactId>
    <version>3.3.0</version>
</dependency>

spark-nlp on Apache Spark 2.3.x:

<!-- https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-spark23 -->
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-spark23_2.11</artifactId>
    <version>3.3.0</version>
</dependency>

spark-nlp-gpu:

<!-- https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp-gpu-spark23 -->
<dependency>
    <groupId>com.johnsnowlabs.nlp</groupId>
    <artifactId>spark-nlp-gpu-spark23_2.11</artifactId>
    <version>3.3.0</version>
</dependency>

SBT

spark-nlp on Apache Spark 3.x.x (the SBT coordinates mirror the Maven coordinates above):

// https://mvnrepository.com/artifact/com.johnsnowlabs.nlp/spark-nlp
libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "3.3.0"

spark-nlp-gpu:

libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu" % "3.3.0"

spark-nlp on Apache Spark 2.4.x:

libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-spark24" % "3.3.0"

spark-nlp-gpu:

libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu-spark24" % "3.3.0"

spark-nlp on Apache Spark 2.3.x:

libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-spark23" % "3.3.0"

spark-nlp-gpu:

libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp-gpu-spark23" % "3.3.0"

Maven Central: https://mvnrepository.com/artifact/com.johnsnowlabs.nlp

If you are interested, there is a simple SBT project for Spark NLP to guide you on how to use it in your own projects: Spark NLP SBT Starter.

Python

Spark NLP supports Python 3.6.x and 3.7.x if you are using PySpark 2.3.x or 2.4.x and Python 3.8.x if you are using PySpark 3.x.

Python without explicit PySpark installation

Pip/Conda

If you installed pyspark through pip/conda, you can install spark-nlp through the same channel.

Pip:

pip install spark-nlp==3.3.0

Conda:

conda install -c johnsnowlabs spark-nlp

PyPI spark-nlp package / Anaconda spark-nlp package

Then you'll have to create a SparkSession either from Spark NLP:

import sparknlp

spark = sparknlp.start()

or manually:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("Spark NLP") \
    .master("local[4]") \
    .config("spark.driver.memory", "16G") \
    .config("spark.driver.maxResultSize", "0") \
    .config("spark.kryoserializer.buffer.max", "2000M") \
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:3.3.0") \
    .getOrCreate()

If using local jars, you can use the spark.jars configuration instead, which takes comma-delimited jar files. For cluster setups, of course, you'll have to put the jars in a location reachable by all driver and executor nodes.
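For example, a minimal sketch of the local-jar variant (the jar path below is an illustrative placeholder):

from pyspark.sql import SparkSession

# spark.jars takes a comma-delimited list of local jar files;
# the path below is an illustrative placeholder, not a real artifact.
spark = SparkSession.builder \
    .appName("Spark NLP") \
    .config("spark.jars", "/path/to/spark-nlp-assembly-3.3.0.jar") \
    .getOrCreate()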

Quick example:

import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# create or get Spark Session
spark = sparknlp.start()

sparknlp.version()
spark.version

# download, load and annotate a text by pre-trained pipeline
pipeline = PretrainedPipeline('recognize_entities_dl', 'en')
result = pipeline.annotate('The Mona Lisa is a 16th century oil painting created by Leonardo')

Compiled JARs

Build from source

spark-nlp

  • FAT-JAR for CPU on Apache Spark 3.x.x
sbt assembly
  • FAT-JAR for GPU on Apache Spark 3.x.x
sbt -Dis_gpu=true assembly
  • FAT-JAR for CPU on Apache Spark 2.4.x
sbt -Dis_spark24=true assembly
  • FAT-JAR for GPU on Apache Spark 2.4.x
sbt -Dis_gpu=true -Dis_spark24=true assembly
  • FAT-JAR for CPU on Apache Spark 2.3.x
sbt -Dis_spark23=true assembly
  • FAT-JAR for GPU on Apache Spark 2.3.x
sbt -Dis_gpu=true -Dis_spark23=true assembly

Using the jar manually

If for some reason you need to use the JAR directly, you can either download the Fat JARs provided here or download them from Maven Central.

To add JARs to Spark programs, use the --jars option:

spark-shell --jars spark-nlp.jar

The preferred way to use the library when running Spark programs is with the --packages option, as specified in the Spark Packages section.

Apache Zeppelin

Use either one of the following options:

  • Add the following Maven Coordinates to the interpreter's library list
com.johnsnowlabs.nlp:spark-nlp_2.12:3.3.0
  • Add the path to a pre-built jar from here in the interpreter's library list, making sure the jar is available on the driver path

Source: https://github.com/JohnSnowLabs/spark-nlp

Spark NLP

Text processing programming library

Spark NLP is an open-source text processing library for advanced natural language processing for the Python, Java and Scala programming languages.[2][3][4] The library is built on top of Apache Spark and its Spark ML library.[5]

Its purpose is to provide an API for natural language processing pipelines that implements recent academic research results as production-grade, scalable, and trainable software. The library offers pre-trained neural network models, pipelines, and embeddings, as well as support for training custom models.[5]

Features

The design of the library makes use of the concept of a pipeline, which is an ordered set of text annotators.[6] Out-of-the-box annotators include a tokenizer, normalizer, stemmer, lemmatizer, regular expression matcher, TextMatcher, chunker, DateMatcher, SentenceDetector, DeepSentenceDetector, POS tagger, ViveknSentimentDetector, sentiment analysis, named entity recognition, a conditional random field annotator, deep learning annotators, spell checking and correction, dependency parser, typed dependency parser, document classification, and language detection.[7]
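As a minimal sketch of this pipeline concept (the column names and toy sentence here are illustrative, not from any official example), annotators are chained as regular Spark ML stages in Python:

import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer, Normalizer
from pyspark.ml import Pipeline

spark = sparknlp.start()

# Each annotator reads one or more annotation columns and writes a new one,
# forming an ordered set of text annotators.
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

normalizer = Normalizer() \
    .setInputCols(["token"]) \
    .setOutputCol("normalized")

pipeline = Pipeline(stages=[document_assembler, tokenizer, normalizer])

data = spark.createDataFrame([["Annotators run as Spark ML stages."]], ["text"])
result = pipeline.fit(data).transform(data)
result.selectExpr("normalized.result").show(truncate=False)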

The Models Hub is a platform for sharing open-source as well as licensed pretrained models and pipelines. It includes pre-trained pipelines with tokenization, lemmatization, part-of-speech tagging, and named entity recognition for more than thirteen languages; word embeddings including GloVe, ELMo, BERT, ALBERT, XLNet, Small BERT, and ELECTRA; and sentence embeddings including Universal Sentence Embeddings (USE)[8] and Language Agnostic BERT Sentence Embeddings (LaBSE).[9] It also includes resources and pre-trained models for more than two hundred languages. The Spark NLP base code includes support for East Asian languages, such as tokenizers for Chinese, Japanese, and Korean; for right-to-left languages such as Urdu, Farsi, Arabic, and Hebrew; and pre-trained multilingual word and sentence embeddings such as LaUSE, as well as a translation annotator.

Usage in healthcare

Spark NLP for Healthcare is a commercial extension of Spark NLP for clinical and biomedical text mining.[10] It provides healthcare-specific annotators, pipelines, models, and embeddings for clinical entity recognition, clinical entity linking, entity normalization, assertion status detection, de-identification, relation extraction, and spell checking and correction.

The library offers access to several clinical and biomedical transformers: JSL-BERT-Clinical, BioBERT, ClinicalBERT,[11] GloVe-Med, and GloVe-ICD-O. It also includes over 50 pre-trained healthcare models that can recognize entities such as clinical findings, drugs, risk factors, anatomy, demographics, and sensitive data.

Spark OCR

Spark OCR is another commercial extension of Spark NLP, for optical character recognition (OCR) from images, scanned PDF documents, and DICOM files.[7] It is a software library built on top of Apache Spark. It provides several image pre-processing features for improving text recognition results, such as adaptive thresholding and denoising, skew detection and correction, adaptive scaling, layout analysis and region detection, image cropping, and removal of background objects.

Due to the tight coupling between Spark OCR and Spark NLP, users can combine NLP and OCR pipelines for tasks such as extracting text from images, extracting data from tables, recognizing and highlighting named entities in PDF documents or masking sensitive text in order to de-identify images.[12]

Spark OCR supports several output formats, such as PDF, images, or DICOM files with annotated or masked entities; digital text for downstream processing in Spark NLP or other libraries; and structured data formats (JSON and CSV), as files or Spark data frames.

Users can also distribute the OCR jobs across multiple nodes in a Spark cluster.

License and availability

Spark NLP is licensed under the Apache 2.0 license. The source code is publicly available on GitHub, along with documentation and a tutorial. Prebuilt versions of Spark NLP are available on PyPI and in the Anaconda Repository for Python development, in Maven Central for Java & Scala development, and in Spark Packages for Spark development.

Award

In March 2019, Spark NLP received the Open Source Award for its contributions to natural language processing in Python, Java, and Scala.[13]

References

  1. Talby, David. "Introducing the Natural Language Processing Library for Apache Spark". databricks.com. Databricks. Retrieved 29 March 2019.
  2. Ellafi, Saif Addin (2018-02-28). "Comparing production-grade NLP libraries: Running Spark-NLP and spaCy pipelines". O'Reilly Media. Retrieved 2019-03-29.
  3. Ellafi, Saif Addin (2018-02-28). "Comparing production-grade NLP libraries: Accuracy, performance, and scalability". O'Reilly Media. Retrieved 2019-03-29.
  4. Ewbank, Kay. "Spark Gets NLP Library". www.i-programmer.info.
  5. Thomas, Alex (July 2020). Natural Language Processing with Spark NLP: Learning to Understand Text at Scale (First ed.). O'Reilly Media.
  6. Talby, David (2017-10-19). "Introducing the Natural Language Processing Library for Apache Spark - The Databricks Blog". Databricks. Retrieved 2019-08-27.
  7. Jha, Bineet Kumar; G, Sivasankari; R, Venugopal K. (May 2, 2021). "Sentiment Analysis for E-Commerce Products Using Natural Language Processing". Annals of the Romanian Society for Cell Biology: 166–175.
  8. Cer, Daniel; Yang, Yinfei; Kong, Sheng-yi; Hua, Nan; Limtiaco, Nicole; John, Rhomni St; Constant, Noah; Guajardo-Cespedes, Mario; Yuan, Steve; Tar, Chris; Sung, Yun-Hsuan; Strope, Brian; Kurzweil, Ray (12 April 2018). "Universal Sentence Encoder". arXiv:1803.11175 [cs.CL].
  9. Feng, Fangxiaoyu; Yang, Yinfei; Cer, Daniel; Arivazhagan, Naveen; Wang, Wei (3 July 2020). "Language-agnostic BERT Sentence Embedding". arXiv:2007.01852 [cs.CL].
  10. Team, Editorial (2018-09-04). "The Use of NLP to Extract Unstructured Medical Data From Text". insideBIGDATA. Retrieved 2019-08-27.
  11. Alsentzer, Emily; Murphy, John; Boag, William; Weng, Wei-Hung; Jindi, Di; Naumann, Tristan; McDermott, Matthew (June 2019). "Publicly Available Clinical BERT Embeddings". Proceedings of the 2nd Clinical Natural Language Processing Workshop. Association for Computational Linguistics: 72–78. arXiv:1904.03323. doi:10.18653/v1/W19-1909. S2CID 102352093.
  12. "A Unified CV, OCR & NLP Model Pipeline for Document Understanding at DocuSign". NLP Summit. Retrieved 18 September 2020.
  13. https://www.oreilly.com/pub/pr/3277

Source: https://en.wikipedia.org/wiki/Spark_NLP

Spark NLP Workshop

Notebooks and code showcasing how to use Spark NLP in Python and Scala.

Python Setup

$ java -version  # should be Java 8 (Oracle or OpenJDK)
$ conda create -n sparknlp python=3.7 -y
$ conda activate sparknlp
# spark-nlp by default is based on pyspark 3.x
$ pip install spark-nlp pyspark

Colab setup

# This is only to setup PySpark and Spark NLP on Colab
!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash

Main repository

https://github.com/JohnSnowLabs/spark-nlp

Project's website

Take a look at our official spark-nlp page, http://nlp.johnsnowlabs.com/, for user documentation and examples.

Slack community channel

Join Slack

Contributing

If you find any example that is no longer working, please create an issue.

License

Apache License 2.0

Source: https://github.com/JohnSnowLabs/spark-nlp-workshop

Spark NLP: State of the Art Natural Language Processing

The most widely used NLP library in the enterprise

Source: 2020 NLP Industry Survey, by Gradient Flow.

100% Open Source

Including pre-trained models and pipelines

Natively scalable

The only NLP library built natively on Apache Spark

Multiple Languages

Full Python, Scala, and Java support

Transformers at Scale

Spark NLP is the only open-source NLP library in production that offers state-of-the-art transformers such as BERT, ALBERT, ELECTRA, XLNet, DistilBERT, RoBERTa, XLM-RoBERTa, Longformer, ELMO, Universal Sentence Encoder, Google T5, and MarianMT not only to Python and R, but also to the JVM ecosystem (Java, Scala, and Kotlin) at scale by extending Apache Spark natively.
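As a small illustration of what this looks like in practice, loading one of these transformers is a single call in Python (a minimal sketch; "small_bert_L2_768" is an assumed model name from the Models Hub):

from sparknlp.annotator import BertEmbeddings

# Downloads a pretrained BERT embeddings annotator on first use;
# "small_bert_L2_768" is an assumed Models Hub name used for illustration.
bert = BertEmbeddings.pretrained("small_bert_L2_768", "en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

The annotator then plugs into a Spark ML pipeline like any other stage.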

Right Out of The Box

Spark NLP ships with many NLP features, pre-trained models and pipelines

NLP Features

  • Tokenization
  • Word Segmentation
  • Stop Words Removal
  • Normalizer
  • Stemmer
  • Lemmatizer
  • NGrams
  • Regex Matching
  • Text Matching
  • Chunking
  • Date Matcher
  • Part-of-speech tagging
  • Sentence Detector (DL models)
  • Dependency parsing
  • Sentiment Detection (ML models)
  • Spell Checker (ML and DL models)
  • Word Embeddings (GloVe and Word2Vec)
  • BERT Embeddings
  • DistilBERT Embeddings
  • RoBERTa Embeddings
  • XLM-RoBERTa Embeddings
  • Longformer Embeddings
  • ALBERT Embeddings
  • XLNet Embeddings
  • ELMO Embeddings
  • Universal Sentence Encoder
  • Sentence Embeddings
  • Chunk Embeddings
  • Neural Machine Translation (MarianMT)
  • Text-To-Text Transfer Transformer (Google T5)
  • Unsupervised keywords extraction
  • Language Detection & Identification (up to 375 languages)
  • Multi-class Text Classification (DL model)
  • Multi-label Text Classification (DL model)
  • Multi-class Sentiment Analysis (DL model)
  • BERT for Token Classification
  • DistilBERT for Token Classification
  • ALBERT for Token Classification
  • RoBERTa for Token Classification
  • XLM-RoBERTa for Token Classification
  • XLNet for Token Classification
  • Longformer for Token Classification
  • Named entity recognition (DL model)
  • Easy TensorFlow integration
  • GPU Support
  • Full integration with Spark ML functions
  • 2000+ pre-trained models in 200+ languages!
  • 1700+ pre-trained pipelines in 200+ languages!

Benchmark

Spark NLP 3.x has obtained the best performing results in peer-reviewed academic benchmarks

Training NER

  • State-of-the-art Deep Learning algorithms
  • Achieve high accuracy within a few minutes
  • Achieve high accuracy with a few lines of code
  • Blazing fast training
  • Use CPU or GPU
  • 80+ Pretrained Embeddings including GloVe, Word2Vec, BERT, DistilBERT, RoBERTa, XLM-RoBERTa, Longformer, ELMO, ELECTRA, ALBERT, XLNet, BioBERT, etc.
  • Multi-lingual NER models in Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, German, Hebrew, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, and Urdu
SYSTEM                 YEAR   LANGUAGE              CONLL '03 (F1)
Spark NLP v3           2021   Python/Scala/Java/R   93.2 (test), 95 (dev)
spaCy v3               2021   Python                91.6
Stanza (StanfordNLP)   2020   Python                92.1
Flair                  2018   Python                93.1
CoreNLP                2015   Java                  89.6

SYSTEM                 YEAR   LANGUAGE              ONTONOTES (F1)
Spark NLP v3           2021   Python/Scala/Java/R   90.0 (test), 92.5 (dev)
spaCy RoBERTa          2020   Python                89.7 (dev)
Stanza (StanfordNLP)   2020   Python                88.8 (dev)
Flair                  2018   Python                89.7
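Training a NER model in "a few lines of code" looks roughly like the following minimal sketch, assuming a local CoNLL-2003 formatted file (the file path and epoch count are illustrative):

import sparknlp
from sparknlp.annotator import WordEmbeddingsModel, NerDLApproach
from sparknlp.training import CoNLL
from pyspark.ml import Pipeline

spark = sparknlp.start()

# Read a CoNLL-2003 formatted dataset into a DataFrame with
# document, sentence, token, pos, and label columns.
training_data = CoNLL().readDataset(spark, "eng.train")

# Pretrained GloVe embeddings feed the NerDL (char-CNN + BiLSTM) trainer.
embeddings = WordEmbeddingsModel.pretrained("glove_100d") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

ner_approach = NerDLApproach() \
    .setInputCols(["sentence", "token", "embeddings"]) \
    .setLabelColumn("label") \
    .setOutputCol("ner") \
    .setMaxEpochs(5)

ner_model = Pipeline(stages=[embeddings, ner_approach]).fit(training_data)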
Source: https://nlp.johnsnowlabs.com/

Product Overview

A ready-to-use NLP Server for analyzing text documents using the NLU library. All Spark NLP pre-trained models and pipelines are easy to use via a simple and intuitive UI, without writing a line of code. For more expert users and more complex tasks, NLP Server also provides a REST API that can be used to process large amounts of data.

With NLP Server you get access to 3000+ state-of-the-art models in over 200 languages for problems like Sentiment Analysis, Spell Checking, Text Summarization, Translation, Named Entity Recognition, Question Answering, Spam Classification, and much more! All those features are available from any programming language via simple REST API calls.

Exploit the latest research developments in NLP from the Transformers world with Longformer, XLM-RoBERTa, XLING, ELECTRA, LaBSE, DistilBERT, XLNet, USE, T5, Marian, and much more!
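Since NLP Server builds on the NLU library, the same capability is available in one line of Python; a hedged sketch, assuming nlu is installed (pip install nlu):

import nlu

# Loads a pretrained sentiment pipeline (downloading the underlying
# Spark NLP model on first use) and predicts on raw text.
print(nlu.load('sentiment').predict('I love using Spark NLP!'))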

Operating System

Linux/Unix, Ubuntu 20.04

Highlights

  • Spark NLP text annotation


Source: https://aws.amazon.com/marketplace/pp/prodview-4ohxjejvg7vwm