huridocs/NER-in-docker

Named Entity Recognition with Docker

A Docker-powered service for named entity extraction from text or PDF files.

This repository provides a Docker-powered service for Named Entity Recognition (NER) that extracts entities from plain text or PDF files. For PDF input, it relies on pdf-document-layout-analysis, a service that segments documents with high accuracy, to split the document into segments before entity extraction.

Project Links:

GitHub: NER-in-docker
HuggingFace: NER-in-docker

Quick Start

Clone the service:

```
git clone https://github.com/huridocs/NER-in-docker
cd NER-in-docker
```

Run the service:

With GPU support:
```
make start
```
Without GPU support:
```
make start_no_gpu
```

API Usage

The service exposes a FastAPI REST API. By default, it runs on http://localhost:8000.

Endpoints

1. `/` (POST)

Extract named entities from text or PDF.

Parameters:

namespace (str, optional): Namespace for storing/retrieving entities in SQLite.
identifier (str, optional): Source identifier for the text.
text (str, optional): Text to analyze (if no file is provided).
file (PDF, optional): PDF file to analyze (multipart/form-data).
fast (bool, optional): Use fast PDF segmentation (default: False).

Example (Text):

```
curl -X POST http://localhost:8000/ \
  -F "text=Your text here" \
  -F "namespace=my_namespace"
```

Example (PDF):

```
curl -X POST http://localhost:8000/ \
  -F "file=@/path/to/file.pdf" \
  -F "namespace=my_namespace" \
  -F "fast=true"
```
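For scripted use, the same requests can be sent from Python. The sketch below uses only the standard library and mirrors the curl examples: it encodes the form fields as multipart/form-data and, following the rule under Notes that a file takes precedence over text, sends only the PDF when both are given. The helper names (`encode_multipart`, `build_extract_request`, `extract_entities`) and the 300-second timeout are illustrative assumptions, not part of the service.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Optional


def encode_multipart(fields: dict, files: dict) -> tuple[bytes, str]:
    """Encode text fields and (filename, bytes) files as multipart/form-data, like curl -F."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    for name, (filename, data) in files.items():
        content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_extract_request(
    base_url: str = "http://localhost:8000",
    text: Optional[str] = None,
    pdf_path: Optional[str] = None,
    namespace: Optional[str] = None,
    identifier: Optional[str] = None,
    fast: bool = False,
) -> urllib.request.Request:
    """Build a POST / request; hypothetical helper, field names taken from the parameter list."""
    fields = {}
    if namespace is not None:
        fields["namespace"] = namespace
    if identifier is not None:
        fields["identifier"] = identifier
    files = {}
    if pdf_path is not None:
        # The service processes only the file when both are sent, so text is dropped here.
        path = Path(pdf_path)
        files["file"] = (path.name, path.read_bytes())
        if fast:
            fields["fast"] = "true"
    elif text is not None:
        fields["text"] = text
    else:
        raise ValueError("provide either text or pdf_path")
    body, content_type = encode_multipart(fields, files)
    return urllib.request.Request(
        base_url.rstrip("/") + "/",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def extract_entities(request: urllib.request.Request, timeout_s: float = 300.0) -> dict:
    """Send the request and decode the JSON response (see Response Structure)."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

For example, `extract_entities(build_extract_request(text="Your text here", namespace="my_namespace"))` should return the JSON object described under Response Structure, assuming the service is running on the default port.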

2. `/delete_namespace` (POST)

Delete all entities for a given namespace.

Parameters:

namespace (str, required): Namespace to delete.

Example:

```
curl -X POST http://localhost:8000/delete_namespace \
  -F "namespace=my_namespace"
```
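A Python equivalent of the delete call is sketched below. It assumes `namespace` is read as an ordinary form field, as the `-F` flag in the curl example suggests; because FastAPI form fields accept URL-encoded bodies as well as multipart ones, the sketch sends `application/x-www-form-urlencoded` to stay short. The function names are hypothetical.

```python
import urllib.parse
import urllib.request


def build_delete_namespace_request(
    namespace: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build POST /delete_namespace; namespace is required by the endpoint."""
    if not namespace:
        raise ValueError("namespace is required")
    body = urllib.parse.urlencode({"namespace": namespace}).encode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/delete_namespace",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def delete_namespace(namespace: str, base_url: str = "http://localhost:8000", timeout_s: float = 30.0) -> int:
    """Send the request and return the HTTP status code (urlopen raises on 4xx/5xx)."""
    request = build_delete_namespace_request(namespace, base_url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status
```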

3. `/` (GET)

Returns the Python version used by the service. Use it as a simple health check.
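After `make start`, the service may need some time before it answers, so a script can poll `GET /` before sending work. A minimal sketch, assuming that any HTTP 200 response means the service is ready; `wait_until_ready` and its default timeout and polling interval are hypothetical choices.

```python
import time
import urllib.request


def wait_until_ready(
    base_url: str = "http://localhost:8000", timeout_s: float = 120.0, interval_s: float = 2.0
) -> bool:
    """Poll GET / until it returns 200; give up and return False after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(base_url.rstrip("/") + "/", timeout=interval_s) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, socket timeout, or HTTP error (HTTPError is an OSError): not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```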

Response Structure

The main endpoint (`/`, POST) returns a JSON object:

```
{
  "entities": [
    {
      "text": "Entity text",
      "type": "EntityType",
      "source_id": "Source identifier",
      "character_start": 123,
      "character_end": 130,
      "relevance_percentage": 98.5,
      "group_name": "GroupName",
      "segment": {
        "page_number": 1,
        "segment_number": 0,
        ...
      }
    },
    ...
  ],
  "groups": [
    {
      "name": "GroupName",
      "entities": [
        {
          "text": "Entity text",
          "index": 0,
          ...
        },
        ...
      ]
    },
    ...
  ]
}
```

entities: List of extracted named entities with metadata.
groups: List of entity groups, each containing related entities.
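Once the JSON is decoded into a dict, both lists are easy to post-process. The sketch below relies only on the field names shown above (`entities`, `type`, `text`, `relevance_percentage`, `groups`, `name`); the helper names and the optional relevance threshold are illustrative assumptions. It collects the distinct entity texts per type and lists the member texts of each group in the order the service returns them.

```python
from collections import defaultdict


def entities_by_type(response: dict, min_relevance: float = 0.0) -> dict[str, list[str]]:
    """Map each entity type to its distinct texts, skipping low-relevance hits."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for entity in response.get("entities", []):
        if entity.get("relevance_percentage", 0.0) < min_relevance:
            continue
        texts = grouped[entity["type"]]
        if entity["text"] not in texts:
            texts.append(entity["text"])
    return dict(grouped)


def group_members(response: dict) -> dict[str, list[str]]:
    """Map each group name to the texts of its entities, in response order."""
    return {
        group["name"]: [member["text"] for member in group.get("entities", [])]
        for group in response.get("groups", [])
    }
```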

Notes

If namespace is provided, entities are stored and reused for reference extraction.
If both text and file are provided, only the file is processed.
The service supports both text and PDF input.

For more details, see the source code and API models in src/drivers/rest/response_entities/NamedEntitiesResponse.py.
