ner-plugin

This Recogito Studio (RS) plugin performs Named Entity Recognition (NER) on plain text and TEI documents. It adds a new option to the document card menu on the project home page.

A document must be either public or owned by you in RS for you to perform this operation. Once the option is selected, you are presented with options to configure the name of the NER'ed document, the NER model to use, and the language of the document.

Once NER is complete, a new document containing the named entities as read-only annotations is added to your project. For a plain text document, the new document is encoded in TEI, with the annotations added as a standoff element that RS can interpret. For a TEI document, the result is a new TEI document with a new standoff element containing the NER annotations.

Trigger.dev job runner

NER can be a long-running operation, so this plugin uses the Trigger.dev background job runner. While it is easiest to use the cloud-based service, Trigger.dev can also be self-hosted. Please see the Trigger.dev documentation for guidance on self-hosting.

Required ENV variables for Trigger.dev

Whether you use the cloud service or self-host, the plugin requires the following ENV variables to be set on the deployed Trigger.dev project. Note that these point to the example Stanford CoreNLP services, which are described below.

CORENLP_URL_EN=<url of CoreNLP English service>
CORENLP_URL_DE=<url of CoreNLP German service>
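
As an illustration of how these variables might be consumed, here is a minimal sketch that maps a document language to the matching service URL. The helper `resolveCoreNlpUrl`, the language codes `en` and `de`, and the trailing-slash handling are assumptions made for this example; they are not taken from the plugin's source.

```typescript
// Hypothetical helper, not part of the plugin: picks the CoreNLP service URL
// for a document language from a set of environment variables.
type Env = Record<string, string | undefined>;

// Assumed language codes; the variable names come from the list above.
const CORENLP_ENV_KEYS: { [language: string]: string | undefined } = {
  en: 'CORENLP_URL_EN',
  de: 'CORENLP_URL_DE',
};

function resolveCoreNlpUrl(language: string, env: Env): string {
  const key = CORENLP_ENV_KEYS[language.toLowerCase()];
  if (!key) {
    throw new Error(`No CoreNLP service configured for language "${language}"`);
  }
  const url = env[key];
  if (!url) {
    throw new Error(`Missing required environment variable ${key}`);
  }
  // Strip trailing slashes so callers can append a path or query string.
  return url.replace(/\/+$/, '');
}
```

Inside a task you would typically pass `process.env` as the second argument. Failing fast on a missing variable makes a misconfigured Trigger.dev project visible on the first run instead of surfacing later as a network error.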

Deploy your tasks

To deploy the required tasks to your Trigger.dev job runner you will need to update your trigger.config.ts file located in the /src directory.

import { defineConfig } from '@trigger.dev/sdk/v3';

export default defineConfig({
  project: 'proj_fyeypkhgyaejpiweobwq',
  runtime: 'node',
  logLevel: 'log',
  // The max compute seconds a task is allowed to run. If the task run exceeds this duration, it will be stopped.
  // You can override this on an individual task.
  // See https://trigger.dev/docs/runs/max-duration
  maxDuration: 3600,
  retries: {
    enabledInDev: true,
    default: {
      maxAttempts: 3,
      minTimeoutInMs: 1000,
      maxTimeoutInMs: 10000,
      factor: 2,
      randomize: true,
    },
  },
  dirs: ['./trigger'],
});

Set the project attribute to the Project ref, which you can find on the Project settings tab of your Trigger.dev project.

Then set the URL for your Trigger.dev job runner in your local .env file:

TRIGGER_SERVER_URL=<your trigger.dev url>

Now deploy your tasks to the Trigger.dev server by executing the following command at the root of this project repo:

npx trigger.dev@latest deploy -c ./src/trigger.config.ts

This will build containers for your tasks and deploy them to the Trigger.dev job runner.

Once complete you should see your tasks on the Tasks tab on your Trigger.dev project.
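
To get a feel for the `retries.default` values in trigger.config.ts above, the sketch below computes the nominal wait before each retry under a plain exponential backoff. This is an approximation for illustration only: it assumes that `maxAttempts` includes the first run and it ignores the jitter that `randomize: true` adds, so the delays Trigger.dev actually applies may differ.

```typescript
// Mirrors the numeric `retries.default` settings from trigger.config.ts.
type RetryOptions = {
  maxAttempts: number;
  minTimeoutInMs: number;
  maxTimeoutInMs: number;
  factor: number;
};

// Returns the nominal delay (in ms) before each retry, without jitter.
function nominalRetryDelays(opts: RetryOptions): number[] {
  const delays: number[] = [];
  // Assumption: maxAttempts counts the first run, leaving maxAttempts - 1 retries.
  for (let retry = 0; retry < opts.maxAttempts - 1; retry++) {
    const delay = opts.minTimeoutInMs * Math.pow(opts.factor, retry);
    // Never wait longer than the configured maximum.
    delays.push(Math.min(delay, opts.maxTimeoutInMs));
  }
  return delays;
}

// With the values shown above this yields [1000, 2000].
const configuredDelays = nominalRetryDelays({
  maxAttempts: 3,
  minTimeoutInMs: 1000,
  maxTimeoutInMs: 10000,
  factor: 2,
});
```

With these settings a failing NER run would be retried after roughly one second and then after roughly two seconds before the run is marked as failed.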

Example NER services

The repository contains an example docker-compose YAML file that deploys English and German CoreNLP services. These offer fairly comprehensive NER capabilities.
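
For orientation, here is a rough sketch of how a client could talk to one of these services using the standard CoreNLP server interface: the text is sent in a POST body and the annotators are passed in a `properties` query parameter. The helper names `buildCoreNlpUrl` and `collectMentions` are made up for this example, and the plugin's own doStanfordNlp.ts task may do this differently.

```typescript
// The parts of a CoreNLP server JSON response that this sketch reads.
type CoreNlpMention = {
  text: string;
  ner: string;
  characterOffsetBegin: number;
  characterOffsetEnd: number;
};
type CoreNlpResponse = {
  sentences: { entitymentions?: CoreNlpMention[] }[];
};

// Builds the annotate URL; the document text itself goes in the POST body.
function buildCoreNlpUrl(baseUrl: string): string {
  const properties = JSON.stringify({
    annotators: 'tokenize,ssplit,pos,lemma,ner',
    outputFormat: 'json',
  });
  return `${baseUrl}/?properties=${encodeURIComponent(properties)}`;
}

// Flattens the per-sentence entity mentions into a single list.
function collectMentions(response: CoreNlpResponse): CoreNlpMention[] {
  const mentions: CoreNlpMention[] = [];
  for (const sentence of response.sentences) {
    for (const mention of sentence.entitymentions || []) {
      mentions.push(mention);
    }
  }
  return mentions;
}
```

A caller would POST the plain text to `buildCoreNlpUrl(process.env.CORENLP_URL_EN)` and pass the parsed JSON to `collectMentions`.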

Configuring Additional NER services

Adding additional NER service endpoints requires the following steps:

1. Update NER options in NERMenuExtension.tsx

NERMenuExtension.tsx contains the NEROptions array.

  const NEROptions: { value: string; label: string }[] = [
    { value: 'stanford-core', label: t['Stanford Core NLP'] },
  ];

It is currently configured to only offer the Stanford Core NLP service. Add new values and labels to include additional NER services.
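
A sketch of what the array could look like with one extra entry. The value `my-new-ner-service` and its label key are placeholders, and the `t` object below is only a stand-in for the plugin's translation lookup so that the snippet runs on its own.

```typescript
// Stand-in for the plugin's translation lookup `t` (assumption for this sketch).
const t: Record<string, string> = {
  'Stanford Core NLP': 'Stanford Core NLP',
  'My New NER Service': 'My New NER Service',
};

const NEROptions: { value: string; label: string }[] = [
  { value: 'stanford-core', label: t['Stanford Core NLP'] },
  // Hypothetical new entry: `value` is the model name the API endpoint will check for.
  { value: 'my-new-ner-service', label: t['My New NER Service'] },
];
```

Whatever string you choose for `value` has to match the model name your API endpoint branch tests against, and the label key presumably needs a matching entry in the plugin's translation files.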

2. Create new Trigger tasks

The top-level stanfordCore.ts task calls the sub-tasks that implement the NER pipeline. For a new endpoint, implement a new task using stanfordCore.ts as a template. The basic steps are:

Create a Supabase client
Download the file for NER from Supabase
Convert to plain text. Two different subtasks handle this, depending on whether the file is text or TEI XML.

Call your new endpoint task. How this task works will depend on how the endpoint works, but the important requirement is that it returns the same NERResults structure as the example doStanfordNlp.ts task:

export type TagTypes =
  | 'persName'
  | 'orgName'
  | 'placeName'
  | 'settlement'
  | 'country'
  | 'region'
  | 'date';
export type NEREntry = {
  text: string;
  startIndex: number;
  endIndex: number;
  localizedTag: string;
  inlineTag: TagTypes;
  attributes?: { [key: string]: string };
};
export type NERResults = {
  entries: NEREntry[];
};
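
The following sketch shows one way a new endpoint task could convert its raw output into this structure. The `RawEntity` shape, the label mapping, and the `toNERResults` helper are hypothetical; the labels are CoreNLP-style names chosen for illustration. It also assumes that `endIndex` is exclusive and that `localizedTag` holds a display label in the output language, neither of which is confirmed here.

```typescript
type TagTypes =
  | 'persName' | 'orgName' | 'placeName' | 'settlement'
  | 'country' | 'region' | 'date';
type NEREntry = {
  text: string;
  startIndex: number;
  endIndex: number;
  localizedTag: string;
  inlineTag: TagTypes;
  attributes?: { [key: string]: string };
};
type NERResults = { entries: NEREntry[] };

// Hypothetical raw entity as returned by your own NER endpoint.
type RawEntity = { text: string; label: string; start: number; end: number };

// Assumed mapping from service labels to the TEI tag names above.
const LABEL_TO_TAG: { [label: string]: TagTypes | undefined } = {
  PERSON: 'persName',
  ORGANIZATION: 'orgName',
  LOCATION: 'placeName',
  CITY: 'settlement',
  COUNTRY: 'country',
  STATE_OR_PROVINCE: 'region',
  DATE: 'date',
};

function toNERResults(
  raw: RawEntity[],
  displayLabels: Record<TagTypes, string>,
): NERResults {
  const entries: NEREntry[] = [];
  for (const entity of raw) {
    const inlineTag = LABEL_TO_TAG[entity.label];
    // Skip entity types that have no corresponding tag.
    if (!inlineTag) continue;
    entries.push({
      text: entity.text,
      startIndex: entity.start,
      endIndex: entity.end,
      localizedTag: displayLabels[inlineTag],
      inlineTag,
    });
  }
  return { entries };
}
```

Keeping the label mapping in one table makes it easy to see which entity types of a new service are dropped and which are carried into the output document.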

3. Update the NERAgentRoute API endpoint

The NERAgentEndpoint.ts file receives the options from the configuration dialog. To handle your new options, update this block of code and trigger your new top level task:

  if (body.model === 'stanford-core') {
    handle = await stanfordCore.trigger({
      projectId: projectId as string,
      documentId: documentId as string,
      language: body.language,
      token: body.token,
      key: supabaseAPIKey,
      serverURL: supabaseServerUrl,
      nameOut: body.nameOut,
      outputLanguage: body.outputLanguage,
    });
  }

For example:

  if (body.model === 'stanford-core') {
    handle = await stanfordCore.trigger({
      projectId: projectId as string,
      documentId: documentId as string,
      language: body.language,
      token: body.token,
      key: supabaseAPIKey,
      serverURL: supabaseServerUrl,
      nameOut: body.nameOut,
      outputLanguage: body.outputLanguage,
    });
  } else if (body.model === 'my-new-ner-service') {
    handle = await myNewNERService.trigger({
      projectId: projectId as string,
      documentId: documentId as string,
      language: body.language,
      token: body.token,
      key: supabaseAPIKey,
      serverURL: supabaseServerUrl,
      nameOut: body.nameOut,
      outputLanguage: body.outputLanguage,
    });
  }
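
If you expect to add several services, the if/else chain above could also be expressed as a lookup table keyed by model name. This is an optional alternative, not how the plugin is written. The payload field types and the `TriggerableTask` type are assumptions, and the two task objects are stubs standing in for the imported Trigger.dev tasks.

```typescript
// Payload fields as used in the endpoint code above; the types are assumed.
type NERPayload = {
  projectId: string;
  documentId: string;
  language: string;
  token: string;
  key: string;
  serverURL: string;
  nameOut: string;
  outputLanguage: string;
};

// Minimal stand-in for a Trigger.dev task: only `trigger` is modelled.
type TriggerableTask = {
  trigger: (payload: NERPayload) => Promise<{ id: string }>;
};

// Stubs so the sketch runs on its own; in the plugin these would be imports.
const stanfordCore: TriggerableTask = {
  trigger: () => Promise.resolve({ id: 'run_stanford' }),
};
const myNewNERService: TriggerableTask = {
  trigger: () => Promise.resolve({ id: 'run_custom' }),
};

const TASKS_BY_MODEL: { [model: string]: TriggerableTask | undefined } = {
  'stanford-core': stanfordCore,
  'my-new-ner-service': myNewNERService,
};

// Returns the task registered for a model name, or throws for unknown models.
function selectTask(model: string): TriggerableTask {
  // hasOwnProperty guards against inherited keys such as "toString".
  const task = Object.prototype.hasOwnProperty.call(TASKS_BY_MODEL, model)
    ? TASKS_BY_MODEL[model]
    : undefined;
  if (!task) {
    throw new Error(`Unsupported NER model: ${model}`);
  }
  return task;
}
```

The endpoint would then call `handle = await selectTask(body.model).trigger({ ... })` with the same payload as before, and adding a service becomes a one-line change to the table.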

4. Rebuild and update the tasks on Trigger.dev

Whether you are using the cloud service or self-hosting, the procedure is the same:

npm run build
npx trigger.dev@latest deploy -c ./src/trigger.config.ts

The deploy command uses the configuration in /src/trigger.config.ts.
