
JorisSchelfaut/nlp-token-visualizer


NLP Token Visualizer

Token visualization for NLP tasks, inspired by Tiktokenizer. The idea is to visualize encoded text using your own vocabulary rather than a fixed, built-in one.

Import and required arguments

To use the process_text function, import it as follows:

import tokenviz
from tokenviz.visualization import process_text

Load your text and define your encode and decode functions, then pass both as arguments to process_text:

text = "some text ..."

# takes text and converts it to a list of integers according to the encoding scheme
def encode(text_to_encode):
    # some magic happens
    return encoded_text

# takes a list of integers and decodes these into the text according to the decoding scheme
def decode(text_to_decode):
    # some magic happens
    return decoded_text

process_text(text, encode, decode)
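Since the point of the tool is to visualize text under your own vocabulary, here is a minimal sketch of an encode/decode pair built from a user-supplied vocabulary. The vocabulary VOCAB, the greedy longest-match strategy, and the helper names are assumptions for illustration only; they are not part of tokenviz.

```python
from typing import Dict, List

# Hypothetical user-defined vocabulary: token string -> integer id.
# Single characters are included so that any text over this alphabet can be encoded.
VOCAB: Dict[str, int] = {
    "Hello": 0, "world": 1, " ": 2, "!": 3,
    "H": 4, "e": 5, "l": 6, "o": 7, "w": 8, "r": 9, "d": 10,
}
# Reverse lookup (id -> token string) used by decode.
ID_TO_TOKEN: Dict[int, str] = {i: t for t, i in VOCAB.items()}
MAX_TOKEN_LEN = max(len(t) for t in VOCAB)


def encode(text_to_encode: str) -> List[int]:
    """Greedy longest-match tokenization against VOCAB."""
    ids: List[int] = []
    pos = 0
    while pos < len(text_to_encode):
        # Try the longest candidate first and shrink until a vocabulary entry matches.
        for length in range(min(MAX_TOKEN_LEN, len(text_to_encode) - pos), 0, -1):
            piece = text_to_encode[pos:pos + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos += length
                break
        else:
            # No break: not even a single character matched.
            raise ValueError(f"No vocabulary entry matches at position {pos}: {text_to_encode[pos]!r}")
    return ids


def decode(tokens_to_decode: List[int]) -> str:
    """Concatenate the token strings for each id."""
    return "".join(ID_TO_TOKEN[i] for i in tokens_to_decode)
```

With this vocabulary, encode("Hello world!") yields four ids, so passing this pair to process_text should (assuming it decodes one id at a time) highlight "Hello" and "world" as single tokens instead of character by character.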

Examples

HTML example

Here is a simple example that passes a short string, together with the encode and decode functions defined above, to process_text. Assuming encode maps each character to its own integer, the following code:

text = 'Hello world!'
processed_text = process_text(text, encode, decode, markup='html')

generates one <span> per token, with background colors cycling through a fixed palette:

<span style="background-color: Khaki;">H</span><span style="background-color: AliceBlue;">e</span><span style="background-color: Aquamarine;">l</span><span style="background-color: Coral;">l</span><span style="background-color: Lavender;">o</span><span style="background-color: Ivory;"> </span><span style="background-color: DarkSalmon;">w</span><span style="background-color: Khaki;">o</span><span style="background-color: AliceBlue;">r</span><span style="background-color: Aquamarine;">l</span><span style="background-color: Coral;">d</span><span style="background-color: Lavender;">!</span>

LaTeX example

Add the following packages and style definition to the preamble of your LaTeX document:

\usepackage{listings}
\usepackage{xcolor}

% Define a custom style for listings
\lstdefinestyle{custom}{
    basicstyle=\small\ttfamily, % Small font size and typewriter style
    escapeinside={(*@}{@*)},    % Escape for inline LaTeX
}

Then paste the generated LaTeX code into a listing that uses this style:

\begin{lstlisting}[caption=My title, label=mylabel, style=custom]
% Your LaTeX code goes here
\end{lstlisting}
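If you prefer to assemble a compilable file from Python, here is a minimal sketch that combines the preamble and listing shown above with the generated code. The helper build_latex_document and the article document class are hypothetical choices of this sketch; the caption is wrapped in braces so that commas in it do not break the listing's optional argument.

```python
# Preamble copied from the snippet above.
PREAMBLE = r"""\usepackage{listings}
\usepackage{xcolor}

% Define a custom style for listings
\lstdefinestyle{custom}{
    basicstyle=\small\ttfamily, % Small font size and typewriter style
    escapeinside={(*@}{@*)},    % Escape for inline LaTeX
}"""


def build_latex_document(generated: str, caption: str, label: str) -> str:
    """Return a standalone LaTeX document with `generated` inside a styled listing."""
    return "\n".join([
        r"\documentclass{article}",
        PREAMBLE,
        r"\begin{document}",
        rf"\begin{{lstlisting}}[caption={{{caption}}}, label={label}, style=custom]",
        generated,
        r"\end{lstlisting}",
        r"\end{document}",
    ])
```

Writing the returned string to a .tex file and compiling it with pdflatex should render the colored tokens inside the listing.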

Assuming encode maps each character to its own integer, the following code:

text = 'Hello world!'
processed_text = process_text(text, encode, decode, markup='latex')

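generates one \colorbox per token, each wrapped in the (*@ ... @*) escape delimiters defined in the custom style: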

(*@\colorbox{yellow}{H}@*)(*@\colorbox{pink}{e}@*)(*@\colorbox{lightgray}{l}@*)(*@\colorbox{lime}{l}@*)(*@\colorbox{cyan}{o}@*)(*@\colorbox{magenta}{ }@*)(*@\colorbox{yellow}{w}@*)(*@\colorbox{pink}{o}@*)(*@\colorbox{lightgray}{r}@*)(*@\colorbox{lime}{l}@*)(*@\colorbox{cyan}{d}@*)(*@\colorbox{magenta}{!}@*)
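As with the HTML case, here is a minimal sketch of a renderer that produces output of this shape. It is not tokenviz's implementation: render_latex, the palette (read off the output above), and the escaping of LaTeX special characters are assumptions of this sketch.

```python
from typing import Callable, Dict, List

# Palette read off the example output above (an assumption; the library's
# actual palette may differ). All six are predefined xcolor colors.
LATEX_PALETTE = ["yellow", "pink", "lightgray", "lime", "cyan", "magenta"]

# LaTeX special characters and their escaped forms. This escaping is added by
# the sketch; the README does not say whether tokenviz escapes token text.
LATEX_SPECIALS: Dict[str, str] = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(s: str) -> str:
    """Replace LaTeX special characters so the token text typesets literally."""
    return "".join(LATEX_SPECIALS.get(c, c) for c in s)


def render_latex(text: str,
                 encode: Callable[[str], List[int]],
                 decode: Callable[[List[int]], str]) -> str:
    """Wrap each token in a colorbox inside the (*@ ... @*) listings escape."""
    parts = []
    for position, token_id in enumerate(encode(text)):
        # Decode one token at a time and cycle colors by token position.
        token_text = escape_latex(decode([token_id]))
        color = LATEX_PALETTE[position % len(LATEX_PALETTE)]
        parts.append(f"(*@\\colorbox{{{color}}}{{{token_text}}}@*)")
    return "".join(parts)
```

Called on 'Hello world!' with a character-level encode/decode pair, this reproduces the output above. The escaping matters because a token such as "%" would otherwise start a LaTeX comment inside the escaped region.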
