NLP Token Visualizer

Token visualization for NLP tasks. Inspired by Tiktokenizer. The idea was to use your own vocabularies to visualize encoded text.

Import and required arguments

To use the method process_text, include the following:

import tokenviz
from tokenviz.visualization import process_text

Load your text and define your encode and decode methods. These methods are given as arguments to process_text:

text = "some text ..."

# takes text and converts it to a list of integers according to the encoding scheme
def encode(text_to_encode):
    # some magic happens
    return encoded_text

# takes a list of integers and decodes these into the text according to the decoding scheme
def decode(text_to_decode):
    # some magic happens
    return decoded_text

process_text(text, encode, decode)

Examples

HTML example

Here's a simple example using the predefined encoding/decoding methods with a simple string. Assuming encode simply maps each character to a number, the following...

text = 'Hello world!'
processed_text = process_text(text, encode, decode, markup='html')

generates...

<span style="background-color: Khaki;">H</span><span style="background-color: AliceBlue;">e</span><span style="background-color: Aquamarine;">l</span><span style="background-color: Coral;">l</span><span style="background-color: Lavender;">o</span><span style="background-color: Ivory;"> </span><span style="background-color: DarkSalmon;">w</span><span style="background-color: Khaki;">o</span><span style="background-color: AliceBlue;">r</span><span style="background-color: Aquamarine;">l</span><span style="background-color: Coral;">d</span><span style="background-color: Lavender;">!</span>

LaTeX example

Add the following imports and definitions to your LaTeX document.

\usepackage{listings}
\usepackage{xcolor}

% Define a custom style for listings
\lstdefinestyle{custom}{
    basicstyle=\small\ttfamily, % Small font size and typewriter style
    escapeinside={(*@}{@*)},    % Escape for inline LaTeX
}

Then add your generated LaTeX code to the listing:

\begin{lstlisting}[caption=My title, label=mylabel, style=custom]
% Your LaTeX code goes here
\end{lstlisting}

Assuming encode simply maps each character to a number, the following...

text = 'Hello world!'
processed_text = process_text(text, encode, decode, markup='latex')

generates...

(*@\colorbox{yellow}{H}@*)(*@\colorbox{pink}{e}@*)(*@\colorbox{lightgray}{l}@*)(*@\colorbox{lime}{l}@*)(*@\colorbox{cyan}{o}@*)(*@\colorbox{magenta}{ }@*)(*@\colorbox{yellow}{w}@*)(*@\colorbox{pink}{o}@*)(*@\colorbox{lightgray}{r}@*)(*@\colorbox{lime}{l}@*)(*@\colorbox{cyan}{d}@*)(*@\colorbox{magenta}{!}@*)

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
_demo		_demo
tokenviz		tokenviz
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
demo.ipynb		demo.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

NLP Token Visualizer

Import and required arguments

Examples

HTML example

LaTeX example

About

Uh oh!

Releases

Packages

Languages

License

JorisSchelfaut/nlp-token-visualizer

Folders and files

Latest commit

History

Repository files navigation

NLP Token Visualizer

Import and required arguments

Examples

HTML example

LaTeX example

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages