LlamaCppEmbeddings doesn't work #26372
Replies: 1 comment 3 replies
-
Hey there, @machengyan! I'm here to help you out with any bugs, questions, or contributions you might have. Let's tackle this together!

To address the issue with the `LlamaCppEmbeddings` class returning token-level embeddings, you can add a `pooling` option to the class and apply it before returning the vectors:

```python
from typing import Any, Dict, List, Optional

from langchain_core.embeddings import Embeddings
from langchain_core.pydantic_v1 import BaseModel, Field, root_validator


class LlamaCppEmbeddings(BaseModel, Embeddings):
"""llama.cpp embedding models.
To use, you should have the llama-cpp-python library installed, and provide the
path to the Llama model as a named parameter to the constructor.
Check out: https://github.com/abetlen/llama-cpp-python
Example:
.. code-block:: python
from langchain_community.embeddings import LlamaCppEmbeddings
llama = LlamaCppEmbeddings(model_path="/path/to/model.bin")
"""
client: Any #: :meta private:
model_path: str
n_ctx: int = Field(512, alias="n_ctx")
"""Token context window."""
n_parts: int = Field(-1, alias="n_parts")
"""Number of parts to split the model into.
If -1, the number of parts is automatically determined."""
seed: int = Field(-1, alias="seed")
"""Seed. If -1, a random seed is used."""
f16_kv: bool = Field(False, alias="f16_kv")
"""Use half-precision for key/value cache."""
logits_all: bool = Field(False, alias="logits_all")
"""Return logits for all tokens, not just the last token."""
vocab_only: bool = Field(False, alias="vocab_only")
"""Only load the vocabulary, no weights."""
use_mlock: bool = Field(False, alias="use_mlock")
"""Force system to keep model in RAM."""
n_threads: Optional[int] = Field(None, alias="n_threads")
"""Number of threads to use. If None, the number
of threads is automatically determined."""
n_batch: Optional[int] = Field(512, alias="n_batch")
"""Number of tokens to process in parallel.
Should be a number between 1 and n_ctx."""
n_gpu_layers: Optional[int] = Field(None, alias="n_gpu_layers")
"""Number of layers to be loaded into gpu memory. Default None."""
verbose: bool = Field(True, alias="verbose")
"""Print verbose output to stderr."""
device: Optional[str] = Field(None, alias="device")
"""Device type to use and pass to the model"""
pooling: str = Field("mean", alias="pooling")
"""Pooling type to use for embeddings. Options are 'mean', 'max', 'min'."""
class Config:
extra = "forbid"

    @root_validator(pre=False, skip_on_failure=True)
    def validate_environment(cls, values: Dict) -> Dict:
        """Validate that llama-cpp-python library is installed."""
        model_path = values["model_path"]
        model_param_names = [
            "n_ctx",
            "n_parts",
            "seed",
            "f16_kv",
            "logits_all",
            "vocab_only",
            "use_mlock",
            "n_threads",
            "n_batch",
            "verbose",
            "device",
        ]
        model_params = {k: values[k] for k in model_param_names}
        # For backwards compatibility, only include if non-null.
        if values["n_gpu_layers"] is not None:
            model_params["n_gpu_layers"] = values["n_gpu_layers"]
        try:
            from llama_cpp import Llama

            values["client"] = Llama(model_path, embedding=True, **model_params)
        except ImportError:
            raise ImportError(
                "Could not import llama-cpp-python library. "
                "Please install the llama-cpp-python library to "
                "use this embedding model: pip install llama-cpp-python"
            )
        except Exception as e:
            raise ValueError(
                f"Could not load Llama model from path: {model_path}. "
                f"Received error {e}"
            )
        return values

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed a list of documents using the Llama model.

        Args:
            texts: The list of texts to embed.

        Returns:
            List of embeddings, one for each text.
        """
        embeddings = self.client.create_embedding(texts)
        pooled_embeddings = self._apply_pooling(embeddings["data"])
        return [list(map(float, e)) for e in pooled_embeddings]

    def embed_query(self, text: str) -> List[float]:
        """Embed a query using the Llama model.

        Args:
            text: The text to embed.

        Returns:
            Embeddings for the text.
        """
        # Llama.embed returns the raw (token-level) embedding list, so wrap it
        # in the same {"embedding": ...} shape that _apply_pooling expects.
        embedding = self.client.embed(text)
        pooled_embedding = self._apply_pooling([{"embedding": embedding}])
        return list(map(float, pooled_embedding[0]))

    def _apply_pooling(self, embeddings: List[Dict[str, Any]]) -> List[List[float]]:
        """Apply pooling to the embeddings.

        Args:
            embeddings: The list of embeddings to pool.

        Returns:
            List of pooled embeddings.
        """
        if self.pooling == "mean":
            return [self._mean_pooling(e["embedding"]) for e in embeddings]
        elif self.pooling == "max":
            return [self._max_pooling(e["embedding"]) for e in embeddings]
        elif self.pooling == "min":
            return [self._min_pooling(e["embedding"]) for e in embeddings]
        else:
            raise ValueError(f"Unsupported pooling type: {self.pooling}")

    def _mean_pooling(self, token_embeddings: List[List[float]]) -> List[float]:
        # Average each dimension across the per-token embeddings.
        return [sum(dim) / len(dim) for dim in zip(*token_embeddings)]

    def _max_pooling(self, token_embeddings: List[List[float]]) -> List[float]:
        # Take the per-dimension maximum across the per-token embeddings.
        return [max(dim) for dim in zip(*token_embeddings)]

    def _min_pooling(self, token_embeddings: List[List[float]]) -> List[float]:
        # Take the per-dimension minimum across the per-token embeddings.
        return [min(dim) for dim in zip(*token_embeddings)]
```

This code adds a `pooling` parameter to `LlamaCppEmbeddings` and reduces the per-token embeddings that llama.cpp returns when pooling is disabled to a single fixed-size vector per text, which is the shape FAISS expects.
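As a quick sanity check, here is a minimal usage sketch (the model path is a placeholder and the exact dimensionality depends on your model):

```python
# Placeholder path: point this at any embedding-capable GGUF model you have locally.
embeddings = LlamaCppEmbeddings(model_path="/path/to/model.gguf", pooling="mean")

vectors = embeddings.embed_documents(["first document", "second document"])
print(len(vectors))     # 2 -> one vector per input text
print(len(vectors[0]))  # embedding dimension (a flat vector, not tokens x dims)

query_vector = embeddings.embed_query("a question about the documents")
print(len(query_vector))  # same dimension as the document vectors
```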
-
Checked other resources
Commit to Help
Example Code
Description
This example from the documentation doesn't work, because the default embedding class uses pooling type 0, which means no pooling. The output is then a token-level embedding, which FAISS can't accept.
You should consider adding a pooling type to the LlamaCppEmbeddings creation; I tried that and it works. A rough sketch of the workaround is below.
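Roughly what that workaround looks like (a sketch only: the `pooling_type` argument and the `LLAMA_POOLING_TYPE_MEAN` constant assume a recent llama-cpp-python release, and the model path is a placeholder):

```python
import llama_cpp

llm = llama_cpp.Llama(
    model_path="/path/to/model.gguf",  # placeholder path
    embedding=True,
    # Assumption: recent llama-cpp-python exposes pooling_type on the constructor;
    # LLAMA_POOLING_TYPE_MEAN asks llama.cpp to mean-pool the token embeddings.
    pooling_type=llama_cpp.LLAMA_POOLING_TYPE_MEAN,
)

vector = llm.embed("some text to embed")
print(len(vector))  # a single flat vector that FAISS can index
```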
System Info
System Information
Package Information
Other Dependencies