How to implement an integration package
This guide walks through the process of implementing a LangChain integration package.
Integration packages are just Python packages that can be installed with pip install <your-package>, which contain classes that are compatible with LangChain's core interfaces.
We will cover:
- (Optional) How to bootstrap a new integration package
- How to implement components, such as chat models and vector stores, that adhere to the LangChain interface
(Optional) bootstrapping a new integration package
In this section, we will outline 2 options for bootstrapping a new integration package, and you're welcome to use other tools if you prefer!
- langchain-cli: This is a command-line tool that can be used to bootstrap a new integration package with a template for LangChain components, using Poetry for dependency management.
- Poetry: This is a Python dependency management tool that can be used to bootstrap a new Python package with dependencies. You can then add LangChain components to this package.
Option 1: langchain-cli (recommended)
In this guide, we will use langchain-cli to create a new integration package from a template, which you can edit to implement your LangChain components.
Prerequisites
Bootstrapping a new Python package with langchain-cli
First, install langchain-cli and poetry:
pip install langchain-cli poetry
Next, come up with a name for your package. For this guide, we'll use langchain-parrot-link. You can confirm that the name is available on PyPI by searching for it on the PyPI website.
Next, create your new Python package with langchain-cli, and navigate into the new directory with cd:
langchain-cli integration new
> The name of the integration to create (e.g. `my-integration`): parrot-link
> Name of integration in PascalCase [ParrotLink]:
cd parrot-link
Next, let's add any dependencies we need:
poetry add my-integration-sdk
We can also add some typing or test dependencies in separate poetry dependency groups:
poetry add --group typing my-typing-dep
poetry add --group test my-test-dep
Finally, have poetry set up a virtual environment with your dependencies, as well as your integration package:
poetry install --with lint,typing,test,test_integration
You now have a new Python package with templates for LangChain components! This template comes with files for each integration type, and you're welcome to duplicate or delete any of these files as needed (including the associated test files).
To create any individual files from the template, you can run, e.g.:
langchain-cli integration new \
--name parrot-link \
--name-class ParrotLink \
--src integration_template/chat_models.py \
--dst langchain_parrot_link/chat_models_2.py
Option 2: Poetry (manual)
In this guide, we will use Poetry for dependency management and packaging; you're welcome to use any other tools you prefer.
Prerequisites
Bootstrapping a new Python package with Poetry
First, install Poetry:
pip install poetry
Next, come up with a name for your package. For this guide, we'll use langchain-parrot-link. You can confirm that the name is available on PyPI by searching for it on the PyPI website.
Next, create your new Python package with Poetry, and navigate into the new directory with cd:
poetry new langchain-parrot-link
cd langchain-parrot-link
Add the main dependencies using Poetry, which will add them to your pyproject.toml file:
poetry add langchain-core
We will also add some test dependencies in a separate poetry dependency group. If you are not using Poetry, we recommend adding these in a way that won't package them with your published package, or just installing them separately when you run the tests.
langchain-tests will provide the standard tests we will use later. We recommend pinning these to the latest version:
Note: Replace <latest_version> with the latest version of langchain-tests below.
poetry add --group test pytest pytest-socket pytest-asyncio langchain-tests==<latest_version>
Finally, have poetry set up a virtual environment with your dependencies, as well as your integration package:
poetry install --with test
You're now ready to start writing your integration package!
Writing your integration
Let's say you're building a simple integration package that provides a ChatParrotLink chat model integration for LangChain. Here's a simple example of what your project structure might look like:
langchain-parrot-link/
├── langchain_parrot_link/
│ ├── __init__.py
│ └── chat_models.py
├── tests/
│ ├── __init__.py
│ └── test_chat_models.py
├── pyproject.toml
└── README.md
All of these files should already exist from step 1, except for chat_models.py and test_chat_models.py! We will implement test_chat_models.py later, following the standard tests guide.
For chat_models.py, simply paste in the contents of the chat model code example below.
Push your package to a public GitHub repository
This step is only required if you want to publish your integration in the LangChain documentation.
- Create a new repo on GitHub.
- Push your code to the repo.
- Confirm that your repo is viewable by the public (e.g. in a private browsing window, where you're not logged into GitHub).
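A minimal sequence of git commands for this might look like the sketch below; the remote URL is a placeholder to replace with your own repository:
git init
git add .
git commit -m "initial commit"
git remote add origin https://github.com/<your-username>/langchain-parrot-link.git
git push -u origin main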
Implementing LangChain components
LangChain components are subclasses of base classes in langchain-core. Examples include chat models, vector stores, tools, embedding models and retrievers.
Your integration package will typically implement a subclass of at least one of these components. Expand the tabs below to see details on each.
- Chat models
- Vector stores
- Embeddings
- Tools
- Retrievers
Refer to the Custom Chat Model Guide for detail on a starter chat model implementation.
You can start from the following template or langchain-cli command:
langchain-cli integration new \
--name parrot-link \
--name-class ParrotLink \
--src integration_template/chat_models.py \
--dst langchain_parrot_link/chat_models.py
Chat model code example
"""ParrotLink chat models."""
from typing import Any, Dict, Iterator, List, Optional
from langchain_core.callbacks import (
CallbackManagerForLLMRun,
)
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import (
AIMessage,
AIMessageChunk,
BaseMessage,
)
from langchain_core.messages.ai import UsageMetadata
from langchain_core.outputs import ChatGeneration, ChatGenerationChunk, ChatResult
from pydantic import Field
class ChatParrotLink(BaseChatModel):
# TODO: Replace all TODOs in docstring. See example docstring:
# https://github.com/langchain-ai/langchain/blob/7ff05357bac6eaedf5058a2af88f23a1817d40fe/libs/partners/openai/langchain_openai/chat_models/base.py#L1120
"""ParrotLink chat model integration.
The default implementation echoes the first `parrot_buffer_length` characters of the input.
# TODO: Replace with relevant packages, env vars.
Setup:
Install ``langchain-parrot-link`` and set environment variable ``PARROT_LINK_API_KEY``.
.. code-block:: bash
pip install -U langchain-parrot-link
export PARROT_LINK_API_KEY="your-api-key"
# TODO: Populate with relevant params.
Key init args — completion params:
model: str
Name of ParrotLink model to use.
temperature: float
Sampling temperature.
max_tokens: Optional[int]
Max number of tokens to generate.
# TODO: Populate with relevant params.
Key init args — client params:
timeout: Optional[float]
Timeout for requests.
max_retries: int
Max number of retries.
api_key: Optional[str]
ParrotLink API key. If not passed in will be read from env var PARROT_LINK_API_KEY.
See full list of supported init args and their descriptions in the params section.
# TODO: Replace with relevant init params.
Instantiate:
.. code-block:: python
from langchain_parrot_link import ChatParrotLink
llm = ChatParrotLink(
model="...",
temperature=0,
max_tokens=None,
timeout=None,
max_retries=2,
# api_key="...",
# other params...
)
Invoke:
.. code-block:: python
messages = [
("system", "You are a helpful translator. Translate the user sentence to French."),
("human", "I love programming."),
]
llm.invoke(messages)
.. code-block:: python
# TODO: Example output.
# TODO: Delete if token-level streaming isn't supported.
Stream:
.. code-block:: python
for chunk in llm.stream(messages):
print(chunk.text(), end="")
.. code-block:: python
# TODO: Example output.
.. code-block:: python
stream = llm.stream(messages)
full = next(stream)
for chunk in stream:
full += chunk
full
.. code-block:: python
# TODO: Example output.
# TODO: Delete if native async isn't supported.
Async:
.. code-block:: python
await llm.ainvoke(messages)
# stream:
            # async for chunk in llm.astream(messages):
# batch:
# await llm.abatch([messages])
.. code-block:: python
# TODO: Example output.
# TODO: Delete if .bind_tools() isn't supported.
Tool calling:
.. code-block:: python
from pydantic import BaseModel, Field
class GetWeather(BaseModel):
'''Get the current weather in a given location'''
location: str = Field(..., description="The city and state, e.g. San Francisco, CA")
class GetPopulation(BaseModel):
'''Get the current population in a given location'''
location: str = Field(..., description="The city and state, e.g. San Francisco, CA")
llm_with_tools = llm.bind_tools([GetWeather, GetPopulation])
ai_msg = llm_with_tools.invoke("Which city is hotter today and which is bigger: LA or NY?")
ai_msg.tool_calls
.. code-block:: python
# TODO: Example output.
See ``ChatParrotLink.bind_tools()`` method for more.
# TODO: Delete if .with_structured_output() isn't supported.
Structured output:
.. code-block:: python
from typing import Optional
from pydantic import BaseModel, Field
class Joke(BaseModel):
'''Joke to tell user.'''
setup: str = Field(description="The setup of the joke")
punchline: str = Field(description="The punchline to the joke")
rating: Optional[int] = Field(description="How funny the joke is, from 1 to 10")
structured_llm = llm.with_structured_output(Joke)
structured_llm.invoke("Tell me a joke about cats")
.. code-block:: python
# TODO: Example output.
See ``ChatParrotLink.with_structured_output()`` for more.
# TODO: Delete if JSON mode response format isn't supported.
JSON mode:
.. code-block:: python
# TODO: Replace with appropriate bind arg.
json_llm = llm.bind(response_format={"type": "json_object"})
ai_msg = json_llm.invoke("Return a JSON object with key 'random_ints' and a value of 10 random ints in [0-99]")
ai_msg.content
.. code-block:: python
# TODO: Example output.
# TODO: Delete if image inputs aren't supported.
Image input:
.. code-block:: python
import base64
import httpx
from langchain_core.messages import HumanMessage
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image_data = base64.b64encode(httpx.get(image_url).content).decode("utf-8")
# TODO: Replace with appropriate message content format.
message = HumanMessage(
content=[
{"type": "text", "text": "describe the weather in this image"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
},
],
)
ai_msg = llm.invoke([message])
ai_msg.content
.. code-block:: python
# TODO: Example output.
# TODO: Delete if audio inputs aren't supported.
Audio input:
.. code-block:: python
# TODO: Example input
.. code-block:: python
# TODO: Example output
# TODO: Delete if video inputs aren't supported.
Video input:
.. code-block:: python
# TODO: Example input
.. code-block:: python
# TODO: Example output
# TODO: Delete if token usage metadata isn't supported.
Token usage:
.. code-block:: python
ai_msg = llm.invoke(messages)
ai_msg.usage_metadata
.. code-block:: python
{'input_tokens': 28, 'output_tokens': 5, 'total_tokens': 33}
# TODO: Delete if logprobs aren't supported.
Logprobs:
.. code-block:: python
# TODO: Replace with appropriate bind arg.
logprobs_llm = llm.bind(logprobs=True)
ai_msg = logprobs_llm.invoke(messages)
ai_msg.response_metadata["logprobs"]
.. code-block:: python
# TODO: Example output.
Response metadata
.. code-block:: python
ai_msg = llm.invoke(messages)
ai_msg.response_metadata
.. code-block:: python
# TODO: Example output.
""" # noqa: E501
model_name: str = Field(alias="model")
"""The name of the model"""
parrot_buffer_length: int
"""The number of characters from the last message of the prompt to be echoed."""
temperature: Optional[float] = None
max_tokens: Optional[int] = None
timeout: Optional[int] = None
stop: Optional[List[str]] = None
max_retries: int = 2
@property
def _llm_type(self) -> str:
"""Return type of chat model."""
return "chat-__package_name_short__"
@property
def _identifying_params(self) -> Dict[str, Any]:
"""Return a dictionary of identifying parameters.
        This information is used by the LangChain callback system, which
        is used for tracing purposes and makes it possible to monitor LLMs.
"""
return {
# The model name allows users to specify custom token counting
# rules in LLM monitoring applications (e.g., in LangSmith users
# can provide per token pricing for their model and monitor
# costs for the given LLM.)
"model_name": self.model_name,
}
def _generate(
self,
messages: List[BaseMessage],
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> ChatResult:
"""Override the _generate method to implement the chat model logic.
This can be a call to an API, a call to a local model, or any other
implementation that generates a response to the input prompt.
Args:
messages: the prompt composed of a list of messages.
stop: a list of strings on which the model should stop generating.
If generation stops due to a stop token, the stop token itself
SHOULD BE INCLUDED as part of the output. This is not enforced
across models right now, but it's a good practice to follow since
it makes it much easier to parse the output of the model
downstream and understand why generation stopped.
run_manager: A run manager with callbacks for the LLM.
"""
# Replace this with actual logic to generate a response from a list
# of messages.
last_message = messages[-1]
tokens = last_message.content[: self.parrot_buffer_length]
ct_input_tokens = sum(len(message.content) for message in messages)
ct_output_tokens = len(tokens)
message = AIMessage(
content=tokens,
additional_kwargs={}, # Used to add additional payload to the message
response_metadata={ # Use for response metadata
"time_in_seconds": 3,
},
usage_metadata={
"input_tokens": ct_input_tokens,
"output_tokens": ct_output_tokens,
"total_tokens": ct_input_tokens + ct_output_tokens,
},
)
##
generation = ChatGeneration(message=message)
return ChatResult(generations=[generation])
def _stream(
self,
messages: List[BaseMessage],
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> Iterator[ChatGenerationChunk]:
"""Stream the output of the model.
This method should be implemented if the model can generate output
in a streaming fashion. If the model does not support streaming,
do not implement it. In that case streaming requests will be automatically
handled by the _generate method.
Args:
messages: the prompt composed of a list of messages.
stop: a list of strings on which the model should stop generating.
If generation stops due to a stop token, the stop token itself
SHOULD BE INCLUDED as part of the output. This is not enforced
across models right now, but it's a good practice to follow since
it makes it much easier to parse the output of the model
downstream and understand why generation stopped.
run_manager: A run manager with callbacks for the LLM.
"""
last_message = messages[-1]
tokens = str(last_message.content[: self.parrot_buffer_length])
ct_input_tokens = sum(len(message.content) for message in messages)
for token in tokens:
usage_metadata = UsageMetadata(
{
"input_tokens": ct_input_tokens,
"output_tokens": 1,
"total_tokens": ct_input_tokens + 1,
}
)
ct_input_tokens = 0
chunk = ChatGenerationChunk(
message=AIMessageChunk(content=token, usage_metadata=usage_metadata)
)
if run_manager:
# This is optional in newer versions of LangChain
# The on_llm_new_token will be called automatically
run_manager.on_llm_new_token(token, chunk=chunk)
yield chunk
# Let's add some other information (e.g., response metadata)
chunk = ChatGenerationChunk(
message=AIMessageChunk(content="", response_metadata={"time_in_sec": 3})
)
if run_manager:
# This is optional in newer versions of LangChain
# The on_llm_new_token will be called automatically
            run_manager.on_llm_new_token("", chunk=chunk)
yield chunk
# TODO: Implement if ChatParrotLink supports async streaming. Otherwise delete.
# async def _astream(
# self,
# messages: List[BaseMessage],
# stop: Optional[List[str]] = None,
# run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
# **kwargs: Any,
# ) -> AsyncIterator[ChatGenerationChunk]:
# TODO: Implement if ChatParrotLink supports async generation. Otherwise delete.
# async def _agenerate(
# self,
# messages: List[BaseMessage],
# stop: Optional[List[str]] = None,
# run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
# **kwargs: Any,
# ) -> ChatResult:
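To sanity-check the template before wiring in a real API, you can exercise its echo behavior directly. A minimal sketch, assuming the code above is saved as langchain_parrot_link/chat_models.py (the model name is an arbitrary placeholder):
from langchain_parrot_link.chat_models import ChatParrotLink

llm = ChatParrotLink(model="parrot-link-demo", parrot_buffer_length=10)

# _generate echoes the first `parrot_buffer_length` characters of the last message
result = llm.invoke([("human", "Hello, Parrot Link!")])
print(result.content)         # "Hello, Par"
print(result.usage_metadata)  # token counts computed by the template

# _stream yields the echo one character at a time
for chunk in llm.stream([("human", "Hello!")]):
    print(chunk.content, end="|")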
Your vector store implementation will depend on your chosen database technology. langchain-core contains a minimal in-memory vector store that we can use as a guide. You can access the code here.
All vector stores must inherit from the VectorStore base class. This interface consists of methods for writing, deleting and searching for documents in the vector store.
VectorStore supports a variety of synchronous and asynchronous search types (e.g. nearest-neighbor or max-marginal-relevance), as well as interfaces for adding documents to the store. See the API Reference for all supported methods. The table below lists the required methods:
Method/Property | Description |
---|---|
add_documents | Add documents to the vector store. |
delete | Delete selected documents from the vector store (by ID). |
get_by_ids | Get selected documents from the vector store (by ID). |
similarity_search | Get documents most similar to a query. |
embeddings (property) | Embeddings object for the vector store. |
from_texts | Instantiate the vector store by adding texts. |
Note that InMemoryVectorStore implements some optional search types, as well as convenience methods for loading and dumping the object to a file, but this is not necessary for all implementations.
The in-memory vector store is tested against the standard tests in the LangChain Github repository.
Vector store code example
"""ParrotLink vector stores."""
from __future__ import annotations
import uuid
from typing import (
Any,
Callable,
Iterator,
List,
Optional,
Sequence,
Tuple,
Type,
TypeVar,
)
from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_core.vectorstores import VectorStore
from langchain_core.vectorstores.utils import _cosine_similarity as cosine_similarity
VST = TypeVar("VST", bound=VectorStore)
class ParrotLinkVectorStore(VectorStore):
# TODO: Replace all TODOs in docstring.
"""ParrotLink vector store integration.
# TODO: Replace with relevant packages, env vars.
Setup:
Install ``langchain-parrot-link`` and set environment variable ``PARROT_LINK_API_KEY``.
.. code-block:: bash
pip install -U langchain-parrot-link
export PARROT_LINK_API_KEY="your-api-key"
# TODO: Populate with relevant params.
Key init args — indexing params:
collection_name: str
Name of the collection.
embedding_function: Embeddings
Embedding function to use.
# TODO: Populate with relevant params.
Key init args — client params:
client: Optional[Client]
Client to use.
connection_args: Optional[dict]
Connection arguments.
# TODO: Replace with relevant init params.
Instantiate:
.. code-block:: python
from langchain_parrot_link.vectorstores import ParrotLinkVectorStore
from langchain_openai import OpenAIEmbeddings
vector_store = ParrotLinkVectorStore(
collection_name="foo",
embedding_function=OpenAIEmbeddings(),
connection_args={"uri": "./foo.db"},
# other params...
)
# TODO: Populate with relevant variables.
Add Documents:
.. code-block:: python
from langchain_core.documents import Document
document_1 = Document(page_content="foo", metadata={"baz": "bar"})
document_2 = Document(page_content="thud", metadata={"bar": "baz"})
document_3 = Document(page_content="i will be deleted :(")
documents = [document_1, document_2, document_3]
ids = ["1", "2", "3"]
vector_store.add_documents(documents=documents, ids=ids)
# TODO: Populate with relevant variables.
Delete Documents:
.. code-block:: python
vector_store.delete(ids=["3"])
# TODO: Fill out with relevant variables and example output.
Search:
.. code-block:: python
results = vector_store.similarity_search(query="thud",k=1)
for doc in results:
print(f"* {doc.page_content} [{doc.metadata}]")
.. code-block:: python
# TODO: Example output
# TODO: Fill out with relevant variables and example output.
Search with filter:
.. code-block:: python
results = vector_store.similarity_search(query="thud",k=1,filter={"bar": "baz"})
for doc in results:
print(f"* {doc.page_content} [{doc.metadata}]")
.. code-block:: python
# TODO: Example output
# TODO: Fill out with relevant variables and example output.
Search with score:
.. code-block:: python
results = vector_store.similarity_search_with_score(query="qux",k=1)
for doc, score in results:
print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
.. code-block:: python
# TODO: Example output
# TODO: Fill out with relevant variables and example output.
Async:
.. code-block:: python
# add documents
# await vector_store.aadd_documents(documents=documents, ids=ids)
# delete documents
# await vector_store.adelete(ids=["3"])
# search
# results = vector_store.asimilarity_search(query="thud",k=1)
# search with score
results = await vector_store.asimilarity_search_with_score(query="qux",k=1)
for doc,score in results:
print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
.. code-block:: python
# TODO: Example output
# TODO: Fill out with relevant variables and example output.
Use as Retriever:
.. code-block:: python
retriever = vector_store.as_retriever(
search_type="mmr",
search_kwargs={"k": 1, "fetch_k": 2, "lambda_mult": 0.5},
)
retriever.invoke("thud")
.. code-block:: python
# TODO: Example output
""" # noqa: E501
def __init__(self, embedding: Embeddings) -> None:
"""Initialize with the given embedding function.
Args:
embedding: embedding function to use.
"""
self._database: dict[str, dict[str, Any]] = {}
self.embedding = embedding
@classmethod
def from_texts(
cls: Type[ParrotLinkVectorStore],
texts: List[str],
embedding: Embeddings,
metadatas: Optional[List[dict]] = None,
**kwargs: Any,
) -> ParrotLinkVectorStore:
store = cls(
embedding=embedding,
)
store.add_texts(texts=texts, metadatas=metadatas, **kwargs)
return store
# optional: add custom async implementations
# @classmethod
# async def afrom_texts(
# cls: Type[VST],
# texts: List[str],
# embedding: Embeddings,
# metadatas: Optional[List[dict]] = None,
# **kwargs: Any,
# ) -> VST:
# return await asyncio.get_running_loop().run_in_executor(
# None, partial(cls.from_texts, **kwargs), texts, embedding, metadatas
# )
@property
def embeddings(self) -> Embeddings:
return self.embedding
def add_documents(
self,
documents: List[Document],
ids: Optional[List[str]] = None,
**kwargs: Any,
) -> List[str]:
"""Add documents to the store."""
texts = [doc.page_content for doc in documents]
vectors = self.embedding.embed_documents(texts)
if ids and len(ids) != len(texts):
msg = (
f"ids must be the same length as texts. "
f"Got {len(ids)} ids and {len(texts)} texts."
)
raise ValueError(msg)
id_iterator: Iterator[Optional[str]] = (
iter(ids) if ids else iter(doc.id for doc in documents)
)
ids_ = []
for doc, vector in zip(documents, vectors):
doc_id = next(id_iterator)
doc_id_ = doc_id if doc_id else str(uuid.uuid4())
ids_.append(doc_id_)
self._database[doc_id_] = {
"id": doc_id_,
"vector": vector,
"text": doc.page_content,
"metadata": doc.metadata,
}
return ids_
# optional: add custom async implementations
# async def aadd_documents(
# self,
# documents: List[Document],
# ids: Optional[List[str]] = None,
# **kwargs: Any,
# ) -> List[str]:
# raise NotImplementedError
def delete(self, ids: Optional[List[str]] = None, **kwargs: Any) -> None:
if ids:
for _id in ids:
self._database.pop(_id, None)
# optional: add custom async implementations
# async def adelete(
# self, ids: Optional[List[str]] = None, **kwargs: Any
# ) -> None:
# raise NotImplementedError
def get_by_ids(self, ids: Sequence[str], /) -> list[Document]:
"""Get documents by their ids.
Args:
ids: The ids of the documents to get.
Returns:
A list of Document objects.
"""
documents = []
for doc_id in ids:
doc = self._database.get(doc_id)
if doc:
documents.append(
Document(
id=doc["id"],
page_content=doc["text"],
metadata=doc["metadata"],
)
)
return documents
# optional: add custom async implementations
# async def aget_by_ids(self, ids: Sequence[str], /) -> list[Document]:
# raise NotImplementedError
# NOTE: the below helper method implements similarity search for in-memory
# storage. It is optional and not a part of the vector store interface.
def _similarity_search_with_score_by_vector(
self,
embedding: List[float],
k: int = 4,
filter: Optional[Callable[[Document], bool]] = None,
**kwargs: Any,
) -> List[tuple[Document, float, List[float]]]:
# get all docs with fixed order in list
docs = list(self._database.values())
if filter is not None:
docs = [
doc
for doc in docs
if filter(Document(page_content=doc["text"], metadata=doc["metadata"]))
]
if not docs:
return []
similarity = cosine_similarity([embedding], [doc["vector"] for doc in docs])[0]
# get the indices ordered by similarity score
top_k_idx = similarity.argsort()[::-1][:k]
return [
(
# Document
Document(
id=doc_dict["id"],
page_content=doc_dict["text"],
metadata=doc_dict["metadata"],
),
# Score
float(similarity[idx].item()),
# Embedding vector
doc_dict["vector"],
)
for idx in top_k_idx
# Assign using walrus operator to avoid multiple lookups
if (doc_dict := docs[idx])
]
def similarity_search(
self, query: str, k: int = 4, **kwargs: Any
) -> List[Document]:
embedding = self.embedding.embed_query(query)
return [
doc
for doc, _, _ in self._similarity_search_with_score_by_vector(
embedding=embedding, k=k, **kwargs
)
]
# optional: add custom async implementations
# async def asimilarity_search(
# self, query: str, k: int = 4, **kwargs: Any
# ) -> List[Document]:
# # This is a temporary workaround to make the similarity search
# # asynchronous. The proper solution is to make the similarity search
# # asynchronous in the vector store implementations.
# func = partial(self.similarity_search, query, k=k, **kwargs)
# return await asyncio.get_event_loop().run_in_executor(None, func)
def similarity_search_with_score(
self, query: str, k: int = 4, **kwargs: Any
) -> List[Tuple[Document, float]]:
embedding = self.embedding.embed_query(query)
return [
(doc, similarity)
for doc, similarity, _ in self._similarity_search_with_score_by_vector(
embedding=embedding, k=k, **kwargs
)
]
# optional: add custom async implementations
# async def asimilarity_search_with_score(
# self, *args: Any, **kwargs: Any
# ) -> List[Tuple[Document, float]]:
# # This is a temporary workaround to make the similarity search
# # asynchronous. The proper solution is to make the similarity search
# # asynchronous in the vector store implementations.
# func = partial(self.similarity_search_with_score, *args, **kwargs)
# return await asyncio.get_event_loop().run_in_executor(None, func)
### ADDITIONAL OPTIONAL SEARCH METHODS BELOW ###
# def similarity_search_by_vector(
# self, embedding: List[float], k: int = 4, **kwargs: Any
# ) -> List[Document]:
# raise NotImplementedError
# optional: add custom async implementations
# async def asimilarity_search_by_vector(
# self, embedding: List[float], k: int = 4, **kwargs: Any
# ) -> List[Document]:
# # This is a temporary workaround to make the similarity search
# # asynchronous. The proper solution is to make the similarity search
# # asynchronous in the vector store implementations.
# func = partial(self.similarity_search_by_vector, embedding, k=k, **kwargs)
# return await asyncio.get_event_loop().run_in_executor(None, func)
# def max_marginal_relevance_search(
# self,
# query: str,
# k: int = 4,
# fetch_k: int = 20,
# lambda_mult: float = 0.5,
# **kwargs: Any,
# ) -> List[Document]:
# raise NotImplementedError
# optional: add custom async implementations
# async def amax_marginal_relevance_search(
# self,
# query: str,
# k: int = 4,
# fetch_k: int = 20,
# lambda_mult: float = 0.5,
# **kwargs: Any,
# ) -> List[Document]:
# # This is a temporary workaround to make the similarity search
# # asynchronous. The proper solution is to make the similarity search
# # asynchronous in the vector store implementations.
# func = partial(
# self.max_marginal_relevance_search,
# query,
# k=k,
# fetch_k=fetch_k,
# lambda_mult=lambda_mult,
# **kwargs,
# )
# return await asyncio.get_event_loop().run_in_executor(None, func)
# def max_marginal_relevance_search_by_vector(
# self,
# embedding: List[float],
# k: int = 4,
# fetch_k: int = 20,
# lambda_mult: float = 0.5,
# **kwargs: Any,
# ) -> List[Document]:
# raise NotImplementedError
# optional: add custom async implementations
# async def amax_marginal_relevance_search_by_vector(
# self,
# embedding: List[float],
# k: int = 4,
# fetch_k: int = 20,
# lambda_mult: float = 0.5,
# **kwargs: Any,
# ) -> List[Document]:
# raise NotImplementedError
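To try the reference implementation above without a real embedding model, you can pair it with langchain-core's DeterministicFakeEmbedding. A minimal sketch:
from langchain_core.documents import Document
from langchain_core.embeddings import DeterministicFakeEmbedding

store = ParrotLinkVectorStore(embedding=DeterministicFakeEmbedding(size=8))
store.add_documents([Document(page_content="parrots can talk")], ids=["1"])

# similarity_search embeds the query and ranks stored vectors by cosine similarity
print(store.similarity_search("talking birds", k=1))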
Embeddings are used to convert str objects from Document.page_content fields into a vector representation (represented as a list of floats).
You can start from the following template or langchain-cli command:
langchain-cli integration new \
--name parrot-link \
--name-class ParrotLink \
--src integration_template/embeddings.py \
--dst langchain_parrot_link/embeddings.py
Embeddings code example
from typing import List
from langchain_core.embeddings import Embeddings
class ParrotLinkEmbeddings(Embeddings):
"""ParrotLink embedding model integration.
# TODO: Replace with relevant packages, env vars.
Setup:
Install ``langchain-parrot-link`` and set environment variable
``PARROT_LINK_API_KEY``.
.. code-block:: bash
pip install -U langchain-parrot-link
export PARROT_LINK_API_KEY="your-api-key"
# TODO: Populate with relevant params.
Key init args — completion params:
model: str
Name of ParrotLink model to use.
See full list of supported init args and their descriptions in the params section.
# TODO: Replace with relevant init params.
Instantiate:
.. code-block:: python
from langchain_parrot_link import ParrotLinkEmbeddings
embed = ParrotLinkEmbeddings(
model="...",
# api_key="...",
# other params...
)
Embed single text:
.. code-block:: python
input_text = "The meaning of life is 42"
embed.embed_query(input_text)
.. code-block:: python
# TODO: Example output.
# TODO: Delete if token-level streaming isn't supported.
Embed multiple text:
.. code-block:: python
input_texts = ["Document 1...", "Document 2..."]
embed.embed_documents(input_texts)
.. code-block:: python
# TODO: Example output.
# TODO: Delete if native async isn't supported.
Async:
.. code-block:: python
await embed.aembed_query(input_text)
# multiple:
# await embed.aembed_documents(input_texts)
.. code-block:: python
# TODO: Example output.
"""
def __init__(self, model: str):
self.model = model
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Embed search docs."""
return [[0.5, 0.6, 0.7] for _ in texts]
def embed_query(self, text: str) -> List[float]:
"""Embed query text."""
return self.embed_documents([text])[0]
# optional: add custom async implementations here
# you can also delete these, and the base class will
# use the default implementation, which calls the sync
# version in an async executor:
# async def aembed_documents(self, texts: List[str]) -> List[List[float]]:
# """Asynchronous Embed search docs."""
# ...
# async def aembed_query(self, text: str) -> List[float]:
# """Asynchronous Embed query text."""
# ...
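With the hard-coded stub above, usage looks like the following; a real implementation would call your SDK or API instead (the model name is a placeholder):
embed = ParrotLinkEmbeddings(model="parrot-embed-demo")
print(embed.embed_query("The meaning of life is 42"))       # [0.5, 0.6, 0.7]
print(embed.embed_documents(["Document 1", "Document 2"]))  # one vector per text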
Tools are used in 2 main ways:
- To define an "input schema" or "args schema" to pass to a chat model's tool-calling feature along with a text request, so that the chat model can generate a "tool call", i.e. the arguments with which to call the tool.
- To take a "tool call" generated as above, take some action, and return a response that can be passed back to the chat model as a ToolMessage.
The Tools class must inherit from the BaseTool base class. This interface has 3 properties and 2 methods that should be implemented in subclasses.
Method/Property | Description |
---|---|
name | Name of the tool (also passed to the LLM). |
description | Description of the tool (also passed to the LLM). |
args_schema | Defines the schema of the tool's input arguments. |
_run | Runs the tool with the given arguments. |
_arun | Runs the tool with the given arguments, asynchronously. |
Attributes
name, description, and args_schema are all attributes that should be implemented in the subclass. name and description are strings used to identify the tool and to describe what it does. Both are passed to the LLM, and users may override these values, depending on the LLM they are using, as a form of "prompt engineering." Giving these a concise and LLM-usable name and description is important for the tool's initial user experience.
args_schema is a Pydantic BaseModel that defines the schema of the tool's input arguments. This is used both to validate the tool's input arguments and to provide the schema for the LLM to fill out when calling the tool. Similar to the name and description of the overall Tool class, the fields' names (the variable names) and descriptions (part of Field(..., description="description")) are passed to the LLM, and the values in these fields should be concise and LLM-usable.
Run Methods
_run is the main method that should be implemented in the subclass. It takes the arguments from args_schema, runs the tool, and returns a string response. This method is usually called in a LangGraph ToolNode, and can also be called in a legacy langchain.agents.AgentExecutor.
_arun is optional because, by default, _run will be run in an async executor. However, if your tool is calling any APIs or doing any async work, you should implement this method to run the tool asynchronously, in addition to _run.
Implementation
You can start from the following template or langchain-cli command:
langchain-cli integration new \
--name parrot-link \
--name-class ParrotLink \
--src integration_template/tools.py \
--dst langchain_parrot_link/tools.py
Tool code example
"""ParrotLink tools."""
from typing import Optional, Type
from langchain_core.callbacks import (
CallbackManagerForToolRun,
)
from langchain_core.tools import BaseTool
from pydantic import BaseModel, Field
class ParrotLinkToolInput(BaseModel):
"""Input schema for ParrotLink tool.
This docstring is **not** part of what is sent to the model when performing tool
calling. The Field default values and descriptions **are** part of what is sent to
the model when performing tool calling.
"""
# TODO: Add input args and descriptions.
a: int = Field(..., description="first number to add")
b: int = Field(..., description="second number to add")
class ParrotLinkTool(BaseTool): # type: ignore[override]
"""ParrotLink tool.
Setup:
# TODO: Replace with relevant packages, env vars.
Install ``langchain-parrot-link`` and set environment variable ``PARROT_LINK_API_KEY``.
.. code-block:: bash
pip install -U langchain-parrot-link
export PARROT_LINK_API_KEY="your-api-key"
Instantiation:
.. code-block:: python
tool = ParrotLinkTool(
# TODO: init params
)
Invocation with args:
.. code-block:: python
# TODO: invoke args
tool.invoke({...})
.. code-block:: python
# TODO: output of invocation
Invocation with ToolCall:
.. code-block:: python
# TODO: invoke args
tool.invoke({"args": {...}, "id": "1", "name": tool.name, "type": "tool_call"})
.. code-block:: python
# TODO: output of invocation
""" # noqa: E501
# TODO: Set tool name and description
name: str = "TODO: Tool name"
"""The name that is passed to the model when performing tool calling."""
description: str = "TODO: Tool description."
"""The description that is passed to the model when performing tool calling."""
args_schema: Type[BaseModel] = ParrotLinkToolInput
"""The schema that is passed to the model when performing tool calling."""
# TODO: Add any other init params for the tool.
# param1: Optional[str]
# """param1 determines foobar"""
# TODO: Replaced (a, b) with real tool arguments.
def _run(
self, a: int, b: int, *, run_manager: Optional[CallbackManagerForToolRun] = None
) -> str:
return str(a + b + 80)
# TODO: Implement if tool has native async functionality, otherwise delete.
# async def _arun(
# self,
# a: int,
# b: int,
# *,
# run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
# ) -> str:
# ...
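Once the TODOs are filled in, the tool can be invoked with plain args or with a ToolCall dict; the latter returns a ToolMessage that can be passed back to the chat model. A minimal sketch against the (a, b) stub above:
tool = ParrotLinkTool()
print(tool.invoke({"a": 2, "b": 3}))  # "85" with the placeholder _run above

# Invoking with a ToolCall dict produces a ToolMessage
msg = tool.invoke({"args": {"a": 2, "b": 3}, "id": "1", "name": tool.name, "type": "tool_call"})
print(msg.content)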
Retrievers are used to retrieve documents from APIs, databases, or other sources based on a query. The Retriever class must inherit from the BaseRetriever base class. This interface has 1 attribute and 2 methods that should be implemented in subclasses.
Method/Property | Description |
---|---|
k | Default number of documents to retrieve (configurable). |
_get_relevant_documents | Retrieve documents based on a query. |
_aget_relevant_documents | Asynchronously retrieve documents based on a query. |
Attributes
k is an attribute that should be implemented in the subclass. It can simply be defined at the top of the class with a default value, e.g. k: int = 5. This attribute is the default number of documents the retriever returns, and can be overridden by the user when constructing or invoking the retriever.
Methods
_get_relevant_documents is the main method that should be implemented in the subclass.
It takes a query and returns a list of Document objects, which have 2 main attributes:
- page_content - the text content of the document
- metadata - a dictionary of metadata about the document
Retrievers are typically invoked directly by a user, e.g. MyRetriever(k=4).invoke("query"), which automatically calls _get_relevant_documents under the hood.
_aget_relevant_documents is optional because, by default, _get_relevant_documents will be run in an async executor. However, if your retriever is calling any APIs or doing any async work, you should implement this method to run the retriever asynchronously, in addition to _get_relevant_documents, for performance reasons.
Implementation
You can start from the following template or langchain-cli command:
langchain-cli integration new \
--name parrot-link \
--name-class ParrotLink \
--src integration_template/retrievers.py \
--dst langchain_parrot_link/retrievers.py
Retriever code example
"""ParrotLink retrievers."""
from typing import Any, List
from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.retrievers import BaseRetriever
class ParrotLinkRetriever(BaseRetriever):
# TODO: Replace all TODOs in docstring. See example docstring:
# https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/retrievers/tavily_search_api.py#L17
"""ParrotLink retriever.
# TODO: Replace with relevant packages, env vars, etc.
Setup:
Install ``langchain-parrot-link`` and set environment variable
``PARROT_LINK_API_KEY``.
.. code-block:: bash
pip install -U langchain-parrot-link
export PARROT_LINK_API_KEY="your-api-key"
# TODO: Populate with relevant params.
Key init args:
arg 1: type
description
arg 2: type
description
# TODO: Replace with relevant init params.
Instantiate:
.. code-block:: python
            from langchain_parrot_link import ParrotLinkRetriever
retriever = ParrotLinkRetriever(
# ...
)
Usage:
.. code-block:: python
query = "..."
retriever.invoke(query)
.. code-block:: none
# TODO: Example output.
Use within a chain:
.. code-block:: python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
prompt = ChatPromptTemplate.from_template(
\"\"\"Answer the question based only on the context provided.
Context: {context}
Question: {question}\"\"\"
)
llm = ChatOpenAI(model="gpt-3.5-turbo-0125")
def format_docs(docs):
return "\\n\\n".join(doc.page_content for doc in docs)
chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
chain.invoke("...")
.. code-block:: none
# TODO: Example output.
"""
k: int = 3
# TODO: This method must be implemented to retrieve documents.
def _get_relevant_documents(
self, query: str, *, run_manager: CallbackManagerForRetrieverRun, **kwargs: Any
) -> List[Document]:
k = kwargs.get("k", self.k)
return [
Document(page_content=f"Result {i} for query: {query}") for i in range(k)
]
# optional: add custom async implementations here
# async def _aget_relevant_documents(
# self,
# query: str,
# *,
# run_manager: AsyncCallbackManagerForRetrieverRun,
# **kwargs: Any,
# ) -> List[Document]: ...
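As a quick check, the stub above fabricates k documents for any query:
retriever = ParrotLinkRetriever(k=2)
for doc in retriever.invoke("parrots"):
    print(doc.page_content)  # "Result 0 for query: parrots", "Result 1 for query: parrots"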
Next steps
Now that you've implemented your package, you can move on to testing your integration, where you'll add standard tests to your integration and run them successfully.