
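Writer Text Splitter

This notebook provides a quick overview for getting started with Writer's text splitter.

Writer's context-aware splitting endpoint provides intelligent text splitting for long documents (up to 4000 words). Unlike simple character-based splitting, it preserves semantic meaning and context between chunks, which makes it well suited to processing long-form content while maintaining coherence. In langchain-writer, Writer's context-aware splitting endpoint is exposed as a LangChain text splitter.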
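Because the endpoint accepts documents of up to 4000 words, it can help to check the length of a text before sending it. The following is a minimal sketch of such a pre-check, not part of langchain-writer: the names MAX_WORDS, count_words, and batch_paragraphs are hypothetical, and counting whitespace-separated words is an assumption (the service may count words differently).

```python
MAX_WORDS = 4000  # limit stated above; whitespace word counting is an assumption


def count_words(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def batch_paragraphs(text: str, max_words: int = MAX_WORDS) -> list[str]:
    """Group blank-line-separated paragraphs into batches of at most max_words.

    A single paragraph longer than max_words is kept as its own oversized
    batch so the caller can decide how to handle it.
    """
    batches: list[str] = []
    current: list[str] = []
    current_words = 0
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        n = count_words(paragraph)
        # Start a new batch when adding this paragraph would exceed the limit.
        if current and current_words + n > max_words:
            batches.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(paragraph)
        current_words += n
    if current:
        batches.append("\n\n".join(current))
    return batches
```

Each batch can then be passed to the splitter separately. Batching on paragraph boundaries is one possible design choice; it avoids cutting sentences in half before the context-aware split runs.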

Overview

Integration details

Class: WriterTextSplitter
Package: langchain-writer (on PyPI)

Setup

The WriterTextSplitter is available in the langchain-writer package:

%pip install --quiet -U langchain-writer

Credentials

Sign up for Writer AI Studio to generate an API key (you can follow this Quickstart). Then set the WRITER_API_KEY environment variable:

import getpass
import os

# Prompt for the key only if it is not already set in the environment.
if not os.getenv("WRITER_API_KEY"):
    os.environ["WRITER_API_KEY"] = getpass.getpass("Enter your Writer API key: ")

It is also helpful (but not required) to set up LangSmith for best-in-class observability. If you wish to do so, set the LANGSMITH_TRACING and LANGSMITH_API_KEY environment variables:

# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass()

Instantiation

Instantiate a WriterTextSplitter with the strategy parameter set to one of the following:

  • llm_split: uses a language model for precise semantic splitting
  • fast_split: uses a heuristic-based approach for quick splitting
  • hybrid_split: combines both approaches

from langchain_writer.text_splitter import WriterTextSplitter

splitter = WriterTextSplitter(strategy="fast_split")
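Since the strategy is passed as a plain string, a typo would only surface when the splitter is built or called. The sketch below validates the name up front against the three values listed above. The helpers check_strategy and make_splitter are hypothetical and not part of langchain-writer; only the WriterTextSplitter import and its strategy parameter come from this page.

```python
VALID_STRATEGIES = ("llm_split", "fast_split", "hybrid_split")


def check_strategy(strategy: str) -> str:
    """Return strategy unchanged if it is one of the three documented values."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(
            f"Unknown strategy {strategy!r}; expected one of {VALID_STRATEGIES}"
        )
    return strategy


def make_splitter(strategy: str = "fast_split"):
    """Build a WriterTextSplitter after validating the strategy name."""
    strategy = check_strategy(strategy)
    # Imported lazily so check_strategy works without the package installed.
    from langchain_writer.text_splitter import WriterTextSplitter

    return WriterTextSplitter(strategy=strategy)
```

For example, make_splitter("hybrid_split") would return a splitter configured with the hybrid strategy, while make_splitter("slow_split") raises a ValueError before any import or network call happens.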

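Usage

The WriterTextSplitter can be used synchronously or asynchronously.

Synchronous usage

To use the WriterTextSplitter synchronously, call the split_text method with the text you want to split: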

text = """Reeeeeeeeeeeeeeeeeeeeeaally long text you want to divide into smaller chunks. For example you can add a poem multiple times:
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.

Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.

Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.
"""

chunks = splitter.split_text(text)
chunks

You can print the number of chunks to see how many were created:

print(len(chunks))
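Beyond the chunk count, it can be useful to look at how large the chunks are. The sketch below assumes the result is a list of strings, as returned by the split_text method of LangChain text splitters. The helper summarize_chunks and the sample data are hypothetical stand-ins so that the example runs without an API call.

```python
def summarize_chunks(chunks: list[str]) -> dict[str, int]:
    """Return the chunk count plus the shortest and longest chunk length in characters."""
    lengths = [len(chunk) for chunk in chunks]
    return {
        "count": len(lengths),
        "min_chars": min(lengths, default=0),
        "max_chars": max(lengths, default=0),
    }


# Stand-in data; in the notebook you would pass the `chunks` returned by split_text.
sample_chunks = [
    "Two roads diverged in a yellow wood,",
    "And sorry I could not travel both",
]
print(summarize_chunks(sample_chunks))
```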

Asynchronous usage

To use the WriterTextSplitter asynchronously, call the asplit_text method with the text you want to split:

async_chunks = await splitter.asplit_text(text)
async_chunks

Print the number of chunks to see how many were created:

print(len(async_chunks))
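The top-level await above works in a notebook, which already runs an event loop. In a plain Python script you would typically wrap the call with asyncio.run, and the asynchronous method also makes it possible to split several texts concurrently. The following is a minimal sketch under those assumptions: FakeAsyncSplitter is a hypothetical stand-in that only mimics the asplit_text coroutine so the example runs without an API key, and split_many and run_split_many are illustrative helper names.

```python
import asyncio


class FakeAsyncSplitter:
    """Hypothetical stand-in exposing an asplit_text coroutine like the splitter's."""

    async def asplit_text(self, text: str) -> list[str]:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return [part for part in text.split("\n\n") if part]


async def split_many(splitter, texts: list[str]) -> list[list[str]]:
    """Split several texts concurrently; results keep the order of `texts`."""
    return await asyncio.gather(*(splitter.asplit_text(t) for t in texts))


def run_split_many(splitter, texts: list[str]) -> list[list[str]]:
    """Synchronous entry point for scripts, where top-level await is unavailable."""
    return asyncio.run(split_many(splitter, texts))
```

In a script, run_split_many(splitter, [text_a, text_b]) would return one list of chunks per input text. Inside a notebook, call `await split_many(splitter, texts)` instead, because asyncio.run cannot be used while an event loop is already running.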

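API reference

For detailed documentation of all WriterTextSplitter features and configurations, head to the API reference.

Additional resources

You can find information about Writer's models (including costs, context windows, and supported input types) and tools in the Writer docs.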

