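Writer Text Splitter
This notebook provides a quick overview for getting started with Writer's text splitter.
Writer's context-aware splitting endpoint provides intelligent text splitting for long documents (up to 4000 words). Unlike simple character-based splitting, it preserves semantic meaning and context between chunks, which makes it well suited to processing long-form content while maintaining coherence. In langchain-writer, we expose Writer's context-aware splitting endpoint as a LangChain text splitter.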
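Because the endpoint accepts documents of up to 4000 words, it can help to check a text's size before sending it. The sketch below is plain Python and does not call Writer. The names MAX_WORDS, count_words, and batch_paragraphs are hypothetical, and counting whitespace-separated tokens is only an approximation of however Writer counts words.

```python
MAX_WORDS = 4000  # limit stated above; the exact counting rule is an assumption


def count_words(text: str) -> int:
    """Approximate word count using whitespace-separated tokens."""
    return len(text.split())


def batch_paragraphs(text: str, max_words: int = MAX_WORDS) -> list[str]:
    """Group blank-line-separated paragraphs into batches of at most max_words.

    A single paragraph longer than max_words is kept as its own oversized
    batch rather than being cut mid-paragraph.
    """
    batches: list[str] = []
    current: list[str] = []
    current_words = 0
    for paragraph in text.split("\n\n"):
        if not paragraph.strip():
            continue  # skip empty paragraphs
        n = count_words(paragraph)
        # Start a new batch when adding this paragraph would exceed the limit.
        if current and current_words + n > max_words:
            batches.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(paragraph)
        current_words += n
    if current:
        batches.append("\n\n".join(current))
    return batches


sample = "\n\n".join(["word " * 1500] * 4)  # four paragraphs of 1500 words each
batches = batch_paragraphs(sample)
print([count_words(b) for b in batches])
```

Each batch could then be passed to the splitter separately. Batching on paragraph boundaries is one possible choice here, made so that no sentence is cut before the context-aware split happens.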
Overview
Integration details
Class | Package | Local | Serializable | JS support | Package downloads | Package latest |
---|---|---|---|---|---|---|
WriterTextSplitter | langchain-writer | ❌ | ❌ | ❌ | | |
Setup
WriterTextSplitter is available in the langchain-writer package:
%pip install --quiet -U langchain-writer
Credentials
Sign up for Writer AI Studio to generate an API key (you can follow this Quickstart). Then set the WRITER_API_KEY environment variable:
import getpass
import os
if not os.getenv("WRITER_API_KEY"):
    os.environ["WRITER_API_KEY"] = getpass.getpass("Enter your Writer API key: ")
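The prompt above suits an interactive notebook. For a non-interactive script, one option is to fail early with a clear message when the key is missing. This is a minimal sketch; require_env is a hypothetical helper, not part of langchain-writer.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running this script")
    return value
```

A script could call require_env("WRITER_API_KEY") once at startup, before any splitter is created.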
It is also helpful (but not required) to set up LangSmith for best-in-class observability. If you wish to do so, you can set the LANGSMITH_TRACING and LANGSMITH_API_KEY environment variables:
# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass()
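Instantiation
Instantiate a WriterTextSplitter with the strategy parameter set to one of the following:
llm_split: uses a language model for precise semantic splitting
fast_split: uses heuristic-based methods for quick splitting
hybrid_split: combines both approaches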
from langchain_writer.text_splitter import WriterTextSplitter
splitter = WriterTextSplitter(strategy="fast_split")
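When the strategy name comes from configuration or user input, it may be worth checking it against the three values listed above before constructing the splitter. The sketch below is plain Python; STRATEGIES and validate_strategy are hypothetical names, and how WriterTextSplitter itself reacts to an unknown value is not covered here.

```python
STRATEGIES = ("llm_split", "fast_split", "hybrid_split")  # values listed above


def validate_strategy(name: str) -> str:
    """Return name unchanged if it is a known strategy, otherwise raise."""
    if name not in STRATEGIES:
        expected = ", ".join(STRATEGIES)
        raise ValueError(f"Unknown strategy {name!r}; expected one of: {expected}")
    return name


print(validate_strategy("fast_split"))
```

The validated value can then be passed as the strategy argument, for example WriterTextSplitter(strategy=validate_strategy("fast_split")).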
Usage
WriterTextSplitter can be used synchronously or asynchronously.
Synchronous usage
To use WriterTextSplitter synchronously, call the split_text method with the text you want to split:
text = """Reeeeeeeeeeeeeeeeeeeeeaally long text you want to divide into smaller chunks. For example you can add a poem multiple times:
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;
Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,
And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.
I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;
Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,
And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.
I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;
Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,
And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.
I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.
"""
chunks = splitter.split_text(text)
chunks
You can print the number of chunks to see how many were created:
print(len(chunks))
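Beyond the count, it is often useful to look at the size and opening line of each chunk. The sketch below assumes the chunks are a list of strings, as the len(chunks) call above suggests. It uses two hand-written example chunks instead of real API output, and summarize_chunks is a hypothetical helper.

```python
def summarize_chunks(chunks: list[str], preview_chars: int = 40) -> list[dict]:
    """Return the index, approximate word count, and first-line preview of each chunk."""
    summary = []
    for index, chunk in enumerate(chunks):
        stripped = chunk.strip()
        first_line = stripped.splitlines()[0] if stripped else ""
        summary.append(
            {
                "index": index,
                "words": len(chunk.split()),  # whitespace-separated tokens
                "preview": first_line[:preview_chars],
            }
        )
    return summary


# Hand-written stand-ins for the chunks a splitter might return.
example_chunks = [
    "Two roads diverged in a yellow wood,\nAnd sorry I could not travel both",
    "I shall be telling this with a sigh\nSomewhere ages and ages hence:",
]
for row in summarize_chunks(example_chunks):
    print(row)
```

With real output, summarize_chunks(chunks) would give a quick view of how evenly the text was divided.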
Asynchronous usage
To use WriterTextSplitter asynchronously, call the asplit_text method with the text you want to split:
async_chunks = await splitter.asplit_text(text)
async_chunks
Print the number of chunks to see how many were created:
print(len(async_chunks))
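One reason to prefer the asynchronous method is that several texts can be split concurrently. The sketch below illustrates the pattern with asyncio.gather. To stay runnable without an API key it uses StandInSplitter, a hypothetical class that only mimics the asplit_text coroutine by splitting on blank lines; split_many is also a hypothetical helper. Whether the Writer API limits concurrent requests is not addressed here.

```python
import asyncio


class StandInSplitter:
    """Hypothetical stand-in with the same asplit_text coroutine shape."""

    async def asplit_text(self, text: str) -> list[str]:
        await asyncio.sleep(0)  # yield control, as a network call would
        return [part for part in text.split("\n\n") if part.strip()]


async def split_many(splitter, texts: list[str]) -> list[list[str]]:
    """Split several texts concurrently; results keep the order of texts."""
    return await asyncio.gather(*(splitter.asplit_text(t) for t in texts))


results = asyncio.run(split_many(StandInSplitter(), ["a\n\nb", "c"]))
print(results)
```

In a notebook, where an event loop is already running, the call would be written as await split_many(splitter, texts) rather than wrapped in asyncio.run.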
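API reference
For detailed documentation of all WriterTextSplitter features and configurations, head to the API reference.
Additional resources
You can find information about Writer's models (including costs, context windows, and supported input types) and tools in the Writer docs.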