
Facebook Messenger

This notebook shows how to load data from Facebook Messenger into a format you can fine-tune on. The overall steps are:

  1. Download your Messenger data to disk.
  2. Create the Chat Loader and call loader.load() (or loader.lazy_load()) to perform the conversion.
  3. Optionally use merge_chat_runs to combine consecutive messages from the same sender, and/or map_ai_messages to convert messages from a specified sender to the "AIMessage" class. Once you've done this, call convert_messages_for_finetuning to prepare your data for fine-tuning.

Once this has been done, you can fine-tune your model. To do so, you will complete the following steps:

  1. Upload your messages to OpenAI and run a fine-tuning job.
  2. Use the resulting model in your LangChain app!

Let's begin.

1. Download the data

To download your own Messenger data, follow the instructions here. IMPORTANT - make sure to download it in JSON format (not HTML).

We are hosting an example dump at this Google Drive link, which we will use in this walkthrough.

# This uses some example data
import zipfile

import requests


def download_and_unzip(url: str, output_path: str = "file.zip") -> None:
    file_id = url.split("/")[-2]
    download_url = f"https://drive.google.com/uc?export=download&id={file_id}"

    response = requests.get(download_url)
    if response.status_code != 200:
        print("Failed to download the file.")
        return

    with open(output_path, "wb") as file:
        file.write(response.content)
    print(f"File {output_path} downloaded.")

    with zipfile.ZipFile(output_path, "r") as zip_ref:
        zip_ref.extractall()
    print(f"File {output_path} has been unzipped.")


# URL of the file to download
url = (
    "https://drive.google.com/file/d/1rh1s1o2i7B-Sk1v9o8KNgivLVGwJ-osV/view?usp=sharing"
)

# Download and unzip
download_and_unzip(url)
File file.zip downloaded.
File file.zip has been unzipped.

2. Create the Chat Loader

We have 2 different FacebookMessengerChatLoader classes, one for an entire directory of chats and one to load individual files. We demonstrate both below.

directory_path = "./hogwarts"
from langchain_community.chat_loaders.facebook_messenger import (
    FolderFacebookMessengerChatLoader,
    SingleFileFacebookMessengerChatLoader,
)
loader = SingleFileFacebookMessengerChatLoader(
    path="./hogwarts/inbox/HermioneGranger/messages_Hermione_Granger.json",
)
chat_session = loader.load()[0]
chat_session["messages"][:3]
[HumanMessage(content="Hi Hermione! How's your summer going so far?", additional_kwargs={'sender': 'Harry Potter'}),
HumanMessage(content="Harry! Lovely to hear from you. My summer is going well, though I do miss everyone. I'm spending most of my time going through my books and researching fascinating new topics. How about you?", additional_kwargs={'sender': 'Hermione Granger'}),
HumanMessage(content="I miss you all too. The Dursleys are being their usual unpleasant selves but I'm getting by. At least I can practice some spells in my room without them knowing. Let me know if you find anything good in your researching!", additional_kwargs={'sender': 'Harry Potter'})]
loader = FolderFacebookMessengerChatLoader(
    path="./hogwarts",
)
chat_sessions = loader.load()
len(chat_sessions)
9
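If the export directory is large, you can also iterate lazily. A minimal sketch using the same folder loader as above, where lazy_load() yields one chat session at a time instead of materializing the full list:

# Sketch: lazy_load() yields chat sessions one at a time
for session in loader.lazy_load():
    print(len(session["messages"]))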

3. Prepare for fine-tuning

Calling load() returns all of the chat messages we could extract as human messages. When conversing with chat bots, the conversation typically follows a stricter, alternating dialogue pattern than real conversations do.

You can choose to merge message "runs" (consecutive messages from the same sender) and select a sender to represent the "AI". The fine-tuned LLM will learn to generate these AI messages.

from langchain_community.chat_loaders.utils import (
    map_ai_messages,
    merge_chat_runs,
)
merged_sessions = merge_chat_runs(chat_sessions)
alternating_sessions = list(map_ai_messages(merged_sessions, "Harry Potter"))
# Now all of Harry Potter's messages will take the AI message class
# which maps to the 'assistant' role in OpenAI's training format
alternating_sessions[0]["messages"][:3]
[AIMessage(content="Professor Snape, I was hoping I could speak with you for a moment about something that's been concerning me lately.", additional_kwargs={'sender': 'Harry Potter'}),
HumanMessage(content="What is it, Potter? I'm quite busy at the moment.", additional_kwargs={'sender': 'Severus Snape'}),
AIMessage(content="I apologize for the interruption, sir. I'll be brief. I've noticed some strange activity around the school grounds at night. I saw a cloaked figure lurking near the Forbidden Forest last night. I'm worried someone may be plotting something sinister.", additional_kwargs={'sender': 'Harry Potter'})]

Now we can convert to OpenAI format dictionaries.

from langchain_community.adapters.openai import convert_messages_for_finetuning
training_data = convert_messages_for_finetuning(alternating_sessions)
print(f"Prepared {len(training_data)} dialogues for training")
Prepared 9 dialogues for training
training_data[0][:3]
[{'role': 'assistant',
  'content': "Professor Snape, I was hoping I could speak with you for a moment about something that's been concerning me lately."},
 {'role': 'user',
  'content': "What is it, Potter? I'm quite busy at the moment."},
 {'role': 'assistant',
  'content': "I apologize for the interruption, sir. I'll be brief. I've noticed some strange activity around the school grounds at night. I saw a cloaked figure lurking near the Forbidden Forest last night. I'm worried someone may be plotting something sinister."}]

OpenAI currently requires at least 10 training examples for a fine-tuning job, though they recommend between 50-100 for most tasks. Since we only have 9 chat sessions, we can subdivide them (optionally with some overlap) so that each training example is made up of a portion of a whole conversation.

Facebook chat sessions (1 per person) often span multiple days and conversations, so the long-range dependencies may not be that important to model anyhow.

# Our chat is alternating, we will make each datapoint a group of 8 messages,
# with 2 messages overlapping
chunk_size = 8
overlap = 2

training_examples = [
    conversation_messages[i : i + chunk_size]
    for conversation_messages in training_data
    for i in range(0, len(conversation_messages) - chunk_size + 1, chunk_size - overlap)
]

len(training_examples)
100
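As a quick sanity check (a small sketch, not part of the original walkthrough), you can confirm that no training example exceeds the chunk size:

# Each example should be a slice of at most chunk_size messages
assert all(len(example) <= chunk_size for example in training_examples)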

4. Fine-tune the model

It's time to fine-tune the model. Make sure you have openai installed and have set your OPENAI_API_KEY appropriately.

%pip install --upgrade --quiet  langchain-openai
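If the key isn't already present in your environment, one way to supply it is to prompt for it. This is a minimal sketch, so adapt it to however you manage secrets:

import getpass
import os

# Prompt for the key only if it isn't already set in the environment
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")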
import json
import time
from io import BytesIO

import openai

# We will write the jsonl file in memory
my_file = BytesIO()
for m in training_examples:
    my_file.write((json.dumps({"messages": m}) + "\n").encode("utf-8"))

my_file.seek(0)
training_file = openai.files.create(file=my_file, purpose="fine-tune")

# OpenAI audits each training file for compliance reasons.
# This may take a few minutes
status = openai.files.retrieve(training_file.id).status
start_time = time.time()
while status != "processed":
    print(f"Status=[{status}]... {time.time() - start_time:.2f}s", end="\r", flush=True)
    time.sleep(5)
    status = openai.files.retrieve(training_file.id).status
print(f"File {training_file.id} ready after {time.time() - start_time:.2f} seconds.")
File file-ULumAXLEFw3vB6bb9uy6DNVC ready after 0.00 seconds.

With the file ready, it's time to kick off a training job.

job = openai.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)

Grab a cup of tea while your model is being prepared. This may take some time!

status = openai.fine_tuning.jobs.retrieve(job.id).status
start_time = time.time()
while status != "succeeded":
    print(f"Status=[{status}]... {time.time() - start_time:.2f}s", end="\r", flush=True)
    time.sleep(5)
    job = openai.fine_tuning.jobs.retrieve(job.id)
    status = job.status
Status=[running]... 874.29s. 56.93s
print(job.fine_tuned_model)
ft:gpt-3.5-turbo-0613:personal::8QnAzWMr

5. Use in LangChain

You can use the resulting model ID directly in the ChatOpenAI model class.

from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model=job.fine_tuned_model,
    temperature=1,
)
API Reference: ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
    ]
)

chain = prompt | model | StrOutputParser()
for tok in chain.stream({"input": "What classes are you taking?"}):
    print(tok, end="", flush=True)
I'm taking Charms, Defense Against the Dark Arts, Herbology, Potions, Transfiguration, and Ancient Runes. How about you?
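For a single, non-streamed reply, you can also invoke the chain directly. A brief usage sketch (the exact response will vary):

# Invoke once and print the full response string
reply = chain.invoke({"input": "Did you finish the Potions essay?"})
print(reply)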
