Titan Takeoff

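TitanML helps businesses build and deploy better, smaller, cheaper, and faster NLP models through our training, compression, and inference optimization platform.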

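Our inference server, Titan Takeoff, lets you deploy LLMs locally on your own hardware with a single command. Most generative model architectures are supported, such as Falcon, Llama 2, GPT2, T5 and many more. If you run into trouble with a specific model, please let us know at hello@titanml.co.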

Example usage

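Here are some helpful examples to get you started with Titan Takeoff Server. Before running these commands, make sure Takeoff Server has been started in the background. For more information, see the docs page for launching Takeoff.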
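Because every example below assumes a running server, a quick connectivity check can save some confusion. The following is a minimal sketch using only the Python standard library; the helper name takeoff_port_open is hypothetical and not part of Takeoff or LangChain. It only tests whether something accepts TCP connections on the given host and port (localhost:3000 is the default mentioned in Example 1); it does not confirm that the listener is actually Takeoff or that a model has finished loading.

```python
import socket


def takeoff_port_open(host: str = "localhost", port: int = 3000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: it checks reachability only, not that the
    listener is a Takeoff server or that a model is ready.
    """
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if takeoff_port_open():
        print("Something is listening on localhost:3000")
    else:
        print("Nothing is listening on localhost:3000; start Takeoff first")
```

If the check returns False, start the server as described on the launch docs page before continuing.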

import time

# Importing TitanTakeoffPro instead of TitanTakeoff also works; both use the same object under the hood.
from langchain_community.llms import TitanTakeoff
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler
from langchain_core.prompts import PromptTemplate

API Reference: TitanTakeoff | CallbackManager | StreamingStdOutCallbackHandler | PromptTemplate

Example 1

Basic use, assuming Takeoff is running on your machine on its default port (i.e. localhost:3000).

llm = TitanTakeoff()
output = llm.invoke("What is the weather in London in August?")
print(output)

Example 2

Specifying a port and other generation parameters

llm = TitanTakeoff(port=3000)
# A comprehensive list of parameters can be found at https://docs.titanml.co/docs/next/apis/Takeoff%20inference_REST_API/generate#request
output = llm.invoke(
    "What is the largest rainforest in the world?",
    consumer_group="primary",
    min_new_tokens=128,
    max_new_tokens=512,
    no_repeat_ngram_size=2,
    sampling_topk=1,
    sampling_topp=1.0,
    sampling_temperature=1.0,
    repetition_penalty=1.0,
    regex_string="",
    json_schema=None,
)
print(output)
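When the same generation parameters are reused across many calls, it can be convenient to keep them in one place and unpack them with `**`. The sketch below is illustrative only: BASE_GENERATION_KWARGS and generation_kwargs are hypothetical names, the parameter keys are copied from the example above, and the min/max check is a local sanity check rather than something the server is known to enforce.

```python
from typing import Any, Dict

# Shared defaults; the keys and values are taken from the example above.
BASE_GENERATION_KWARGS: Dict[str, Any] = {
    "consumer_group": "primary",
    "min_new_tokens": 128,
    "max_new_tokens": 512,
    "sampling_temperature": 1.0,
}


def generation_kwargs(**overrides: Any) -> Dict[str, Any]:
    """Return a new dict with the shared defaults and per-call overrides applied."""
    merged = {**BASE_GENERATION_KWARGS, **overrides}
    # Local sanity check only (an assumption, not documented server behavior).
    if merged["min_new_tokens"] > merged["max_new_tokens"]:
        raise ValueError("min_new_tokens must not exceed max_new_tokens")
    return merged


# Usage with the llm object from the example above:
# output = llm.invoke(
#     "What is the largest rainforest in the world?",
#     **generation_kwargs(max_new_tokens=256),
# )
```

Building a fresh dict on each call keeps the shared defaults unchanged, so one call's overrides cannot leak into the next.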

Example 3

Using generate for multiple inputs

llm = TitanTakeoff()
rich_output = llm.generate(["What is Deep Learning?", "What is Machine Learning?"])
print(rich_output.generations)

Example 4

Streaming output

llm = TitanTakeoff(
    streaming=True, callback_manager=CallbackManager([StreamingStdOutCallbackHandler()])
)
prompt = "What is the capital of France?"
output = llm.invoke(prompt)
print(output)

Example 5

Using LCEL

llm = TitanTakeoff()
prompt = PromptTemplate.from_template("Tell me about {topic}")
chain = prompt | llm
output = chain.invoke({"topic": "the universe"})
print(output)

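Example 6

Starting readers using the TitanTakeoff Python wrapper. If you haven't created any readers when first launching Takeoff, or you want to add another, you can do so when you initialize the TitanTakeoff object. Just pass a list of the model configs you want to start as the models parameter.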

# Model config for the llama model, where you can specify the following parameters:
#   model_name (str): The name of the model to use
#   device (str): The device to use for inference, cuda or cpu
#   consumer_group (str): The consumer group to place the reader into
#   tensor_parallel (Optional[int]): The number of GPUs you would like your model to be split across
#   max_seq_length (int): The maximum sequence length to use for inference, defaults to 512
#   max_batch_size (int): The max batch size for continuous batching of requests
llama_model = {
    "model_name": "TheBloke/Llama-2-7b-Chat-AWQ",
    "device": "cuda",
    "consumer_group": "llama",
}
llm = TitanTakeoff(models=[llama_model])

# The model needs time to spin up; how long depends on the size of the model and your network connection speed.
time.sleep(60)

prompt = "What is the capital of France?"
output = llm.invoke(prompt, consumer_group="llama")
print(output)
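A fixed time.sleep(60) can be too short for a large model or needlessly long for a small one. One alternative is to retry the first request until it succeeds. The sketch below is a generic polling helper built on the standard library only; wait_until_ready is a hypothetical name, and it rests on the assumption that a request made while the reader is still loading raises an exception rather than returning a partial result.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until_ready(call: Callable[[], T], timeout: float = 300.0, interval: float = 5.0) -> T:
    """Retry call() until it returns without raising, or until timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return call()
        except Exception as exc:  # broad on purpose: the exact error type is not assumed
            if time.monotonic() >= deadline:
                raise TimeoutError(f"not ready after {timeout} seconds") from exc
            time.sleep(interval)


# Usage with the llm and prompt objects from the example above, in place of time.sleep(60):
# output = wait_until_ready(lambda: llm.invoke(prompt, consumer_group="llama"))
```

The timeout and interval defaults here are arbitrary; tune them to the model size and your network speed.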

Related

LLM conceptual guide
LLM how-to guides
