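Google Speech-to-Text Audio Transcripts

The SpeechToTextLoader transcribes audio files with the Google Cloud Speech-to-Text API and loads the transcribed text into documents.

To use it, you need the google-cloud-speech Python package installed and a Google Cloud project with the Speech-to-Text API enabled.

Bringing the power of large models to Google Cloud's Speech API

Installation & setup

First, install the google-cloud-speech Python package.

You can find more information on the Speech-to-Text client libraries page.

Follow the quickstart guide in the Google Cloud documentation to create a project and enable the API.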

%pip install --upgrade --quiet langchain-google-community[speech]

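Example

The SpeechToTextLoader requires the project_id and file_path arguments. The audio file can be specified as a Google Cloud Storage URI (gs://...) or as a local file path.

The loader supports only synchronous requests, which are limited to 60 seconds or 10MB per audio file.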
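Because of these limits, it can help to check a local file before sending it. The following is a minimal sketch, not part of the loader: within_sync_limits is a hypothetical helper, "10MB" is read as 10 * 1024 * 1024 bytes (an assumption), and the duration is only measured for uncompressed WAV files, since that is what the standard-library wave module can read. gs:// URIs are not inspected.

```python
import os
import wave

MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB
MAX_SECONDS = 60.0


def within_sync_limits(file_path: str) -> bool:
    """Return True if a local file looks small enough for a synchronous request."""
    if file_path.startswith("gs://"):
        # Remote objects are not inspected here; this sketch only checks local files.
        return True
    if os.path.getsize(file_path) > MAX_BYTES:
        return False
    if file_path.lower().endswith(".wav"):
        # The wave module reads uncompressed PCM WAV headers only.
        with wave.open(file_path, "rb") as wav:
            duration = wav.getnframes() / wav.getframerate()
        return duration <= MAX_SECONDS
    # Other formats: only the size was checked.
    return True
```

A caller could skip or split any file for which within_sync_limits returns False before constructing the loader.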

from langchain_google_community import SpeechToTextLoader

project_id = "<PROJECT_ID>"
file_path = "gs://cloud-samples-data/speech/audio.flac"
# or a local file path: file_path = "./audio.wav"

loader = SpeechToTextLoader(project_id=project_id, file_path=file_path)

docs = loader.load()

API Reference: SpeechToTextLoader

Note: Calling loader.load() blocks until the transcription is finished.
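Since load() blocks, transcribing several files one after another waits on each in turn. One possible workaround, sketched here under assumptions, is to run the calls in worker threads. load_all is a hypothetical helper that accepts any objects exposing a load() method; it is not part of langchain_google_community.

```python
from concurrent.futures import ThreadPoolExecutor


def load_all(loaders, max_workers=4):
    """Call load() on each loader in a worker thread; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input loaders.
        return list(pool.map(lambda loader: loader.load(), loaders))
```

For example, load_all([SpeechToTextLoader(project_id=project_id, file_path=p) for p in paths]) would return one list of documents per path, where paths is a list of audio files you supply.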

The transcribed text is available in page_content:

docs[0].page_content

"How old is the Brooklyn Bridge?"

The metadata contains the full JSON response with additional meta information:

docs[0].metadata

{
  'language_code': 'en-US',
  'result_end_offset': datetime.timedelta(seconds=1)
}
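The two fields shown above can be pulled into a compact summary. This is a small sketch: summarize is a hypothetical helper, and it assumes only the page_content attribute and the language_code and result_end_offset metadata keys that appear in the output above.

```python
from datetime import timedelta


def summarize(docs):
    """Return (text, language_code, end_seconds) for each document."""
    rows = []
    for doc in docs:
        offset = doc.metadata.get("result_end_offset")
        # result_end_offset appears above as a datetime.timedelta; convert it to seconds.
        seconds = offset.total_seconds() if isinstance(offset, timedelta) else None
        rows.append((doc.page_content, doc.metadata.get("language_code"), seconds))
    return rows
```

Missing keys come back as None rather than raising, so the helper also tolerates documents whose metadata differs from the example.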

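Recognition Config

You can specify the config argument to use a different speech recognition model and to enable specific features.

See the Speech-to-Text recognizers documentation and the RecognizeRequest API reference for information on how to set a custom configuration.

If you don't specify a config, the following options are selected automatically:

Model: Chirp Universal Speech Model
Language: en-US
Audio encoding: automatically detected
Automatic punctuation: enabled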
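For comparison, the defaults listed above could be written out as an explicit config. This is a sketch rather than the loader's confirmed internals: the model identifier "chirp" is an assumption for the Chirp Universal Speech Model, and the remaining fields simply mirror the list.

```python
from google.cloud.speech_v2 import (
    AutoDetectDecodingConfig,
    RecognitionConfig,
    RecognitionFeatures,
)

# Assumed equivalent of the defaults listed above.
default_like_config = RecognitionConfig(
    auto_decoding_config=AutoDetectDecodingConfig(),  # audio encoding: auto-detected
    language_codes=["en-US"],  # language: en-US
    model="chirp",  # assumption: identifier for the Chirp model
    features=RecognitionFeatures(
        enable_automatic_punctuation=True,  # automatic punctuation: enabled
    ),
)
```

Starting from a config like this and changing one field at a time makes it easier to see which option affects the transcript.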

from google.cloud.speech_v2 import (
    AutoDetectDecodingConfig,
    RecognitionConfig,
    RecognitionFeatures,
)
from langchain_google_community import SpeechToTextLoader

project_id = "<PROJECT_ID>"
location = "global"
recognizer_id = "<RECOGNIZER_ID>"
file_path = "./audio.wav"

config = RecognitionConfig(
    auto_decoding_config=AutoDetectDecodingConfig(),
    language_codes=["en-US"],
    model="long",
    features=RecognitionFeatures(
        enable_automatic_punctuation=False,
        profanity_filter=True,
        enable_spoken_punctuation=True,
        enable_spoken_emojis=True,
    ),
)

loader = SpeechToTextLoader(
    project_id=project_id,
    location=location,
    recognizer_id=recognizer_id,
    file_path=file_path,
    config=config,
)
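The project_id, location, and recognizer_id values together identify a recognizer resource in the Speech-to-Text v2 API. As an illustration of how the three fit together, the sketch below builds the full resource name; recognizer_name is a hypothetical helper, and whether the loader assembles the name in exactly this way is an assumption.

```python
def recognizer_name(project_id: str, location: str, recognizer_id: str) -> str:
    """Build the full resource name of a Speech-to-Text v2 recognizer."""
    return f"projects/{project_id}/locations/{location}/recognizers/{recognizer_id}"
```

With the values above, this would produce projects/<PROJECT_ID>/locations/global/recognizers/<RECOGNIZER_ID>.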

API Reference: SpeechToTextLoader

Related

Document loader conceptual guide
Document loader how-to guides
