Is there any way to fetch non-English YouTube comments?

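I am currently writing a script that uses the Google YouTube Data API to fetch a large number of comments from the songs in a playlist. However, the code does not seem to work for videos whose comments are written in a language other than English: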

from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
youtube = build('youtube', 'v3', developerKey='My_API Key')


def fetch_comments(video_id):
    try:
        request=youtube.commentThreads().list(
            part="snippet,replies",
            videoId=video_id,
            maxResults=10,
            order = "relevance"
        ).execute()

        for item in request['items']:
            comment = item['snippet']['topLevelComment']['snippet']['textOriginal']

    except HttpError as e:
        print(f"An error occurred: {e.resp.status} {e.resp.reason}")

def get_playlist_items(youtube, playlist_id, max_results=15):
    # List to hold all the video URLs
    video_urls = []

    # Make the API call to fetch playlist items
    request = youtube.playlistItems().list(
        part='snippet',
        playlistId=playlist_id,
        maxResults=max_results
    )
    response = request.execute()

    # Extract video IDs from the response and generate URLs
    for item in response.get('items', []):
        video_id = item['snippet']['resourceId']['videoId']
        video_url = f"https://www.youtube.com/watch?v={video_id}"
        video_urls.append(video_url)

    return video_urls


f = open("playlist.txt", 'r')

lines = f.readlines()
for line in lines:
    line = line.strip()
    URL = line.split(', ')[0]
    number = int(line.split(', ')[1])
    playlist_id = URL.split('=')[-1]
    video_urls = get_playlist_items(youtube, playlist_id)
    print(len(video_urls))
    for url in video_urls :
        print(url)
        video_id = url.split('=')[-1]
        fetch_comments(video_id)

f.close()

An example of the error message is shown below:

{
  "error": {
    "code": 403,
    "message": "The video identified by the \u003ccode\u003e\u003ca href=\"/youtube/v3/docs/commentThreads/list#videoId\"\u003evideoId\u003c/a\u003e\u003c/code\u003e parameter has disabled comments.",
    "errors": [
      {
        "message": "The video identified by the \u003ccode\u003e\u003ca href=\"/youtube/v3/docs/commentThreads/list#videoId\"\u003evideoId\u003c/a\u003e\u003c/code\u003e parameter has disabled comments.",
        "domain": "youtube.commentThread",
        "reason": "commentsDisabled",
        "location": "videoId",
        "locationType": "parameter"
      }
    ]
  }
}

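This error message is meant for videos that have actually disabled comments or threads; for those, the code fetches nothing and the handler below only prints the status: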

except HttpError as e:
        print(f"An error occurred: {e.resp.status} {e.resp.reason}")

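However, when a video contains comments written in a language other than English, the error message still appears even though comments have not been disabled on that video.

Secondly, when I print the number of songs retrieved from the playlist, the count differs slightly from the max_results variable on each run. I expected to retrieve 15 songs, but in some cases the program stops at 14 or even fewer. (This is not because the playlist contains fewer songs than max_results.)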

I tried encoding the characters as Unicode, but that did not work:

print(comment.encode('utf-8').decode('utf-8'))

I also tried translating the comments before processing them:

translated_comment = translator.translate(comment, dest=target_language).text
print(translated_comment)

However, this did not turn out well either, and I have no idea what causes the second issue.

There are several possible reasons for the errors and inconsistencies in your YouTube API code:

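Comments disabled: As the error message states, some videos may have comments explicitly turned off. This is a per-video setting and has nothing to do with the language of the comments.
Rate limiting: The YouTube API enforces quotas and rate limits. If you send too many requests in quick succession, requests can fail with quota or rate-limit errors. To mitigate this, apply exponential backoff between retries.
Playlist pagination: The playlistItems().list endpoint is paginated. A single request may not return every video in the playlist, especially when the playlist holds more videos than the maxResults parameter. To retrieve them all, keep following nextPageToken until no further page is available.
Unicode encoding: You mention trying to encode and decode with UTF-8, but what matters is that the whole pipeline (source files, terminal output, and any external library used to store or display the data) uses one consistent Unicode encoding.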
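
On the Unicode point, note that comment.encode('utf-8').decode('utf-8') returns the same string it started with, so it cannot repair anything. What usually matters is the encoding used when the text leaves Python. The following is a minimal sketch, assuming the comments are ordinary Python str values; the helper names save_comments and make_stdout_utf8 are hypothetical and not part of any library.

```python
import sys


def save_comments(comments, path):
    """Write one comment per line, always as UTF-8 regardless of the OS default."""
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        for text in comments:
            # Flatten line breaks inside a comment so each comment stays on one line.
            fh.write(text.replace("\r", " ").replace("\n", " ") + "\n")


def make_stdout_utf8():
    """Ask Python (3.7+) to encode printed text as UTF-8 instead of the console default."""
    if hasattr(sys.stdout, "reconfigure"):
        # errors="replace" is an assumption: it trades exactness for never crashing on print.
        sys.stdout.reconfigure(encoding="utf-8", errors="replace")
```

Calling make_stdout_utf8() once at startup and then save_comments(list_of_comments, "comments.txt") keeps both the console and the file on UTF-8. Whether the terminal font can display the characters is a separate matter that Python cannot control.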

Revised code:

from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
import time

youtube = build('youtube', 'v3', developerKey='YOUR_API_KEY')

def fetch_comments(video_id):
    try:
        request = youtube.commentThreads().list(
            part="snippet,replies",
            videoId=video_id,
            maxResults=100,  # Increase maxResults for more comments per request
            order="relevance"
        ).execute()

        for item in request['items']:
            comment = item['snippet']['topLevelComment']['snippet']['textOriginal']
            print(f"Comment: {comment}")

    except HttpError as e:
        if e.resp.status == 403 and 'commentsDisabled' in str(e.content):
            print(f"Comments are disabled for video ID: {video_id}")
        else:
            print(f"An error occurred: {e.resp.status} {e.resp.reason}")

def get_playlist_items(youtube, playlist_id, max_results=15):
    video_ids = []
    next_page_token = None

    while True:
        request = youtube.playlistItems().list(
            part='snippet',
            playlistId=playlist_id,
            maxResults=min(max_results, 50),  # 50 is the largest page size playlistItems.list accepts
            pageToken=next_page_token
        )
        response = request.execute()

        for item in response.get('items', []):
            video_ids.append(item['snippet']['resourceId']['videoId'])

        next_page_token = response.get('nextPageToken')
        if not next_page_token:
            break

        time.sleep(1)  # Introduce a delay to avoid hitting rate limits

    return video_ids

# ... rest of your code ...
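
The substring check on str(e.content) works, but the error body is JSON, so the reason can also be read directly. The following is a sketch only, assuming the body has the same shape as the error message quoted in the question; extract_error_reason is a hypothetical helper name.

```python
import json


def extract_error_reason(content):
    """Return the first `reason` from a YouTube Data API error body, or None."""
    if isinstance(content, bytes):
        content = content.decode("utf-8", errors="replace")
    try:
        payload = json.loads(content)
    except (TypeError, ValueError):
        # Not JSON (or not text at all): nothing to extract.
        return None
    if not isinstance(payload, dict):
        return None
    error = payload.get("error")
    if not isinstance(error, dict):
        return None
    for entry in error.get("errors") or []:
        if isinstance(entry, dict) and entry.get("reason"):
            return entry["reason"]
    return None
```

Inside the except HttpError as e: branch, extract_error_reason(e.content) == "commentsDisabled" can then replace the substring test, and printing the reason for every other failure shows what the API is actually reporting for each video.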

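Explanation:

Error handling: The error handling in fetch_comments now distinguishes the "comments disabled" error from other HTTP errors.
Rate limiting: A time.sleep(1) call in get_playlist_items adds a one-second delay between page requests to ease rate limiting.
Playlist pagination: get_playlist_items follows nextPageToken so that every video ID in the playlist is retrieved.
Page size: maxResults is capped with min(max_results, 50), because 50 is the largest page size playlistItems.list accepts. Note that the loop keeps following nextPageToken until the playlist is exhausted, so max_results here sets the page size rather than the total number of videos returned.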
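
If the goal is a fixed number of videos (15 in the question) rather than the whole playlist, the paging loop can stop once that many IDs are collected. The following is a sketch under stated assumptions: collect_video_ids and the fetch_page callable are hypothetical names, and fetch_page(page_token, max_results) is assumed to return a dict shaped like a playlistItems().list response.

```python
def collect_video_ids(fetch_page, limit=15, page_size=50):
    """Follow nextPageToken until `limit` IDs are collected or the pages run out."""
    video_ids = []
    page_token = None
    while len(video_ids) < limit:
        remaining = limit - len(video_ids)
        # Never ask for more than is still needed, nor more than one page may hold.
        response = fetch_page(page_token, min(remaining, page_size))
        for item in response.get("items", []):
            video_ids.append(item["snippet"]["resourceId"]["videoId"])
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return video_ids[:limit]
```

With the real client, fetch_page could be a small function that calls youtube.playlistItems().list(part='snippet', playlistId=playlist_id, maxResults=max_results, pageToken=page_token).execute(). Because the loop counts the items actually received, a page that comes back shorter than requested is topped up from the next page instead of leaving the total short.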

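Further suggestions:

Exponential backoff: Consider a more robust retry strategy, such as exponential backoff, for rate-limit errors.
Unicode encoding: Check your environment and codebase carefully to make sure UTF-8, or another suitable Unicode encoding, is used throughout the pipeline.
API quota: Monitor your API usage and make sure you are not exceeding your daily or per-minute quota.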
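
The exponential backoff suggestion can be sketched as follows. This is an illustration rather than a definitive implementation: call_with_backoff and its parameters are hypothetical names, and the caller decides through is_retryable which exceptions are worth retrying.

```python
import random
import time


def call_with_backoff(func, is_retryable, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying retryable failures with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:
            # Give up on the last attempt or when the error is not worth retrying.
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            # Delay doubles each attempt; random jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

For this script, func could be a lambda wrapping request.execute(), and is_retryable might check for an HttpError whose e.resp.status is 429, 500, or 503; which statuses to retry is an assumption to adjust for your case. The google-api-python-client also accepts execute(num_retries=...), which applies its own randomized backoff and may be enough on its own.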

With these issues addressed, you should be able to retrieve comments more reliably, including non-English ones, and see fewer inconsistencies.

Tags: python, api, youtube, comments
From: 78837589
