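How can I use the Google Drive API to identify and delete large Google Takeout ZIP files?

I have a problem: Google Takeout keeps creating large ZIP files in my Google Drive, which pushes it to its storage limit. I need to identify and delete these files programmatically with the Google Drive API.

I noticed that the file names follow a pattern like takeout-YYYYMMDDTHHMMSSZ-###.zip, and I want to:

1. List all files in my Google Drive.
2. Identify the files that match the pattern takeout-*.zip.
3. Delete the identified files to free up space.

Here is what I have done so far: I have set up a Google Cloud project and enabled the Google Drive API. I have service account credentials and can authenticate through the API. Could someone provide a sample Python script that performs the steps above? Any other tips on handling a large number of files efficiently would also be appreciated.

What I have: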

from google.oauth2 import service_account
from googleapiclient.discovery import build

# Define the scope
SCOPES = ['https://www.googleapis.com/auth/drive']

# Provide the path to your service account key file
SERVICE_ACCOUNT_FILE = 'path/to/service_account.json'

credentials = service_account.Credentials.from_service_account_file(
        SERVICE_ACCOUNT_FILE, scopes=SCOPES)

# Create the Drive API service
service = build('drive', 'v3', credentials=credentials)

# List all files
results = service.files().list(
    pageSize=1000, fields="nextPageToken, files(id, name, size, modifiedTime)").execute()
items = results.get('files', [])

if not items:
    print('No files found.')
else:
    print('Files:')
    for item in items:
        # 'size' is missing for some file types (e.g. Google Docs), so avoid a KeyError
        print(f'{item["name"]} ({item["id"]}) - {item["modifiedTime"]} - {item.get("size", "n/a")}')

# Optionally, delete old files based on a condition
for item in items:
    if 'takeout' in item['name']:
        # Add your condition to delete older files
        service.files().delete(fileId=item['id']).execute()
        print(f'Deleted {item["name"]}')

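Questions:
How can I modify this script to efficiently identify and delete only the takeout-*.zip files?
What optimizations or best practices should I consider when handling a large number of files?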

from google.oauth2 import service_account
from googleapiclient.discovery import build
import re

# Define the scope
SCOPES = ['https://www.googleapis.com/auth/drive']

# Provide the path to your service account key file
SERVICE_ACCOUNT_FILE = 'path/to/service_account.json'

credentials = service_account.Credentials.from_service_account_file(
    SERVICE_ACCOUNT_FILE, scopes=SCOPES)

# Create the Drive API service
service = build('drive', 'v3', credentials=credentials)

def list_and_delete_takeout_files():
    """Lists and deletes Google Takeout files matching the pattern 'takeout-*.zip'.
    """

    page_token = None
    while True:
        # Use pageToken to retrieve large amounts of files
        results = service.files().list(
            pageSize=1000, 
            fields="nextPageToken, files(id, name, size, modifiedTime)",
            pageToken=page_token
        ).execute()
        items = results.get('files', [])

        # An empty page does not mean the listing is finished; only a missing
        # nextPageToken does, so the loop must not stop here.

        for item in items:
            # Use regex for more robust pattern matching
            if re.match(r'takeout-.*\.zip$', item['name']):
                print(f'Deleting: {item["name"]} ({item["id"]})')
                try:
                    service.files().delete(fileId=item['id']).execute()
                except Exception as e:
                    print(f'Error deleting file {item["name"]}: {e}')

        page_token = results.get('nextPageToken')
        if not page_token:
            break

if __name__ == '__main__':
    list_and_delete_takeout_files()
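
The script above downloads metadata for every file and filters locally. A minimal sketch of pushing part of the filter to the server through the q parameter of files().list follows. The query string and the helper name iter_takeout_files are my own assumptions, not part of the original script, and the name filter is only a coarse prefix match, so the regex is still applied locally. Also keep in mind that a service account only sees files it owns or that were shared with it, so this assumes the Takeout archives are visible to the credentials in use.

```python
import re

# Assumed query: coarse match on the name, skipping files already in the trash.
TAKEOUT_QUERY = "name contains 'takeout-' and trashed = false"
TAKEOUT_NAME = re.compile(r"^takeout-.*\.zip$")


def iter_takeout_files(service, page_size=1000):
    """Yield metadata dicts for files whose names look like takeout-*.zip.

    `service` is a Drive v3 client such as the one built in the script above.
    """
    page_token = None
    while True:
        response = service.files().list(
            q=TAKEOUT_QUERY,
            pageSize=page_size,
            fields="nextPageToken, files(id, name, size, modifiedTime)",
            pageToken=page_token,
        ).execute()
        for item in response.get("files", []):
            # The server-side filter is coarse, so re-check the name locally.
            if TAKEOUT_NAME.match(item["name"]):
                yield item
        # Only a missing nextPageToken marks the end of the listing.
        page_token = response.get("nextPageToken")
        if not page_token:
            break
```

Collecting the matches first (for example with list(iter_takeout_files(service))) and deleting afterwards keeps the listing and the deletion separate, which makes it easier to review what would be removed.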

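Improvements and notes:

Regex matching: re.match(r'takeout-.*\.zip$', item['name']) matches the file name pattern more precisely than a substring check, so only takeout-*.zip files are deleted.
Pagination: pageToken is used to walk through result sets that exceed a single page (at most 1000 files).
Error handling: a try-except block catches any error raised while deleting a file, so a single failure does not stop the script.
Deletion confirmation (optional): for safety, you can add a confirmation step that asks the user to confirm before each file is deleted.
Conditional deletion (optional): you can add logic that further filters the files to delete by modification date, file size, or other criteria.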
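
The optional points above (stricter matching, conditional deletion, a safety step) could be sketched roughly as follows. The stricter regex assumes the names really follow takeout-YYYYMMDDTHHMMSSZ-###.zip as described in the question; the function names, the thresholds, and the dry_run flag are hypothetical and only illustrate the idea.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: names follow takeout-YYYYMMDDTHHMMSSZ-NNN.zip, as in the question.
STRICT_TAKEOUT_NAME = re.compile(r"^takeout-\d{8}T\d{6}Z-\d+\.zip$")


def select_for_deletion(items, min_size_bytes=0, older_than_days=0, now=None):
    """Return the items that match the strict name pattern and both thresholds."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=older_than_days)
    selected = []
    for item in items:
        if not STRICT_TAKEOUT_NAME.match(item["name"]):
            continue
        # Drive reports size as a string and omits it for some file types.
        if int(item.get("size", 0)) < min_size_bytes:
            continue
        # modifiedTime is an RFC 3339 timestamp ending in 'Z' (UTC).
        modified = datetime.fromisoformat(item["modifiedTime"].replace("Z", "+00:00"))
        if modified > cutoff:
            continue
        selected.append(item)
    return selected


def delete_selected(service, items, dry_run=True):
    """Delete the given items, or only report them when dry_run is True."""
    for item in items:
        if dry_run:
            print(f'Would delete: {item["name"]} ({item["id"]})')
        else:
            service.files().delete(fileId=item["id"]).execute()
            print(f'Deleted: {item["name"]} ({item["id"]})')
```

Running once with dry_run=True and reading the output before switching to dry_run=False is a cheap substitute for a per-file confirmation prompt.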

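Best practices:

Incremental deletion: if there are very many files, consider deleting them in batches, for example a portion each day, to avoid making API calls too frequently.
Logging: record the deleted files and any error messages to make tracking and debugging easier.
Testing: test the script thoroughly in a test environment before running it against your real data.
API quotas: be aware of the Google Drive API usage limits and make sure your script does not exceed them.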
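
For the quota point, one common approach is to retry rate-limited calls with exponential backoff. The helper below is a generic sketch, not something taken from the Drive client library: call_with_backoff and its parameters are hypothetical names, and deciding which errors are retryable is left to the caller.

```python
import random
import time


def call_with_backoff(call, is_retryable, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying with exponential backoff while is_retryable(exc) is true."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... plus up to 1s of jitter.
            sleep(base_delay * (2 ** attempt) + random.random())
```

With the Drive client, call could be something like lambda: service.files().delete(fileId=file_id).execute(), and is_retryable could check whether the exception is a googleapiclient.errors.HttpError whose resp.status is 429 or a 5xx code. The client's execute() method also accepts a num_retries argument, which may be enough on its own for simple scripts.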

With these improvements, you can identify and delete large Google Takeout ZIP files through the Google Drive API more efficiently and safely.
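
As a rough illustration of combining the batching and logging suggestions, the sketch below groups delete calls with the client's new_batch_http_request() and logs each outcome. The helper names are my own, and the cap of 100 calls per batch is what I understand the documented Drive limit to be, so treat it as an assumption and check the current documentation.

```python
import logging

logger = logging.getLogger("takeout_cleanup")

BATCH_LIMIT = 100  # assumed maximum number of calls per batch request


def chunked(seq, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def batch_delete(service, items, batch_size=BATCH_LIMIT):
    """Delete items in batches and return the ids whose deletion failed."""
    names = {item["id"]: item["name"] for item in items}
    failed = []

    def on_result(request_id, response, exception):
        # Called once per request in the batch, with exception set on failure.
        if exception is not None:
            logger.error("Failed to delete %s: %s", names[request_id], exception)
            failed.append(request_id)
        else:
            logger.info("Deleted %s (%s)", names[request_id], request_id)

    for group in chunked(items, batch_size):
        batch = service.new_batch_http_request(callback=on_result)
        for item in group:
            batch.add(service.files().delete(fileId=item["id"]), request_id=item["id"])
        batch.execute()
    return failed
```

Calling logging.basicConfig(level=logging.INFO, filename="takeout_cleanup.log") before batch_delete would write the record of deleted and failed files to a log file; the file name here is only an example.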

Tags: google-chrome,python,zip
From: 78787191
