Cleaning Up Duplicate Music Files with Python: A Simple Solution

Tags: music files, title, Python, solution, length, music, file, path

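Music collections tend to grow from many sources: purchases, downloads, or files borrowed from friends. Over time the library can end up with a large number of duplicate tracks, which wastes storage space and gets in the way when listening.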

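To deal with this, I wrote a simple Python program that cleans up duplicate music files. Why Python? It is powerful and easy to pick up, and it was already installed on my machine. With VSCode alone I can write, debug, and run the code, which is all I need.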

First, how the program works. It has two parts: reading the title and duration of each music file, and walking the directory to delete duplicates.

from mutagen.mp3 import MP3
from mutagen.flac import FLAC
import os

def get_music_info(file_path):
    """Return (title, duration in seconds) read from the file's tags."""
    # Pick the parser from the file extension
    _, ext = os.path.splitext(file_path)
    ext = ext.lower()

    if ext == ".mp3":
        try:
            audio = MP3(file_path)
            # audio.tags is None when the file has no ID3 tag at all
            if audio.tags is not None and "TIT2" in audio.tags:
                # TIT2 is the ID3 frame that holds the title
                return audio.tags["TIT2"].text[0], audio.info.length
            else:
                return "Unknown Title", 0
        except Exception as e:
            print(f"Error reading mp3 file: {e}")
            return "Unknown Title", 0
    elif ext == ".flac":
        try:
            audio = FLAC(file_path)
            if "title" in audio:
                # FLAC stores the title in the "title" Vorbis comment
                return audio["title"][0], audio.info.length
            else:
                return "Unknown Title", 0
        except Exception as e:
            print(f"Error reading flac file: {e}")
            return "Unknown Title", 0
    else:
        print(f"Unsupported file type: {ext}")
        return "Unknown Title", 0

def remove_duplicate_music(root_dir):
    """Walk the directory tree and delete duplicate music files."""
    seen_titles = {}
    duplicate_music_count = 0
    for dirpath, _, filenames in os.walk(root_dir):
        for filename in filenames:
            file_path = os.path.join(dirpath, filename)
            if file_path.lower().endswith(('.mp3', '.flac')):
                music_title, music_info_length = get_music_info(file_path)
                if music_title == "Unknown Title":
                    # Untitled or unreadable files cannot be compared by title,
                    # so they are never treated as duplicates of each other
                    continue
                if music_title in seen_titles:
                    # Same title seen before: keep the longer file, on the
                    # assumption that it is the more complete copy
                    _, existing_length, existing_path = seen_titles[music_title]
                    if music_info_length > existing_length:
                        # The current file wins: delete the earlier one
                        os.remove(existing_path)
                        duplicate_music_count += 1
                        print(f"Deleted duplicate file: {existing_path}")
                        seen_titles[music_title] = (music_title, music_info_length, file_path)
                    else:
                        # The earlier file wins: delete the current one
                        os.remove(file_path)
                        duplicate_music_count += 1
                        print(f"Deleted duplicate file: {file_path}")
                else:
                    seen_titles[music_title] = (music_title, music_info_length, file_path)
    print(f"Total duplicate songs deleted: {duplicate_music_count}")

root_directory = "E:\\BaiduNetdiskDownload\\Music\\001 每月抖音热歌"
remove_duplicate_music(root_directory)

That is the complete code. Next, a brief explanation of how it runs.

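The program uses the mutagen library to parse each music file's metadata and read its title and duration. It then walks every music file under the given directory and stores the title, duration, and path in a dictionary keyed by title. When a title shows up a second time, the two files are compared by duration: the longer one is kept, on the assumption that it is the more complete copy, and the other is deleted.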
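The selection rule described above can be looked at separately from tag parsing and the file system. What follows is a minimal sketch rather than the script's own code: `plan_deduplication` is a hypothetical helper that works on already-extracted `(title, length, path)` records, and skipping files titled "Unknown Title" is an assumption made here so that untitled files are never grouped together.

```python
def plan_deduplication(records, unknown_title="Unknown Title"):
    """Decide which files to keep and which to delete.

    records: iterable of (title, length_in_seconds, path) tuples.
    Returns (keep, delete): keep maps title -> path, delete is a list of paths.
    """
    best = {}    # title -> (length, path) of the best copy seen so far
    delete = []  # paths that lost the comparison
    for title, length, path in records:
        if title == unknown_title:
            continue  # assumption: untitled files cannot be compared safely
        if title not in best:
            best[title] = (length, path)
        elif length > best[title][0]:
            # The new file is longer: the previous best becomes a duplicate
            delete.append(best[title][1])
            best[title] = (length, path)
        else:
            delete.append(path)
    keep = {title: path for title, (_, path) in best.items()}
    return keep, delete


# Hypothetical records, only for illustration
records = [
    ("Song A", 200.0, "a1.mp3"),
    ("Song A", 215.5, "a2.flac"),
    ("Song B", 180.0, "b1.mp3"),
    ("Unknown Title", 0, "x.mp3"),
    ("Song A", 210.0, "a3.mp3"),
]
keep, delete = plan_deduplication(records)
print(keep)    # {'Song A': 'a2.flac', 'Song B': 'b1.mp3'}
print(delete)  # ['a1.mp3', 'a3.mp3']
```

Because the function only returns a plan, the result can be reviewed before anything is removed from disk.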

Finally, the program prints the path of each deleted duplicate and the total number of duplicates removed.
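Since `os.remove` cannot be undone, it may be worth printing the paths first and deleting only after checking them. The sketch below shows one way to do that. `delete_files` and its `dry_run` parameter are hypothetical names that are not part of the original script, and the temporary file stands in for a real track.

```python
import os
import tempfile


def delete_files(paths, dry_run=True):
    """Delete the given files, or only report them when dry_run is True."""
    removed = []
    for path in paths:
        if dry_run:
            print(f"Would delete: {path}")
        else:
            os.remove(path)
            print(f"Deleted: {path}")
            removed.append(path)
    return removed


# Demonstration on a throwaway file instead of a real music library
fd, tmp_path = tempfile.mkstemp(suffix=".mp3")
os.close(fd)

delete_files([tmp_path], dry_run=True)
still_there = os.path.exists(tmp_path)   # the dry run leaves the file alone

delete_files([tmp_path], dry_run=False)
gone = not os.path.exists(tmp_path)      # the real run removes it
```

Defaulting `dry_run` to `True` means a forgotten argument results in a report rather than a deletion.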

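With this simple Python program I cleaned the duplicates out of my music library, freed up a good deal of storage space, and no longer run into the same track twice while listening. I hope this small tool helps others with a similar need.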

At work I mostly develop in C++ and C#. The same functionality is just as convenient to write in C# with TagLib:

using System;
using System.Collections.Generic;
using System.IO;
using TagLib;

class Program
{
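    static (string, long) GetMusicInfo(string filePath)
    {
        string title = "Unknown Title";
        long length = 0;
        try
        {
            // TagLib.File is disposable; release the file handle when done
            using (var file = TagLib.File.Create(filePath))
            {
                // Tag.Title is null when the file has no title tag,
                // and a null key would make the dictionary lookup throw
                title = file.Tag.Title ?? "Unknown Title";
                length = file.Properties.Duration.Ticks;
            }
        }
        catch (Exception e)
        {
            Console.WriteLine($"Error reading file: {e.Message}");
        }
        return (title, length);
    }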

    static void RemoveDuplicateMusic(string rootDir)
    {
        Dictionary<string, (string, long, string)> seenTitles = new Dictionary<string, (string, long, string)>();
        int duplicateMusicCount = 0;
        foreach (string filePath in Directory.GetFiles(rootDir, "*.*", SearchOption.AllDirectories))
        {
            string ext = Path.GetExtension(filePath).ToLowerInvariant();
            if (ext == ".mp3" || ext == ".flac")
            {
                (string title, long length) = GetMusicInfo(filePath);
                if (title == "Unknown Title")
                {
                    // Untitled or unreadable files cannot be compared by title
                    continue;
                }
                if (seenTitles.ContainsKey(title))
                {
                    // Same title seen before: keep the longer file, on the
                    // assumption that it is the more complete copy
                    var (_, existingLength, existingPath) = seenTitles[title];
                    if (length > existingLength)
                    {
                        // The current file wins: delete the earlier one.
                        // File is qualified because both System.IO and TagLib define a File class.
                        System.IO.File.Delete(existingPath);
                        duplicateMusicCount++;
                        Console.WriteLine($"Deleted duplicate file: {existingPath}");
                        seenTitles[title] = (title, length, filePath);
                    }
                    else
                    {
                        // The earlier file wins: delete the current one
                        System.IO.File.Delete(filePath);
                        duplicateMusicCount++;
                        Console.WriteLine($"Deleted duplicate file: {filePath}");
                    }
                }
                else
                {
                    seenTitles[title] = (title, length, filePath);
                }
            }
        }
        Console.WriteLine($"Total duplicate songs deleted: {duplicateMusicCount}");
    }

    static void Main(string[] args)
    {
        string rootDirectory = @"E:\BaiduNetdiskDownload\Music\001 每月抖音热歌";
        RemoveDuplicateMusic(rootDirectory);
    }
}

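I have not debugged the C# snippet above and cannot guarantee that it works; it mainly records the idea.

Tidying up a music pack may also mean listing the tracks of poor quality. A common way to assess a music file's quality is to check its audio properties, such as bitrate, sample rate, and encoding. A lower bitrate usually means worse audio quality. Audio analysis tools can also be used to inspect the spectrum and waveform.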

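Here is one approach: use Python together with FFmpeg's ffprobe command-line tool to scan the music files and read their audio properties, in order to assess quality: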

import os
import subprocess
import json

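def get_audio_properties(file_path):
    """Return the "format" section of ffprobe's JSON output for a file."""
    cmd = ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_format", file_path]
    try:
        result = subprocess.run(cmd, capture_output=True)
    except FileNotFoundError:
        # ffprobe is not installed or not on PATH
        print("ffprobe not found; install FFmpeg first")
        return {}
    if result.returncode == 0:
        properties = json.loads(result.stdout)
        return properties.get("format", {})
    else:
        print(f"Error reading file: {file_path}")
        return {}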
    
def scan_music_quality(root_dir, min_bitrate=192000):
    """Scan music files and list those whose bitrate (bits per second) is below min_bitrate."""
    low_quality_files = []
    for dirpath, _, filenames in os.walk(root_dir):
        for filename in filenames:
            file_path = os.path.join(dirpath, filename)
            if file_path.lower().endswith(('.mp3', '.flac')):
                properties = get_audio_properties(file_path)
                if properties:
                    bitrate = int(properties.get("bit_rate", 0))
                    if bitrate < min_bitrate:
                        low_quality_files.append((file_path, bitrate))
    return low_quality_files

# Usage example
root_directory = "E:\\BaiduNetdiskDownload\\Music\\001 每月抖音热歌"
low_quality_files = scan_music_quality(root_directory)
for file_path, bitrate in low_quality_files:
    print(f"Low quality file: {file_path}, Bitrate: {bitrate}")
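The scan above only looks at the container-level bitrate, while sample rate and encoding were also named as quality indicators. As a rough sketch, those can be read from the `streams` section that ffprobe emits when `-show_streams` is added to the command. `summarize_audio` is a hypothetical helper, and the JSON below is a hand-written sample shaped like ffprobe output, not the result of a real run.

```python
import json


def summarize_audio(probe):
    """Extract codec, sample rate and bitrate from parsed ffprobe JSON."""
    # Take the first audio stream; files with cover art also carry a video stream
    audio = next(
        (s for s in probe.get("streams", []) if s.get("codec_type") == "audio"),
        {},
    )
    fmt = probe.get("format", {})
    # ffprobe reports numeric fields as strings. A stream may not report its own
    # bit_rate, so fall back to the container-level value.
    bitrate = audio.get("bit_rate") or fmt.get("bit_rate") or 0
    return {
        "codec": audio.get("codec_name", "unknown"),
        "sample_rate": int(audio.get("sample_rate", 0)),
        "bit_rate": int(bitrate),
    }


# Hand-written sample in the shape of `ffprobe -show_format -show_streams` output
sample = json.loads("""
{
  "streams": [
    {"codec_type": "audio", "codec_name": "mp3",
     "sample_rate": "44100", "bit_rate": "128000"}
  ],
  "format": {"bit_rate": "128473"}
}
""")
summary = summarize_audio(sample)
print(summary)  # {'codec': 'mp3', 'sample_rate': 44100, 'bit_rate': 128000}
```

Keeping the parsing in a function that takes an already-parsed dictionary makes it easy to check without ffprobe installed.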

Source: https://www.cnblogs.com/linxmouse/p/18106243
