I have a function, used in a script I wrote, that removes redundant blocking keywords from a list. Basically, given the input (in any order):
{"apple", "bapple", "banana", "cherry", "bananaman", "sweetherrypie", "sweet", "b"}
it should output a reduced array of strings (in any order):
{"apple", "cherry", "sweet", "b"}
This is done to reduce the number of keywords a matching algorithm has to pass over, as part of a personal learning project (if a string contains "banana", it necessarily also contains "b", so "banana" is redundant and can be discarded).
It needs to be extremely performant, as it has to run on lists of over 10,000,000 strings.
Initially, I wrote some Python code to do this:
def deduplicate_keywords(keywords):
    # Remove duplicates and sort the keywords by length in ascending order
    keywords = sorted(set(keywords), key=len)
    unique_keywords = []
    for keyword in keywords:
        not_redundant = True
        for unique_keyword in unique_keywords:
            # Check if the current keyword includes any of the unique keywords
            if unique_keyword in keyword:
                not_redundant = False
                break
        # If the keyword does not include any of the shorter keywords, add it to unique_keywords
        if not_redundant:
            unique_keywords.append(keyword)
    return unique_keywords
along with a performance benchmark to measure it:
import time
import unittest

class TestDeduplicateKeywords(unittest.TestCase):
    def test_performance_intensive_case(self):
        keywords = [
            "abcdef",
            "def",
            "ghidef",
            "jkl",
            "mnop",
            "qrst",
            "uvwxyz",
            "ghijkl",
            "abcdefgh",
            "ijklmnop",
            "mnopqrstuv",
            "rstuvwx",
            "mnopqrstuvwx",
            "uvwx",
        ]
        # Extending the list to make it very long and complex
        base_keywords = keywords[:]
        for i in range(200000):
            base_keywords.extend([f"{k}{i}" for k in keywords])
            base_keywords.extend([f"{i}{k}" for k in keywords])
            base_keywords.extend([f"{i}{k}{i}" for k in keywords])
        keywords = base_keywords

        old_function_time = 0
        new_function_time = 0
        for repeat in range(1):
            # This loop (not present in the cpp code) is used to loop it multiple times to find an average
            start_time = time.time()
            result = deduplicate_keywords(keywords)
            end_time = time.time()
            old_function_time = (
                repeat * old_function_time + (end_time - start_time)
            ) / (repeat + 1)

            start_time = time.time()
            result = deduplicate_keywords_new(keywords)
            # Above is a function that can be edited to check whether changes speed up the code
            end_time = time.time()
            new_function_time = (
                repeat * new_function_time + (end_time - start_time)
            ) / (repeat + 1)

        # As the list is extended with numbers, the result will be unique non-redundant keywords
        expected_result = ["def", "jkl", "mnop", "qrst", "uvwx"]
        self.assertEqual(set(result), set(expected_result))
        print(
            f"Test performance_intensive_case for old function took {old_function_time} seconds\n"
            f"Test performance_intensive_case for new function took {new_function_time} seconds\n"
        )
        if old_function_time < new_function_time:
            print("Old is faster than new")
        else:
            print("New is faster than old")

if __name__ == "__main__":
    unittest.main()
This was too slow for my needs: about 12 seconds on my weak laptop. So I tried writing the same thing in cpp to see if I would get a performance gain, but it was much slower, so I added some timing code to the function itself to see where the slowdown was:
#include <iostream>
#include <vector>
#include <string>
#include <set>
#include <algorithm>
#include <chrono>
#include <cassert>

std::vector<std::string> deduplicate_keywords(const std::vector<std::string>& keywords) {
    auto t_start = std::chrono::high_resolution_clock::now();

    // Remove duplicates using a set
    auto t1 = std::chrono::high_resolution_clock::now();
    std::cout << "started old function!" << std::endl;
    std::set<std::string> unique_set(keywords.begin(), keywords.end());
    auto t2 = std::chrono::high_resolution_clock::now();
    std::cout << "removed duplicates in " << std::chrono::duration<double>(t2 - t1).count() << " seconds" << std::endl;

    // Convert set back to vector and sort by length
    auto t3 = std::chrono::high_resolution_clock::now();
    std::vector<std::string> sorted_keywords(unique_set.begin(), unique_set.end());
    std::sort(sorted_keywords.begin(), sorted_keywords.end(), [](const std::string& a, const std::string& b) {
        return a.length() < b.length();
    });
    auto t4 = std::chrono::high_resolution_clock::now();
    std::cout << "Sorted by length in " << std::chrono::duration<double>(t4 - t3).count() << " seconds" << std::endl;

    // Filter out redundant keywords
    auto t5 = std::chrono::high_resolution_clock::now();
    std::vector<std::string> unique_keywords;
    for (const auto& keyword : sorted_keywords) {
        for (const auto& unique_keyword : unique_keywords) {
            // Check if the current keyword includes any of the unique keywords
            if (keyword.find(unique_keyword) != std::string::npos) {
                goto next_iteration;
            }
        }
        // If the keyword does not include any of the shorter keywords, add it to unique_keywords
        unique_keywords.push_back(keyword);
    next_iteration:
        continue;
    }
    auto t6 = std::chrono::high_resolution_clock::now();
    std::cout << "Filtered in " << std::chrono::duration<double>(t6 - t5).count() << " seconds" << std::endl;

    auto t7 = std::chrono::high_resolution_clock::now();
    sorted_keywords.clear();
    sorted_keywords.shrink_to_fit();
    unique_keywords.shrink_to_fit();
    auto t8 = std::chrono::high_resolution_clock::now();
    std::cout << "Destructed in " << std::chrono::duration<double>(t8 - t7).count() << " seconds" << std::endl;

    auto t9 = std::chrono::high_resolution_clock::now();
    auto total_time = std::chrono::duration<double>(t9 - t_start).count();
    std::cout << "Total function time: " << total_time << " seconds" << std::endl;
    return unique_keywords;
}
// Experimental variant (currently an empty stub), defined elsewhere
std::vector<std::string> deduplicate_keywords_new(const std::vector<std::string>& keywords);

void test_performance_intensive_case() {
    std::vector<std::string> keywords = {
        "abcdef", "def", "ghidef", "jkl", "mnop", "qrst", "uvwxyz",
        "ghijkl", "abcdefgh", "ijklmnop", "mnopqrstuv", "rstuvwx",
        "mnopqrstuvwx", "uvwx"
    };
    // Extending the list to make it very long and complex
    std::vector<std::string> base_keywords = keywords;
    for (int i = 0; i < 20000; ++i) {
        for (const auto& k : keywords) {
            base_keywords.push_back(k + std::to_string(i));
            base_keywords.push_back(std::to_string(i) + k);
            base_keywords.push_back(std::to_string(i) + k + std::to_string(i));
        }
    }
    auto keywords_extended = base_keywords;
    std::set<std::string> expected_result = {"def", "jkl", "mnop", "qrst", "uvwx"};

    // there is no looping to calculate an average because it is so much slower
    auto start_time_total = std::chrono::high_resolution_clock::now();
    auto start_time = std::chrono::high_resolution_clock::now();
    std::vector<std::string> result_old = deduplicate_keywords(keywords_extended);
    auto end_time = std::chrono::high_resolution_clock::now();
    double old_function_time = std::chrono::duration<double>(end_time - start_time).count();
    auto end_time_total = std::chrono::high_resolution_clock::now();
    double total_test_time = std::chrono::duration<double>(end_time_total - start_time_total).count();
    assert(std::set<std::string>(result_old.begin(), result_old.end()) == expected_result);
    std::cout << "old function time: " << old_function_time << " seconds" << std::endl;
    std::cout << "Total test case time: " << total_test_time << " seconds" << std::endl;

    auto start_time_total_new = std::chrono::high_resolution_clock::now();
    auto start_time_new = std::chrono::high_resolution_clock::now();
    std::vector<std::string> result_new = deduplicate_keywords_new(keywords_extended); // same as in the python code, useful for comparison
    auto end_time_new = std::chrono::high_resolution_clock::now();
    double new_function_time = std::chrono::duration<double>(end_time_new - start_time_new).count();
    auto end_time_total_new = std::chrono::high_resolution_clock::now();
    double total_test_time_new = std::chrono::duration<double>(end_time_total_new - start_time_total_new).count();
    assert(std::set<std::string>(result_new.begin(), result_new.end()) == expected_result);
    std::cout << "new function time: " << new_function_time << " seconds" << std::endl;
    std::cout << "Total test case time: " << total_test_time_new << " seconds" << std::endl;

    std::cout << "Test performance_intensive_case for old function took " << old_function_time
              << " seconds but new function took " << new_function_time << " seconds" << std::endl;
    if (old_function_time < new_function_time) {
        std::cout << "Old is faster than new" << std::endl;
    } else {
        std::cout << "New is faster than old" << std::endl;
    }
}

int main() {
    test_performance_intensive_case();
    return 0;
}
Besides the overall slowdown, the logs seem to indicate that the return statement (`return unique_keywords`) takes extremely long. This is the terminal output (shortened; logs for the currently empty deduplicate_keywords_new function removed):
started old function!
removed duplicates in 9.64173 seconds
Sorted by length in 3.36138 seconds
Filtered in 0.366933 seconds
Destructed in 7.63208 seconds
Total function time: 21.0037 seconds
old function time: 405.213 seconds
Total test case time: 405.213 seconds
Why is this? And what can I do to stop it? I am fairly new to cpp coding and development, so it would be great if your explanations were detailed and thorough. Thanks!
The performance problem you're hitting in the C++ code has nothing to do with the return statement itself; it comes from how the vector is copied and deallocated. The problem is this line in the test_performance_intensive_case function:

std::vector<std::string> result_old = deduplicate_keywords(keywords_extended);

What happens here is that the unique_keywords vector returned by deduplicate_keywords is copied by value into the result_old vector. That means the entire contents of the vector, including all of its strings, are copied, which is a costly operation, especially when the vector is large.
The reason the C++ code takes so much longer than the Python code is that Python never copies here at all: returning a list in Python simply hands back a reference to the same list object.
To fix this, avoid the copy when handing the result back. One caveat first: you cannot simply change the function to return a reference to unique_keywords, because unique_keywords is a local variable that is destroyed when the function returns, so the reference would dangle. A safe way to get the same effect is to pass the result vector in by reference as an output parameter:

void deduplicate_keywords(const std::vector<std::string>& keywords, std::vector<std::string>& unique_keywords) {
    // ... [rest of the function unchanged, but filling the caller's unique_keywords] ...
}

and call it from test_performance_intensive_case like this:

std::vector<std::string> result_old;
deduplicate_keywords(keywords_extended, result_old);

Because the function fills a vector the caller already owns, no copy of the vector or its contents is ever made, which should significantly improve performance. (In practice, modern compilers often avoid the copy even when returning by value, via move semantics and return-value optimization, so profile before and after.)
Beyond this change, here are a few additional tips that can further optimize the C++ code:
- Pre-allocate vector capacity with reserve(): before adding elements to the unique_keywords vector, pre-allocate its capacity with unique_keywords.reserve(sorted_keywords.size()). This minimizes the number of times the vector has to resize while elements are added.
- Consider std::unordered_set instead of std::set: if you don't need the keywords sorted, std::unordered_set gives faster duplicate removal.
- Use a profiler to identify bottlenecks: a profiler can pinpoint the specific parts of the code that take the most time, letting you focus your optimization effort on those areas.
By applying these changes and profiling your code further, you can significantly improve its performance.
Tags: python, c++, algorithm, optimization, string-matching From: 78812814