c++ 代码技巧

标签：10 return 技巧 ++ BM 代码 long c++ result

数学运算性能

大多数数据运算不存在性能问题，但是相对来说，整型的除法运算还是比较昂贵的。
参考下面的例子：

uint32_t BM_S1(uint64_t v) {
  uint32_t result = 0;
  do {
    ++result;
    v /= 10;
  } while (v);
  return result;
}

uint32_t BM_S2(uint64_t v) {
  uint32_t result = 1;
  for (;;) {
    if (v < 10) return result;
    if (v < 100) return result + 1;
    if (v < 1000) return result + 2;
    if (v < 10000) return result + 3;
    v /= 10000;
    result += 4;
  }
  return result;
}

uint32_t BM_S3(uint64_t v) {
  uint32_t result = 1;
  for (;;) {
    if (v < 10) return 1;
    if (v < 100) return 2;
    if (v < 1000) return 3;
    if (v < 1000000000000) {  // 10^12
      if (v < 100000000) {    // 10^7
        if (v < 1000000) {    // 10^6
          if (v < 10000) {
            return 4;
          }
          return 5 + (v >= 100000);  // 10^5
        }
        return 7 + (v >= 10000000);  // 10^7
      }
      if (v < 10000000000) {           // 10^10
        return 9 + (v >= 1000000000);  // 10^9
      }
      return 11 + (v >= 100000000000);  // 10^11
    }
    v /= 1000000000000;
    result += 12;
  }
  return result;
}

核心原因是因为 BM_S2 用比较和加法减少了除法运算，BM_S3通过搜索的方法进一步减少了除法操作。
运算速度可以参考下面：

* comparisons  (1 clock cycle)
* (u)int add, subtract, bitops, shift (1 clock cycle)
* floating point add, sub (3~6 clock cycle) 
* indexed array access (cache effects)
* (u)int32 mul  (3~4 clock cycle)
* Floating point mul (4~8 clock cycle)
* Float Point division, remainder (14~45 clock cycle)
* (u)int division, remainder (40~80 clock cycle)

除法比乘法要慢很多，所以有些除法运算可以利用乘法来加快：

double y, a1, a2, b1, b2;
y = a1/b1 + a2/b2;  // slow
double y, a1, a2, b1, b2;
y = (a1*b2 + a2*b1) / (b1*b2); // faster

cache line

目前的计算机系统，cache line 是 64 字节，基于cache line 可以做一些优化，从而减少内存读取次数。但是有些场景，cache line 反而会拖慢，例如下面的结构体:

struct S {
  long long a;
  long long b;
};

如果两个线程，一个写a，一个写b，这样会频发的触发内存和cache的交互。通过添加占位，让a和b不在一个cache line上，性能会更好。

struct S2 {
  long long a;
  long long nonp[8];  // 占位，a、b 加载到不同的缓存行
  long long b;
};

选择删除vector中的某些元素

auto cond = [](int x) { return (x & 1 == 1); };
vec3.erase(std::remove_if(vec3.begin(), vec3.end(), cond), vec3.end());

时间复杂度是 O(n)

bench mark

之前一直用 https://quick-bench.com/ 做小段代码性能测试，但是最近这个网站挂了，还是在本地跑 benchmark 靠谱。

安装bench mark

git clone https://github.com/google/benchmark.git
git clone https://github.com/google/googletest.git benchmark/googletest
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=RELEASE ../benchmark
make -j4
# 如果想全局安装就接着运行下面的命令
sudo make install

demo

测试代码

#include <chrono>
#include <iostream>
#include <thread>

#include <benchmark/benchmark.h>

struct S {
  long long a;
  long long b;
};

struct S2 {
  long long a;
  long long nonp[8];  // 占位，a、b 加载到不同的缓存行
  long long b;
};

void func1() {
  S s;
  std::thread t1([&]() {
    for (int i = 0; i < 100000; i++) {
      s.a++;
    }
  });
  std::thread t2([&]() {
    for (int i = 0; i < 100000; i++) {
      s.b++;
    }
  });
  t1.join();
  t2.join();
}

void func2() {
  S2 s;
  std::thread t1([&]() {
    for (int i = 0; i < 100000; i++) {
      s.a++;
    }
  });
  std::thread t2([&]() {
    for (int i = 0; i < 100000; i++) {
      s.b++;
    }
  });
  t1.join();
  t2.join();
}

static void BM_S(benchmark::State& state) {
  for (auto _ : state) {
    func1();
  }
}
BENCHMARK(BM_S);

static void BM_S2(benchmark::State& state) {
  for (auto _ : state) {
    func2();
  }
}
BENCHMARK(BM_S2);

BENCHMARK_MAIN();

编译（mac os)

clang++ -Wall -std=c++14 cache_line.cpp -I /usr/local/include -L /usr/local/lib -L /usr/lib -pthread -lbenchmark -o test

执行结果

标签：10,return,技巧,++,BM,代码,long,c++,result
From： https://www.cnblogs.com/xianzhedeyu/p/17183976.html