
An Example of Speeding Up Image Binarization with SIMD Instruction Sets

Date: 2023-03-22 17:55:28
Tags: NONE, AVX, instruction set, OPENCV, int64, ms, image, SSE, binarization

The code below was built with VS2015, Qt 5.9 and OpenCV 4.3.0, and run on an Intel Core i5-7400 CPU. It binarizes an image. Here is the code:

#include <immintrin.h>
#include <opencv2/opencv.hpp>
#include <QDebug>

using namespace cv;

int main()
{
    Mat image(1024, 1024, CV_8UC1, Scalar(255));
    circle(image, Point2i(500, 500), 200, Scalar(0), -1);
    int64 t1, t2;
    Mat binar1(image.size(), image.type());
    Mat binar2(image.size(), image.type());
    // Make sure the buffers are 32-byte aligned
    CV_Assert(int64(image.data) % 32 == 0);
    CV_Assert(int64(binar1.data) % 32 == 0);
    CV_Assert(int64(binar2.data) % 32 == 0);

    t1 = getTickCount();
    threshold(image, binar1, 127, 255, THRESH_BINARY);
    t2 = getTickCount();
    qDebug() << u8"OPENCV(ms):" << (t2 - t1) / getTickFrequency() * 1000;

    t1 = getTickCount();
    for (int i = 0; i < 1024; i++)
    {
        const uchar* line = image.ptr<uchar>(i);
        uchar* dest = binar2.ptr<uchar>(i);
        for (int j = 0; j < 1024; j++)
        {
            dest[j] = line[j] > 127 ? 255 : 0;
        }
    }
    t2 = getTickCount();
    qDebug() << u8"NONE(ms):" << (t2 - t1) / getTickFrequency() * 1000;

    t1 = getTickCount();
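    // Threshold broadcast to all eight 16-bit lanes
    __m128i m128t = _mm_set1_epi16(127);
    // Shuffle mask: gather the low byte of each 16-bit lane into the low 8 bytes; -1 zeroes a byte
    __m128i m128h = _mm_set_epi8(-1, -1, -1, -1, -1, -1, -1, -1, 14, 12, 10, 8, 6, 4, 2, 0);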
    for (int i = 0; i < 1024; i++)
    {
        const uchar* line = image.ptr<uchar>(i);
        uchar* dest = binar2.ptr<uchar>(i);
        for (int j = 0; j < 1024; j += 8)
        {
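            // Load 8 pixels into the low 64 bits
            __m128i mmx08 = _mm_set_epi64x(0, *(int64*)&line[j]);
            // Zero-extend the 8 bytes to 16-bit lanes (SSE4.1)
            __m128i mmx16 = _mm_cvtepu8_epi16(mmx08);
            // 127 < pixel gives 0xFFFF in that lane, otherwise 0
            __m128i res = _mm_cmplt_epi16(m128t, mmx16);
            // Keep one byte per lane, 0xFF or 0x00 (SSSE3)
            __m128i half = _mm_shuffle_epi8(res, m128h);
            *(int64*)&dest[j] = _mm_extract_epi64(half, 0);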
        }
    }
    t2 = getTickCount();
    qDebug() << u8"SSE(ms):" << (t2 - t1) / getTickFrequency() * 1000;

    t1 = getTickCount();
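    // Note: the 256-bit integer compare, widen and shuffle used below are AVX2 intrinsics
    __m256i m256t = _mm256_set1_epi16(127);
    // Same shuffle mask as the SSE version, repeated for each 128-bit half
    __m256i m256h = _mm256_set_epi8(-1, -1, -1, -1, -1, -1, -1, -1, 14, 12, 10, 8, 6, 4, 2, 0,
        -1, -1, -1, -1, -1, -1, -1, -1, 14, 12, 10, 8, 6, 4, 2, 0);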
    for (int i = 0; i < 1024; i++)
    {
        const uchar* line = image.ptr<uchar>(i);
        uchar* dest = binar2.ptr<uchar>(i);
        for (int j = 0; j < 1024; j += 16)
        {
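            // Load 16 pixels as two 64-bit halves
            __m128i mmx08 = _mm_set_epi64x(*(int64*)&line[j + 8], *(int64*)&line[j]);
            // Zero-extend the 16 bytes to 16-bit lanes
            __m256i mmx16 = _mm256_cvtepu8_epi16(mmx08);
            // pixel > 127 gives 0xFFFF in that lane, otherwise 0
            __m256i res = _mm256_cmpgt_epi16(mmx16, m256t);
            // The shuffle works within each 128-bit half, so the packed bytes
            // end up in 64-bit elements 0 and 2
            __m256i half = _mm256_shuffle_epi8(res, m256h);
            *(int64*)&dest[j] = _mm256_extract_epi64(half, 0);
            *(int64*)&dest[j + 8] = _mm256_extract_epi64(half, 2);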
        }
    }
    t2 = getTickCount();
    qDebug() << u8"AVX(ms):" << (t2 - t1) / getTickFrequency() * 1000;
}
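
The SSE and AVX loops above widen every pixel to 16 bits before comparing, then shuffle the result back down to bytes. A possible alternative, shown here only as a sketch and not measured on the machine above, is to compare the bytes directly. SSE2 offers only a signed byte compare, so the sketch flips the top bit of both operands first, which maps the unsigned order 0..255 onto the signed order -128..127. The function name `thresholdSse2` and its parameters are hypothetical and do not come from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper (not part of the original post):
// dst[i] = src[i] > thresh ? 255 : 0 for n pixels.
void thresholdSse2(const std::uint8_t* src, std::uint8_t* dst,
                   std::size_t n, std::uint8_t thresh)
{
    assert(src != nullptr && dst != nullptr);

    // Flipping bit 7 turns an unsigned comparison into a signed one.
    const __m128i bias = _mm_set1_epi8(static_cast<char>(0x80));
    const __m128i limit = _mm_set1_epi8(static_cast<char>(thresh ^ 0x80));

    std::size_t i = 0;
    // 16 pixels per iteration; unaligned loads and stores, so no alignment requirement.
    for (; i + 16 <= n; i += 16)
    {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i));
        // Each byte becomes 0xFF where src > thresh, otherwise 0x00.
        __m128i mask = _mm_cmpgt_epi8(_mm_xor_si128(v, bias), limit);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), mask);
    }
    // Scalar tail for lengths that are not a multiple of 16.
    for (; i < n; ++i)
        dst[i] = src[i] > thresh ? 255 : 0;
}
```

With the image from the listing, this could be called once per row, for example `thresholdSse2(image.ptr<uchar>(i), binar2.ptr<uchar>(i), 1024, 127)`. Whether it is actually faster than the 16-bit variants would have to be checked with the same timing loop.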

The output of 50 runs of a Release build is shown below. In this batch the AVX version is faster than OpenCV in most rounds (28 of the 30 rounds listed here):

OPENCV(ms): 2.0732
NONE(ms): 0.7314
SSE(ms): 0.2543
AVX(ms): 0.2199
OPENCV(ms): 0.4455
NONE(ms): 0.7666
SSE(ms): 0.293
AVX(ms): 0.179
OPENCV(ms): 0.6254
NONE(ms): 0.8789
SSE(ms): 0.2223
AVX(ms): 0.1512
OPENCV(ms): 0.4486
NONE(ms): 0.7306
SSE(ms): 0.2154
AVX(ms): 0.175
OPENCV(ms): 0.5774
NONE(ms): 2.3402
SSE(ms): 0.2871
AVX(ms): 0.2766
OPENCV(ms): 0.3737
NONE(ms): 0.7787
SSE(ms): 0.3047
AVX(ms): 0.3284
OPENCV(ms): 0.3145
NONE(ms): 0.7349
SSE(ms): 0.3549
AVX(ms): 0.3025
OPENCV(ms): 0.4318
NONE(ms): 0.7679
SSE(ms): 2.4315
AVX(ms): 0.2681
OPENCV(ms): 0.3959
NONE(ms): 0.9343
SSE(ms): 0.3756
AVX(ms): 0.439
OPENCV(ms): 0.3512
NONE(ms): 2.4505
SSE(ms): 0.377
AVX(ms): 0.2237
OPENCV(ms): 0.5284
NONE(ms): 0.7935
SSE(ms): 0.4699
AVX(ms): 0.2633
OPENCV(ms): 0.4671
NONE(ms): 0.8124
SSE(ms): 0.2919
AVX(ms): 0.2929
OPENCV(ms): 0.5293
NONE(ms): 0.7665
SSE(ms): 0.3181
AVX(ms): 0.408
OPENCV(ms): 0.6264
NONE(ms): 0.8933
SSE(ms): 0.2657
AVX(ms): 0.3929
OPENCV(ms): 0.5343
NONE(ms): 0.8591
SSE(ms): 0.3004
AVX(ms): 0.8155
...<part of the output removed because it is too long>
OPENCV(ms): 0.3946
NONE(ms): 1.2074
SSE(ms): 0.3121
AVX(ms): 0.3349
OPENCV(ms): 0.6635
NONE(ms): 0.8499
SSE(ms): 0.2915
AVX(ms): 0.3152
OPENCV(ms): 0.6398
NONE(ms): 0.9685
SSE(ms): 0.3917
AVX(ms): 0.2999
OPENCV(ms): 0.3454
NONE(ms): 0.9082
SSE(ms): 0.3983
AVX(ms): 0.3385
OPENCV(ms): 0.3415
NONE(ms): 1.035
SSE(ms): 0.3842
AVX(ms): 0.2633
OPENCV(ms): 0.4105
NONE(ms): 1.1947
SSE(ms): 0.3958
AVX(ms): 0.3525
OPENCV(ms): 0.612
NONE(ms): 0.9998
SSE(ms): 0.3176
AVX(ms): 0.3837
OPENCV(ms): 0.4727
NONE(ms): 0.8645
SSE(ms): 0.2794
AVX(ms): 0.2068
OPENCV(ms): 0.6206
NONE(ms): 0.9266
SSE(ms): 0.3822
AVX(ms): 0.3107
OPENCV(ms): 0.6847
NONE(ms): 0.9386
SSE(ms): 0.3073
AVX(ms): 0.4238
OPENCV(ms): 0.4841
NONE(ms): 1.002
SSE(ms): 0.2424
AVX(ms): 0.2825
OPENCV(ms): 0.5021
NONE(ms): 1.2102
SSE(ms): 0.3045
AVX(ms): 0.2816
OPENCV(ms): 0.6298
NONE(ms): 1.6238
SSE(ms): 0.4122
AVX(ms): 0.2643
OPENCV(ms): 0.8655
NONE(ms): 1.0023
SSE(ms): 0.3301
AVX(ms): 0.3396
OPENCV(ms): 0.6918
NONE(ms): 0.8999
SSE(ms): 0.2622
AVX(ms): 0.1829
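
Single timings in the log above fluctuate a lot (some rounds show outliers above 2 ms), so a per-variant median and a count of rounds won are easier to compare than the raw lines. The following is a minimal sketch of such a summary; the helper names `medianMs` and `countFaster` are hypothetical and were not used to produce the log.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: median of a set of timing samples in milliseconds.
// Takes the vector by value because it sorts its own copy.
double medianMs(std::vector<double> samples)
{
    if (samples.empty())
        return 0.0;
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 ? samples[mid]
                              : 0.5 * (samples[mid - 1] + samples[mid]);
}

// Hypothetical helper: number of rounds in which variant a was faster than variant b.
// Both vectors must hold one sample per round, in the same order.
std::size_t countFaster(const std::vector<double>& a, const std::vector<double>& b)
{
    assert(a.size() == b.size());
    std::size_t wins = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] < b[i])
            ++wins;
    return wins;
}
```

Fed with the first five rounds of the log (OPENCV: 2.0732, 0.4455, 0.6254, 0.4486, 0.5774; AVX: 0.2199, 0.179, 0.1512, 0.175, 0.2766), `medianMs` gives 0.5774 ms for OpenCV and 0.179 ms for AVX, and `countFaster(avx, opencv)` reports 5 of 5 rounds.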

 

From: https://www.cnblogs.com/mengxiangdu/p/17244938.html
