Background
This section continues with MediaPipe Objectron. From the official introduction:
Objectron (3D Object Detection) - mediapipe
MediaPipe Objectron is a mobile real-time 3D object detection solution for everyday objects. It detects objects in 2D images, and estimates their poses through a machine learning (ML) model, trained on the Objectron dataset.
The official documentation is worth reading; the text below is a translation of it and may not be perfectly accurate.
ML Pipelines for 3D Object Detection
We built two ML pipelines to predict the 3D bounding box of an object from a single RGB image: a two-stage pipeline and a single-stage pipeline. The two-stage pipeline is three times faster than the single-stage pipeline, with similar or better accuracy. The single-stage pipeline is good at detecting multiple objects, whereas the two-stage pipeline is better suited to a single object.
Two-Stage Pipeline
Our two-stage pipeline is illustrated by the diagram below. The first stage uses an object detector to find the 2D crop of the object. The second stage takes the image crop and estimates the 3D bounding box; at the same time, it computes the 2D crop of the object for the next frame, so that the object detector does not need to run on every frame.
Single-Stage Pipeline
Our single-stage pipeline is shown in the diagram in Figure 6; the model backbone has an encoder-decoder architecture built on MobileNetV2. We adopt a multi-task learning approach, jointly predicting an object's shape with detection and regression. The shape task predicts the object's shape signals depending on which ground-truth annotations are available, e.g. segmentation; it is optional if no shape annotations exist in the training data. For the detection task, we use the annotated bounding boxes and fit a Gaussian to each box, with its center at the box centroid and its standard deviation proportional to the box size. The goal of detection is then to predict this distribution, whose peak represents the object's center location. The regression task estimates the 2D projections of the eight bounding-box vertices. To obtain the final 3D coordinates of the bounding box, we leverage a well-established pose-estimation algorithm (EPnP), which recovers the 3D bounding box of an object without a priori knowledge of the object's dimensions. Given the 3D bounding box, we can easily compute the object's pose and size. The model is light enough to run in real time on mobile devices (26 FPS on an Adreno 650 mobile GPU).
(Left) Original 2D image with the estimated bounding box, (middle) object detection via the Gaussian distribution, (right) predicted segmentation mask
That is it for the theory; to be honest I did not fully digest it either, and this post is mostly about calling the API.
Let's look at the key API:
mp_objectron.Objectron(static_image_mode=False,
                       max_num_objects=5,
                       min_detection_confidence=0.5,
                       min_tracking_confidence=0.7,
                       model_name='Cup')
static_image_mode: whether we will feed images or video for 3D detection (True for images, False for video).
max_num_objects: the maximum number of recognizable objects we want to draw bounding boxes around.
min_detection_confidence: the threshold required to detect the given category; defaults to 0.5.
min_tracking_confidence: the threshold used to avoid false positives while tracking an object; defaults to 0.99.
model_name: which category the 3D object detection model should use; one of 'Cup', 'Shoe', 'Camera', or 'Chair'.
There are also three less commonly used parameters:
focal_length: by default, the camera focal length is defined in NDC space, i.e. (fx, fy); defaults to (1.0, 1.0). To specify the focal length in pixel space instead, i.e. (fx_pixel, fy_pixel), the user should provide image_size = (image_width, image_height) to enable the conversion inside the API.
principal_point: by default, the camera principal point is defined in NDC space, i.e. (px, py); defaults to (0.0, 0.0). To specify the principal point in pixel space, i.e. (px_pixel, py_pixel), the user should provide image_size = (image_width, image_height) to enable the conversion inside the API.
image_size: specify only when focal_length and principal_point are given in pixel space; the size of the input image, i.e. (image_width, image_height).
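MediaPipe performs the pixel-to-NDC conversion internally once image_size is given; the exact formula is not part of the public API, but assuming NDC coordinates span [-1, 1] across the image, a plausible sketch of the conversion is (helper name is hypothetical):

```python
def pixel_to_ndc_focal(fx_pixel, fy_pixel, image_width, image_height):
    """Hypothetical sketch of a pixel-space -> NDC-space focal-length
    conversion, assuming NDC covers [-1, 1] over the image extent.
    This is an illustration, not MediaPipe's actual internal code."""
    return (2.0 * fx_pixel / image_width, 2.0 * fy_pixel / image_height)

# A 500 px focal length on a 640x480 image
print(pixel_to_ndc_focal(500.0, 500.0, 640, 480))
```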
Test code:
import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils  # drawing helpers
mp_drawing_styles = mp.solutions.drawing_styles
mp_objectron = mp.solutions.objectron

# Model parameters
objectron = mp_objectron.Objectron(static_image_mode=False,
                                   max_num_objects=5,
                                   min_detection_confidence=0.5,
                                   min_tracking_confidence=0.7,
                                   model_name='Cup')

cap = cv2.VideoCapture(0)  # open the default camera
while True:
    ret, frame = cap.read()  # read one frame
    if not ret:
        break
    # Convert BGR (OpenCV) to RGB (MediaPipe)
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # The camera feed is mirrored, so flip it horizontally;
    # if yours is not mirrored, skip the flip
    frame = cv2.flip(frame, 1)
    # Run detection
    results = objectron.process(frame)
    frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
    if results.detected_objects:
        print(f'landmarks:{results.detected_objects}')
        for detected_object in results.detected_objects:
            # Visualize the projected box landmarks
            mp_drawing.draw_landmarks(
                frame, detected_object.landmarks_2d, mp_objectron.BOX_CONNECTIONS)
            mp_drawing.draw_axis(frame, detected_object.rotation, detected_object.translation)
    cv2.imshow('MediaPipe Objectron', frame)
    if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
Some log output:
landmark {
x: 0.245591819
y: 0.0559553206
z: -0.411984473
}
, rotation=array([[ 0.99227989, -0.00102335, 0.34644848],
[ 0.074286 , 0.99326283, -0.33394322],
[-0.09930787, 0.11587758, 0.87661618]]), translation=array([ 0.05228404, -0.13519081, -0.48538274]), scale=array([0.33719146, 0.41056389, 0.15138569]))]
2D object detection outputs:
- X coordinate of the bounding-box center
- Y coordinate of the bounding-box center
3D object detection outputs:
- X coordinate of the bounding-box center
- Y coordinate of the bounding-box center
- Z coordinate of the bounding-box center
- Roll: rotation about the X axis
- Pitch: rotation about the Y axis
- Yaw: rotation about the Z axis
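Note that MediaPipe returns the pose as a 3x3 rotation matrix (detected_object.rotation) plus a translation vector, not as Euler angles directly. Assuming the common ZYX (yaw-pitch-yaw applied as Rz·Ry·Rx) convention, with roll about X, pitch about Y, and yaw about Z, the matrix can be converted like this; the convention is an assumption, not something the API guarantees:

```python
import math

def rotation_to_euler(R):
    """Convert a 3x3 rotation matrix (e.g. detected_object.rotation) to
    (roll, pitch, yaw) in radians, assuming R = Rz(yaw) @ Ry(pitch) @ Rx(roll).
    R may be a nested list or a NumPy array; gimbal lock is not handled."""
    pitch = math.asin(max(-1.0, min(1.0, -R[2][0])))  # clamp for float safety
    roll = math.atan2(R[2][1], R[2][2])
    yaw = math.atan2(R[1][0], R[0][0])
    return roll, pitch, yaw

# Identity rotation -> all three angles are zero
print(rotation_to_euler([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]))
```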
Results
3D object detection
Create a new file 08_Objectron.py under the src/yahboom_esp32_mediapipe/yahboom_esp32_mediapipe/ directory, with the following code:
#!/usr/bin/env python3
# encoding: utf-8
import mediapipe as mp
import cv2 as cv
import time
import rclpy
from rclpy.node import Node
from cv_bridge import CvBridge
from sensor_msgs.msg import Image, CompressedImage
from rclpy.time import Time
import datetime

class Objectron:
    def __init__(self, staticMode=False, maxObjects=5, minDetectionCon=0.5, minTrackingCon=0.99):
        self.staticMode = staticMode
        self.maxObjects = maxObjects
        self.minDetectionCon = minDetectionCon
        self.minTrackingCon = minTrackingCon
        self.index = 3
        self.modelNames = ['Shoe', 'Chair', 'Cup', 'Camera']
        self.mpObjectron = mp.solutions.objectron
        self.mpDraw = mp.solutions.drawing_utils  # drawing helpers
        # Initialize the model
        self.mpobjectron = self.mpObjectron.Objectron(
            self.staticMode, self.maxObjects, self.minDetectionCon, self.minTrackingCon, self.modelNames[self.index])

    def findObjectron(self, frame):
        cv.putText(frame, self.modelNames[self.index], (int(frame.shape[1] / 2) - 30, 30),
                   cv.FONT_HERSHEY_SIMPLEX, 0.9, (0, 0, 255), 3)
        img_RGB = cv.cvtColor(frame, cv.COLOR_BGR2RGB)  # convert to RGB
        results = self.mpobjectron.process(img_RGB)  # run detection
        if results.detected_objects:
            for id, detection in enumerate(results.detected_objects):  # draw the results
                self.mpDraw.draw_landmarks(frame, detection.landmarks_2d, self.mpObjectron.BOX_CONNECTIONS)
                self.mpDraw.draw_axis(frame, detection.rotation, detection.translation)
        return frame

    # Switch the category the model recognizes
    def configUP(self):
        self.index += 1
        if self.index >= 4:
            self.index = 0
        self.mpobjectron = self.mpObjectron.Objectron(
            self.staticMode, self.maxObjects, self.minDetectionCon, self.minTrackingCon, self.modelNames[self.index])
        print(f'change model name:{self.modelNames[self.index]}')

class MY_Picture(Node):
    def __init__(self, name):
        super().__init__(name)
        self.bridge = CvBridge()
        self.sub_img = self.create_subscription(
            CompressedImage, '/espRos/esp32camera', self.handleTopic, 1)  # subscribe to the ESP32 camera stream
        self.objectron = Objectron()
        self.last_stamp = None
        self.new_seconds = 0
        self.fps_seconds = 1

    def handleTopic(self, msg):
        self.last_stamp = msg.header.stamp
        if self.last_stamp:
            total_secs = Time(nanoseconds=self.last_stamp.nanosec, seconds=self.last_stamp.sec).nanoseconds
            delta = datetime.timedelta(seconds=total_secs * 1e-9)
            seconds = delta.total_seconds() * 100
            if self.new_seconds != 0:
                self.fps_seconds = seconds - self.new_seconds
            self.new_seconds = seconds  # remember this timestamp
        start = time.time()
        frame = self.bridge.compressed_imgmsg_to_cv2(msg)
        frame = cv.resize(frame, (640, 480))
        action = cv.waitKey(1) & 0xFF
        if action == ord('f') or action == ord('F'):
            self.objectron.configUP()
        frame = self.objectron.findObjectron(frame)
        end = time.time()
        fps = 1 / ((end - start) + self.fps_seconds)
        text = "FPS : " + str(int(fps))
        cv.putText(frame, text, (20, 30), cv.FONT_HERSHEY_SIMPLEX, 0.9, (0, 0, 255), 1)
        cv.imshow('frame', frame)

def main():
    print("start it")
    rclpy.init()
    esp_img = MY_Picture("My_Picture")
    try:
        rclpy.spin(esp_img)
    except KeyboardInterrupt:
        pass
    finally:
        esp_img.destroy_node()
        rclpy.shutdown()

if __name__ == '__main__':
    main()
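The handleTopic callback estimates FPS by combining local processing time with the interval between consecutive message timestamps. A simplified, self-contained sketch of that idea (the function name is hypothetical, and unlike the node above it does not scale the timestamps by 100):

```python
def fps_from_stamps(prev_stamp_ns, curr_stamp_ns, processing_seconds):
    """Estimate FPS as 1 / (inter-frame interval + local processing time).

    prev_stamp_ns / curr_stamp_ns: header timestamps of two consecutive
    messages, in nanoseconds. A hypothetical simplification of the
    FPS logic in handleTopic above.
    """
    interval = (curr_stamp_ns - prev_stamp_ns) * 1e-9  # ns -> s
    return 1.0 / (interval + processing_seconds)

# Two frames 50 ms apart, with 10 ms of processing per frame
print(round(fps_from_stamps(0, 50_000_000, 0.010), 1))  # 16.7
```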
Build the workspace, then run:
bohu@bohu-TM1701:~/yahboomcar/yahboomcar_ws$ ros2 run yahboom_esp32_mediapipe Objectron
start it
Downloading model to /home/bohu/.local/lib/python3.10/site-packages/mediapipe/modules/objectron/object_detection_3d_camera.tflite
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1737619761.397657 95507 gl_context_egl.cc:85] Successfully initialized EGL. Major : 1 Minor: 5
I0000 00:00:1737619761.403901 95558 gl_context.cc:369] GL version: 3.2 (OpenGL ES 3.2 Mesa 23.2.1-1ubuntu3.1~22.04.3), renderer: Mesa Intel(R) UHD Graphics 620 (KBL GT2)
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
W0000 00:00:1737619761.554549 95546 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
W0000 00:00:1737619761.598330 95548 inference_feedback_manager.cc:114] Feedback manager requires a model with a single signature inference. Disabling support for feedback tensors.
Warning: Ignoring XDG_SESSION_TYPE=wayland on Gnome. Use QT_QPA_PLATFORM=wayland to run on Wayland anyway.
W0000 00:00:1737620056.092019 95547 landmark_projection_calculator.cc:186] Using NORM_RECT without IMAGE_DIMENSIONS is only supported for the square ROI. Provide IMAGE_DIMENSIONS or use PROJECTION_MATRIX.
The model is downloaded on the first run.
From: https://blog.csdn.net/bohu83/article/details/145319843