How to train YOLO 3D on the NYU dataset

Tags: python machine-learning computer-vision yolov5

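I have trained my YOLO 3D model on the KITTI dataset, and now I want to train it on the NYU dataset. What changes do I have to make to the NYU dataset so that I can train the YOLO 3D model on it?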

I would like to know which dataset format YOLO 3D accepts.

(Edit) The model I am using is YOLO 3D-lightning: https://github.com/ruhyadi/yolo3d-lightning

Basically, I run inference with the pretrained weights:

python inference.py \
  source_dir="./data/demo/videos/2011_09_26/image_02/data" \
  detector.model_path="./weights/detector_yolov5s.pt" \
  regressor_weights="./weights/mobilenetv3-best.pt"

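This model was trained on the KITTI dataset, but now I want to train it on the NYU dataset. That is why I want to know how I need to change the format of the NYU dataset so that I can use it to train YOLO 3D.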

For example, to convert KITTI to YOLO format, the following script is provided:

"""
Convert KITTI format to YOLO format.
"""

import os
import numpy as np
from glob import glob
from tqdm import tqdm
import argparse

from typing import Tuple


class KITTI2YOLO:
    def __init__(
        self,
        dataset_path: str = "../data/KITTI",
        classes: Tuple[str, ...] = ("car", "van", "truck", "pedestrian", "cyclist"),
        img_width: int = 1224,
        img_height: int = 370,
    ):

        self.dataset_path = dataset_path
        self.img_width = img_width
        self.img_height = img_height
        self.classes = classes
        self.ids = {self.classes[i]: i for i in range(len(self.classes))}

        # create new directory
        self.label_path = os.path.join(self.dataset_path, "labels")
        if not os.path.isdir(self.label_path):
            os.makedirs(self.label_path)
        else:
            print("[INFO] Directory already exists...")

    def convert(self):
        files = glob(os.path.join(self.dataset_path, "label_2", "*.txt"))
        for file in tqdm(files):
            with open(file, "r") as f:
                # basename() works with any path separator, unlike split("/")
                filename = os.path.join(self.label_path, os.path.basename(file))
                dump_txt = open(filename, "w")
                for line in f:
                    parse_line = self.parse_line(line)
                    if parse_line["name"].lower() not in self.classes:
                        continue

                    xmin, ymin, xmax, ymax = parse_line["bbox_camera"]
                    xcenter = ((xmax - xmin) / 2 + xmin) / self.img_width
                    if xcenter > 1.0:
                        xcenter = 1.0
                    ycenter = ((ymax - ymin) / 2 + ymin) / self.img_height
                    if ycenter > 1.0:
                        ycenter = 1.0
                    width = (xmax - xmin) / self.img_width
                    if width > 1.0:
                        width = 1.0
                    height = (ymax - ymin) / self.img_height
                    if height > 1.0:
                        height = 1.0

                    bbox_yolo = f"{self.ids[parse_line['name'].lower()]} {xcenter:.3f} {ycenter:.3f} {width:.3f} {height:.3f}"
                    dump_txt.write(bbox_yolo + "\n")

                dump_txt.close()

    def parse_line(self, line):
        parts = line.split(" ")
        output = {
            "name": parts[0].strip(),
            "xyz_camera": (float(parts[11]), float(parts[12]), float(parts[13])),
            "wlh": (float(parts[9]), float(parts[10]), float(parts[8])),
            "yaw_camera": float(parts[14]),
            "bbox_camera": (
                float(parts[4]),
                float(parts[5]),
                float(parts[6]),
                float(parts[7]),
            ),
            "truncation": float(parts[1]),
            "occlusion": float(parts[2]),
            "alpha": float(parts[3]),
        }

        # Add score if specified
        if len(parts) > 15:
            output["score"] = float(parts[15])
        else:
            output["score"] = np.nan

        return output


if __name__ == "__main__":

    # argparser
    parser = argparse.ArgumentParser(description="KITTI to YOLO Conversion")
    parser.add_argument("--dataset_path", type=str, default="../data/KITTI")
    parser.add_argument(
        "--classes",
        nargs="+",  # e.g. --classes car van truck; typing.Tuple is not a valid argparse type
        default=["car", "van", "truck", "pedestrian", "cyclist"],
    )
    parser.add_argument("--img_width", type=int, default=1224)
    parser.add_argument("--img_height", type=int, default=370)
    args = parser.parse_args()

    kitti2yolo = KITTI2YOLO(
        dataset_path=args.dataset_path,
        classes=args.classes,
        img_width=args.img_width,
        img_height=args.img_height,
    )
    kitti2yolo.convert()
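For reference, each line of a KITTI `label_2` file has 15 whitespace-separated fields (16 when a detection score is appended), and `parse_line` above reads them by position. The sketch below spells out that layout with named fields; the `KittiObject` and `parse_kitti_line` names are illustrative, and the field order simply follows the indices used in `parse_line`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class KittiObject:
    """One object from a KITTI label_2 line (field order as in parse_line)."""
    name: str
    truncation: float
    occlusion: int
    alpha: float                              # observation angle, radians
    bbox: Tuple[float, ...]                   # xmin, ymin, xmax, ymax in pixels
    hwl: Tuple[float, ...]                    # height, width, length in meters
    xyz: Tuple[float, ...]                    # 3D location in camera coordinates, meters
    rotation_y: float                         # yaw around the camera Y axis, radians
    score: Optional[float] = None             # only present in detection outputs


def parse_kitti_line(line: str) -> KittiObject:
    parts = line.split()  # split() without arguments also drops the trailing newline
    values = [float(p) for p in parts[1:]]
    return KittiObject(
        name=parts[0],
        truncation=values[0],
        occlusion=int(values[1]),
        alpha=values[2],
        bbox=tuple(values[3:7]),
        hwl=tuple(values[7:10]),
        xyz=tuple(values[10:13]),
        rotation_y=values[13],
        score=values[14] if len(values) > 14 else None,
    )


obj = parse_kitti_line(
    "Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59\n"
)
print(obj.name, obj.bbox, obj.hwl, obj.rotation_y)
```

Only `bbox` (and the class name) end up in the 2D YOLO labels written by the script; the remaining fields are what a 3D regressor would need.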

Similarly, I would like a script that converts NYU to the YOLO format.

OK, I understand that you want to convert the NYU dataset into the format required for YOLO 3D training, specifically with the yolo3d-lightning implementation.

Here is what you need to know about the NYU dataset and the YOLO format:

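NYU dataset format: the NYU Depth V2 dataset provides RGB images and depth maps, with depth stored per pixel as a distance in meters, along with camera calibration parameters. The annotations depend on which version you downloaded; the official labeled subset provides dense per-pixel class and instance masks rather than bounding boxes, so 2D boxes usually have to be derived from those masks.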
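If you start from the labeled subset, one way to get 2D boxes is to take the tight pixel extent of each object mask. This sketch assumes `labels` and `instances` are two H×W integer arrays you have already loaded, that class 0 means unlabeled, and that an object is identified by its (class, instance) pair because instance ids are numbered per class; `boxes_from_masks` is a hypothetical helper, not part of any NYU toolbox.

```python
import numpy as np


def boxes_from_masks(labels: np.ndarray, instances: np.ndarray, min_pixels: int = 1):
    """Return (class_id, xmin, ymin, xmax, ymax) for every (class, instance) pair.

    labels:    H x W array of class ids, 0 = unlabeled (assumption)
    instances: H x W array of instance ids, numbered per class (assumption)
    Coordinates are pixel indices; xmax/ymax are exclusive (last pixel + 1).
    """
    boxes = []
    # Pair class and instance ids so objects of different classes never merge.
    pairs = np.unique(np.stack([labels.ravel(), instances.ravel()], axis=1), axis=0)
    for class_id, inst_id in pairs:
        if class_id == 0:
            continue  # skip unlabeled pixels
        mask = (labels == class_id) & (instances == inst_id)
        if mask.sum() < min_pixels:
            continue  # drop tiny fragments
        ys, xs = np.nonzero(mask)
        boxes.append((int(class_id), int(xs.min()), int(ys.min()),
                      int(xs.max()) + 1, int(ys.max()) + 1))
    return boxes


# Tiny synthetic example: one object of class 5 and one of class 7.
labels = np.zeros((4, 6), dtype=np.int32)
instances = np.zeros((4, 6), dtype=np.int32)
labels[1:3, 1:3] = 5
instances[1:3, 1:3] = 1
labels[0:2, 4:6] = 7
instances[0:2, 4:6] = 1
print(boxes_from_masks(labels, instances))
```

Raising `min_pixels` is a simple way to ignore mask fragments that would otherwise become tiny, noisy boxes.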
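YOLO 3D format: this answer assumes the yolo3d-lightning model expects one text file per image, in a KITTI-like layout, where each line describes one object with the following fields:
- Class ID: numeric index of the class (e.g., 0 for car, 1 for pedestrian).
- Center x: normalized x coordinate of the box center (0 to 1).
- Center y: normalized y coordinate of the box center (0 to 1).
- Width: normalized box width (0 to 1).
- Height: normalized box height (0 to 1).
- Depth: distance to the object, in meters.
- Yaw: orientation of the object around the vertical axis, in radians.
Note that the KITTI-to-YOLO script in the question writes only the first five fields (the standard YOLO 2D label format); depth and yaw are additions, so check which fields the repository's training code actually reads.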
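A label line in the seven-field layout described above can be written and read back as follows. This is a sketch of this answer's proposed layout, not a format confirmed by yolo3d-lightning; the `Yolo3DLabel` name is illustrative.

```python
from dataclasses import dataclass


@dataclass
class Yolo3DLabel:
    class_id: int
    cx: float      # normalized box center x, 0..1
    cy: float      # normalized box center y, 0..1
    w: float       # normalized box width, 0..1
    h: float       # normalized box height, 0..1
    depth: float   # distance to the object, meters
    yaw: float     # heading around the vertical axis, radians

    def to_line(self) -> str:
        return (f"{self.class_id} {self.cx:.6f} {self.cy:.6f} {self.w:.6f} "
                f"{self.h:.6f} {self.depth:.6f} {self.yaw:.6f}")

    @classmethod
    def from_line(cls, line: str) -> "Yolo3DLabel":
        fields = line.split()
        if len(fields) != 7:
            raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
        return cls(int(fields[0]), *(float(v) for v in fields[1:]))


label = Yolo3DLabel(1, 0.5, 0.25, 0.1, 0.2, 2.345, 0.0)
line = label.to_line()
print(line)
```

Rejecting lines with the wrong field count makes it obvious when a file was written in the plain five-field YOLO format instead.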

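Steps to convert NYU to YOLO format:

Prepare the data:
- Create a folder named images to store the images and a folder named labels to store the corresponding label text files.
- Make sure you have the RGB images, depth maps, camera calibration information, and annotations.
Generate the label files: for each image, create a text file with the same base name (e.g., 00001.jpg corresponds to 00001.txt).
Parse the annotations:
- Read the bounding boxes, classes, and depth information from the NYU annotation files.
- Compute the bounding-box center, width, and height.
- Take the depth value at the bounding-box center from the depth map.
- Compute or obtain the yaw angle from whatever information your version of the NYU dataset provides (if needed).
Normalize the bounding-box coordinates:
- Divide the box center x and width by the image width, and the box center y and height by the image height.
Write the data to the label files:
- For each object in the image, write the class ID, the normalized box center x, center y, width and height, the depth, and the yaw angle (if available) to the corresponding text file.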
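The normalization step and the one-label-file-per-image rule can be sketched as two small helpers. This assumes boxes are given in pixel coordinates as (xmin, ymin, xmax, ymax) and that a labels directory sits next to images; both function names are hypothetical. Clipping is done on the box corners, a slightly stricter variant of the per-value clamping in the KITTI script in the question.

```python
from pathlib import Path
from typing import Tuple


def normalize_box(xmin: float, ymin: float, xmax: float, ymax: float,
                  img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Pixel (xmin, ymin, xmax, ymax) -> normalized (cx, cy, w, h), clipped to [0, 1]."""
    def clip(v: float) -> float:
        return min(max(v, 0.0), 1.0)

    # Clip the corners first so the resulting box always lies inside the image.
    x0, x1 = clip(xmin / img_w), clip(xmax / img_w)
    y0, y1 = clip(ymin / img_h), clip(ymax / img_h)
    return (x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0


def label_path_for(image_path: str, labels_dir: str = "labels") -> Path:
    """images/00001.jpg -> labels/00001.txt (same stem, .txt extension)."""
    p = Path(image_path)
    return p.parent.parent / labels_dir / p.with_suffix(".txt").name


print(normalize_box(160, 120, 480, 360, 640, 480))
print(label_path_for("nyu/images/00001.jpg"))
```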

Example Python conversion script:

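import os
import numpy as np

# Paths to the NYU dataset and the output directory (adjust to your setup)
nyu_dataset_path = '/path/to/nyu/dataset'
output_path = '/path/to/output'

# Create the output directory if it does not exist
os.makedirs(os.path.join(output_path, 'labels'), exist_ok=True)

# Image width and height (adjust to your data; NYU Depth V2 frames are 640x480)
image_width = 640
image_height = 480

# Mapping from class ID to class name
class_names = {
    0: 'person',
    1: 'chair',
    # ... add your other classes
}

# Loop over the images
for image_name in sorted(os.listdir(os.path.join(nyu_dataset_path, 'images'))):
    if not image_name.lower().endswith(('.jpg', '.png')):
        continue

    # Image ID = file name without its extension
    image_id = os.path.splitext(image_name)[0]

    # Load the depth map (assumed to be saved as an H x W .npy array in meters)
    depth_map = np.load(os.path.join(nyu_dataset_path, 'depth', f'{image_id}.npy'))

    # Load the annotations (adjust to your annotation format);
    # each line is assumed to be: class_id xmin ymin xmax ymax (in pixels)
    with open(os.path.join(nyu_dataset_path, 'labels', f'{image_id}.txt')) as f:
        labels = [line.split() for line in f if line.strip()]

    # Create the label file
    label_file_path = os.path.join(output_path, 'labels', f'{image_id}.txt')
    with open(label_file_path, 'w') as label_file:
        # Process each object
        for label in labels:
            class_id = int(label[0])
            xmin, ymin, xmax, ymax = map(float, label[1:5])

            # Compute the normalized box center, width and height
            bbox_center_x = ((xmax - xmin) / 2 + xmin) / image_width
            bbox_center_y = ((ymax - ymin) / 2 + ymin) / image_height
            bbox_width = (xmax - xmin) / image_width
            bbox_height = (ymax - ymin) / image_height

            # Depth at the box center; clamp the indices so a box touching the
            # right or bottom edge does not index past the end of the array
            row = min(int(bbox_center_y * image_height), depth_map.shape[0] - 1)
            col = min(int(bbox_center_x * image_width), depth_map.shape[1] - 1)
            depth = float(depth_map[row, col])

            # Yaw (heading): the labeled NYU Depth V2 release has no per-object yaw,
            # so 0.0 is a placeholder; replace it with your own estimate if you have one
            yaw = 0.0

            # Write the data to the label file
            label_file.write(f'{class_id} {bbox_center_x:.6f} {bbox_center_y:.6f} {bbox_width:.6f} {bbox_height:.6f} {depth:.6f} {yaw:.6f}\n')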
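Reading a single pixel at the box center is fragile: depth maps can contain holes (often stored as 0), and the center pixel may land on the background. A common alternative, swapped in here rather than taken from the answer, is the median of the valid depths in a small window around the center. `sample_depth` and its `half_window` parameter are hypothetical, and the sketch assumes missing depth is encoded as 0 or NaN.

```python
import numpy as np


def sample_depth(depth_map: np.ndarray, cx_px: float, cy_px: float,
                 half_window: int = 3) -> float:
    """Median of valid (finite, > 0) depths in a window around (cx_px, cy_px).

    Returns NaN if the window contains no valid depth.
    """
    h, w = depth_map.shape[:2]
    # Clamp the center to valid pixel indices (x == w would be out of range).
    x = min(max(int(cx_px), 0), w - 1)
    y = min(max(int(cy_px), 0), h - 1)
    window = depth_map[max(y - half_window, 0):y + half_window + 1,
                       max(x - half_window, 0):x + half_window + 1]
    valid = window[np.isfinite(window) & (window > 0)]
    return float(np.median(valid)) if valid.size else float("nan")


depth = np.full((480, 640), 2.0, dtype=np.float32)
depth[240, 320] = 0.0  # a hole exactly at the box center
print(sample_depth(depth, 320, 240))
print(sample_depth(depth, 640, 480))  # edge coordinates are clamped, not an IndexError
```

The center pixel coordinates are the normalized center multiplied by the image width and height, as in the script.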

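Adapting the script:
- Replace nyu_dataset_path and output_path with the actual paths to your NYU dataset and output directory.
- Adjust image_width and image_height to match your data.
- Change class_names to match the classes in your dataset.
- Adapt the annotation-loading part to the structure of your NYU annotation files.
- Add the heading-angle (yaw) computation if you need it.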
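For the class_names step: the labeled NYU subset uses its own class indices (hundreds of raw classes, with 0 meaning unlabeled), while YOLO expects contiguous ids starting at 0. A minimal sketch of remapping only the classes you want to train on; `build_class_map` and `remap` are hypothetical helpers, and the NYU ids in the example are made up, so look up the real ones in your copy of the dataset.

```python
from typing import Dict, List, Optional


def build_class_map(nyu_names: Dict[int, str], keep: List[str]) -> Dict[int, int]:
    """Map source class ids to contiguous YOLO ids 0..len(keep)-1.

    nyu_names: source id -> class name (from your copy of the dataset)
    keep:      class names to train on, in the order you want their YOLO ids
    """
    name_to_src = {name: src for src, name in nyu_names.items()}
    missing = [name for name in keep if name not in name_to_src]
    if missing:
        raise KeyError(f"classes not found in dataset: {missing}")
    return {name_to_src[name]: yolo_id for yolo_id, name in enumerate(keep)}


def remap(src_id: int, class_map: Dict[int, int]) -> Optional[int]:
    """YOLO id for a source id, or None if that class is not being trained."""
    return class_map.get(src_id)


# Hypothetical source ids, for illustration only.
nyu_names = {3: "chair", 11: "table", 21: "person", 40: "lamp"}
class_map = build_class_map(nyu_names, keep=["person", "chair"])
print(class_map)
print(remap(40, class_map))
```

Objects whose class remaps to None would simply be skipped when writing the label file.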

This script reads your NYU dataset and generates YOLO-format label files for training yolo3d-lightning.
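Before training, it can be worth sanity-checking the generated files. This sketch assumes the seven-field layout written by the script above and flags lines with the wrong field count, normalized box values outside [0, 1], or a non-positive or NaN depth; `check_label_file` is a hypothetical helper.

```python
import math
import tempfile
from pathlib import Path
from typing import List


def check_label_file(path: Path, num_fields: int = 7) -> List[str]:
    """Return human-readable problems found in one label file (empty list = OK)."""
    problems = []
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        where = f"{path.name}:{lineno}"
        if len(fields) != num_fields:
            problems.append(f"{where}: expected {num_fields} fields, got {len(fields)}")
            continue
        try:
            values = [float(v) for v in fields[1:]]
        except ValueError:
            problems.append(f"{where}: non-numeric field")
            continue
        # values[0:4] are cx, cy, w, h; NaN also fails this range check
        if any(not 0.0 <= v <= 1.0 for v in values[:4]):
            problems.append(f"{where}: box values outside [0, 1]")
        # values[4] is the depth column when the layout has more than 5 fields
        if num_fields > 5 and not (math.isfinite(values[4]) and values[4] > 0):
            problems.append(f"{where}: invalid depth {fields[5]}")
    return problems


# Example on a throwaway file: the second line has two problems.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "00001.txt"
    sample.write_text("0 0.5 0.5 0.2 0.2 2.1 0.0\n1 1.2 0.5 0.2 0.2 nan 0.0\n")
    problems = check_label_file(sample)
print("\n".join(problems))
```

Passing `num_fields=5` checks plain YOLO 2D label files the same way, without the depth test.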

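Keep in mind: this script is only a basic skeleton; you will probably need to adapt it to the specific format of your NYU dataset and to the requirements of the yolo3d-lightning model.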

From: 78779752
