Paper: <SPATIAL-MAMBA: EFFECTIVE VISUAL STATE SPACE MODELS VIA STRUCTURE-AWARE STATE FUSION>
The environment is the Mamba2 image from AutoDL's algorithm community, CodeWithGPU: https://www.codewithgpu.com/i/state-spaces/mamba/mamba-ssm_causal-convld
Setting up Spatial-Mamba
conda activate mamba2
git clone https://github.com/EdwardChasel/Spatial-Mamba.git
cd Spatial-Mamba
pip install --upgrade pip
pip install -r requirements.txt
cd kernels/selective_scan && pip install .
cd ../dwconv2d && python3 setup.py install --user
pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0
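The OpenMMLab packages above are pinned to exact versions, so it can be worth confirming what actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name check_pins is made up for illustration, and the pins are simply copied from the pip commands above.

```python
from importlib import metadata

# Versions copied from the pip install commands above.
PINNED = {
    "mmengine": "0.10.1",
    "mmcv": "2.1.0",
    "mmdet": "3.3.0",
    "mmsegmentation": "1.2.2",
    "mmpretrain": "1.2.0",
}


def check_pins(pins=PINNED, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    problems = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            problems[name] = (expected, installed)
    return problems


if __name__ == "__main__":
    for name, (expected, installed) in check_pins().items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every pinned package is present at the expected version.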
Installing the COCO dataset
cd
cd autodl-pub/COCO2017
unzip train2017.zip -d /root/autodl-tmp/coco/
unzip val2017.zip -d /root/autodl-tmp/coco/
unzip test2017.zip -d /root/autodl-tmp/coco/
unzip annotations_trainval2017.zip -d /root/autodl-tmp/coco/
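After the four unzip commands, a quick check that each archive produced a non-empty folder can save a failed run later. This sketch assumes the archives on AutoDL are the official COCO 2017 files, which extract to train2017/, val2017/, test2017/ and annotations/; the function name is illustrative only.

```python
from pathlib import Path

# Folder names assumed to be produced by the four COCO 2017 archives above.
EXPECTED = ("train2017", "val2017", "test2017", "annotations")


def missing_coco_dirs(root):
    """Return the expected sub-folders that are absent or empty under root."""
    root = Path(root)
    missing = []
    for name in EXPECTED:
        folder = root / name
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_coco_dirs("/root/autodl-tmp/coco"))
```

An empty list means all four folders exist and contain something.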
Installing the ImageNet-1K dataset
cd
python autodl-pub/ImageNet/extract_imagenet.py
Note: the ImageNet validation (val) set is a flat collection of images, whereas the training (train) set is organized into one folder per class.
imagenet/
├── train/
│   ├── n01440764/ (example synset ID)
│   │   ├── image1.JPEG
│   │   ├── image2.JPEG
│   │   └── ...
│   ├── n01443537/ (another synset ID)
│   │   └── ...
│   └── ...
└── val/
    ├── image1.JPEG
    ├── image2.JPEG
    ├── image3.JPEG
    ├── image4.JPEG
    └── ...
The val set usually has to be split into per-class folders as well; here we use the widely used valprep.sh script.
cd
cd autodl-tmp/imagenet/val
wget https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh
bash valprep.sh
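For readers who prefer to see what this step does, here is a rough Python sketch of the same reorganization. It assumes the script consists of lines of the form "mv IMAGE FOLDER/" (plus mkdir lines, which are ignored); that format is an assumption about valprep.sh, and both function names are invented for illustration.

```python
import re
import shutil
from pathlib import Path

# Assumed line format: "mv ILSVRC2012_val_XXXXXXXX.JPEG nXXXXXXXX/"
MV_LINE = re.compile(r"^mv\s+(\S+)\s+(\S+?)/?\s*$")


def parse_moves(script_text):
    """Extract {image_name: class_folder} from 'mv IMAGE FOLDER/' lines."""
    moves = {}
    for line in script_text.splitlines():
        match = MV_LINE.match(line.strip())
        if match:
            moves[match.group(1)] = match.group(2)
    return moves


def sort_val_images(val_dir, moves):
    """Move each listed image into its class folder; return how many moved."""
    val_dir = Path(val_dir)
    moved = 0
    for image_name, folder in moves.items():
        src = val_dir / image_name
        if src.is_file():  # skip images that are missing or already moved
            (val_dir / folder).mkdir(exist_ok=True)
            shutil.move(str(src), str(val_dir / folder / image_name))
            moved += 1
    return moved
```

Because images that are no longer in place are skipped, running it a second time moves nothing.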
The final layout should look like this:
imagenet/
├── train/
│   ├── n01440764/ (example synset ID)
│   │   ├── image1.JPEG
│   │   ├── image2.JPEG
│   │   └── ...
│   ├── n01443537/ (another synset ID)
│   │   └── ...
│   └── ...
└── val/
    ├── n01440764/ (example synset ID)
    │   ├── image1.JPEG
    │   └── ...
    └── ...
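The layout above can be checked before training starts. The sketch below counts the class folders under train/ and val/ and reports any files still lying loose in val/; for ImageNet-1K both counts should be 1000. The function names are illustrative, not part of the Spatial-Mamba repository.

```python
from pathlib import Path


def class_folders(split_dir):
    """Return the sorted names of the sub-folders of split_dir."""
    return sorted(p.name for p in Path(split_dir).iterdir() if p.is_dir())


def check_layout(root):
    """Compare the class folders of train/ and val/ under root."""
    root = Path(root)
    train = class_folders(root / "train")
    val = class_folders(root / "val")
    loose = [p.name for p in (root / "val").iterdir() if p.is_file()]
    return {
        "train_classes": len(train),
        "val_classes": len(val),
        "same_classes": train == val,
        "loose_val_files": len(loose),  # unsorted images or stray scripts
    }
```

If valprep.sh was downloaded into val/ itself, it will be counted as one loose file; any leftover .JPEG files there would suggest that the split did not complete.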
Reproduction
Training on a single GPU
cd
cd Spatial-Mamba/classification
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg /root/Spatial-Mamba/classification/configs/spatialmamba/spatialmamba_tiny.yaml --batch-size 128 --data-path /root/autodl-tmp/imagenet --output /root/Spatial-Mamba/Out/
--nproc_per_node=1 means one GPU; change the value to use a different number of GPUs.
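Since the launch command is long and only the GPU count changes between runs, it can help to assemble it programmatically. This is a small sketch that rebuilds the same argument list as above; build_launch_cmd is a hypothetical helper, and every flag in it is taken from the command in this post rather than from any additional documentation.

```python
import shlex


def build_launch_cmd(nproc, cfg, data_path, output, batch_size=128,
                     pretrained=None, port=29501):
    """Assemble the single-node launch command used above as a list."""
    cmd = [
        "python", "-m", "torch.distributed.launch",
        "--nnodes=1", "--node_rank=0",
        f"--nproc_per_node={nproc}",  # one worker process per GPU
        "--master_addr=127.0.0.1", f"--master_port={port}",
        "main.py",
        "--cfg", cfg,
        "--batch-size", str(batch_size),
        "--data-path", data_path,
        "--output", output,
    ]
    if pretrained is not None:
        cmd += ["--pretrained", pretrained]
    return cmd


if __name__ == "__main__":
    cmd = build_launch_cmd(
        2,
        "/root/Spatial-Mamba/classification/configs/spatialmamba/spatialmamba_tiny.yaml",
        "/root/autodl-tmp/imagenet",
        "/root/Spatial-Mamba/Out/",
    )
    print(shlex.join(cmd))  # paste the printed line into the shell
```

The printed command is meant to be run from Spatial-Mamba/classification, as in the steps above. The number of visible GPUs can be read with torch.cuda.device_count() if the value should not be hard-coded.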
Evaluating the pretrained model on the test set:
cd
cd Spatial-Mamba/ckpt
wget "https://drive.google.com/file/d/19kXoqGSTuKKs4AHbdUSrdKZTwTWenLIW/view?usp=drive_link"
cd
sed -i 's/selective_scan_cuda_oflex_rh/selective_scan_cuda/g' /root/Spatial-Mamba/classification/models/utils.py
cd
cd Spatial-Mamba/classification
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg /root/Spatial-Mamba/classification/configs/spatialmamba/spatialmamba_tiny.yaml --batch-size 128 --data-path /root/autodl-tmp/imagenet --output /root/Spatial-Mamba/Out/ --pretrained /root/Spatial-Mamba/ckpt/spatialmamba_tiny_224_1k.pth
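The wget step above points at a Google Drive share link, which normally serves a web page rather than the file itself. A minimal sketch for pulling the file ID out of such a link and for spotting an accidentally saved HTML page follows; the "uc?id=" URL form is a commonly used pattern rather than a guaranteed interface, and all three helper names are made up here.

```python
import re

FILE_ID = re.compile(r"/file/d/([A-Za-z0-9_-]+)")


def drive_file_id(share_url):
    """Return the file ID embedded in a '/file/d/<ID>/view' share link."""
    match = FILE_ID.search(share_url)
    if match is None:
        raise ValueError(f"no file ID found in {share_url!r}")
    return match.group(1)


def direct_url(share_url):
    """Build a 'uc?id=' style URL, the form often passed to download tools."""
    return f"https://drive.google.com/uc?id={drive_file_id(share_url)}"


def looks_like_html(path, probe=512):
    """Heuristic: True if the file starts like an HTML page, not a checkpoint."""
    with open(path, "rb") as handle:
        head = handle.read(probe).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

Third-party tools such as gdown are often used with this kind of URL, though whether a download succeeds depends on the file's sharing settings. Running looks_like_html on whatever ended up in Spatial-Mamba/ckpt is a cheap way to tell a real .pth file from a saved web page.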
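If wget cannot fetch the checkpoint (the link is a Google Drive share page, so wget may save an HTML page instead of the .pth file), download it locally and place it in Spatial-Mamba/ckpt. The sed command above patches /root/Spatial-Mamba/classification/models/utils.py, replacing the no-longer-valid name selective_scan_cuda_oflex_rh with selective_scan_cuda: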
sed -i 's/selective_scan_cuda_oflex_rh/selective_scan_cuda/g' /root/Spatial-Mamba/classification/models/utils.py
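Before or after applying this rename, it may be useful to see which of the two compiled extensions Python can actually find in the environment. The sketch below assumes both are top-level module names, as the sed pattern suggests; available_modules is an illustrative helper.

```python
import importlib.util

# The two names involved in the sed substitution above.
CANDIDATES = ("selective_scan_cuda_oflex_rh", "selective_scan_cuda")


def available_modules(names=CANDIDATES):
    """Return the subset of names that Python can locate as modules."""
    return [
        name for name in names
        if importlib.util.find_spec(name) is not None
    ]


if __name__ == "__main__":
    print(available_modules())
```

Finding a module only shows that it is installed. The rename helps only if the replacement also provides the functions that utils.py calls, which this check does not establish.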
Error:
  File "/root/miniconda3/envs/mamba2/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/mamba2/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/miniconda3/envs/mamba2/lib/python3.10/site-packages/torch/distributed/launch.py", line 208, in <module>
    main()
  File "/root/miniconda3/envs/mamba2/lib/python3.10/site-packages/typing_extensions.py", line 2853, in wrapper
    return arg(*args, **kwargs)
  File "/root/miniconda3/envs/mamba2/lib/python3.10/site-packages/torch/distributed/launch.py", line 204, in main
    launch(args)
  File "/root/miniconda3/envs/mamba2/lib/python3.10/site-packages/torch/distributed/launch.py", line 189, in launch
    run(args)
  File "/root/miniconda3/envs/mamba2/lib/python3.10/site-packages/torch/distributed/run.py", line 910, in run
    elastic_launch(
  File "/root/miniconda3/envs/mamba2/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/envs/mamba2/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
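A ChildFailedError like the one above comes from the launcher and only says that a worker process exited with an error; the worker's own traceback, which names the real cause, is printed earlier in the output. Here is a rough sketch for pulling those earlier tracebacks out of a saved log, assuming the run's output was captured to a file (for example by appending 2>&1 | tee train.log to the command, where train.log is a hypothetical name).

```python
def worker_tracebacks(log_text):
    """Return traceback blocks from log_text, skipping the launcher's own."""
    blocks, current = [], None
    for line in log_text.splitlines():
        if line.startswith("Traceback (most recent call last):"):
            current = [line]  # start collecting a new block
        elif current is not None:
            current.append(line)
            # The first non-indented line is the exception line; it ends the block.
            if line and not line.startswith(" "):
                blocks.append("\n".join(current))
                current = None
    # Drop blocks whose final line is the launcher's ChildFailedError.
    return [b for b in blocks if "ChildFailedError" not in b.splitlines()[-1]]


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for block in worker_tracebacks(handle.read()):
            print(block, end="\n\n")
```

With several GPUs the output of different workers can interleave, so the extracted blocks may need a manual look; on a single GPU the first block is usually the one to read.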
From: https://blog.csdn.net/qq_62111160/article/details/145005751