
Model card for convnext_xxlarge.clip_laion2b_soup_ft_in12k

A ConvNeXt image classification model. CLIP image tower weights pretrained in OpenCLIP on LAION and fine-tuned on ImageNet-12k by Ross Wightman.

Please see the related OpenCLIP model cards for more details on the pretraining.

Model Details

Model Usage

Image Classification

from urllib.request import urlopen
from PIL import Image
import timm
import torch

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('convnext_xxlarge.clip_laion2b_soup_ft_in12k', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
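The softmax/topk step above can be sketched in isolation with dummy logits standing in for the model output (so no weights need to be downloaded); the shapes match what the real model returns for a single ImageNet-1k-style image:

```python
import torch

# stand-in for the model output: a batch of 1 with 1000 class logits
logits = torch.randn(1, 1000)

# softmax turns logits into probabilities; * 100 expresses them as percentages
probabilities = logits.softmax(dim=1) * 100

# topk returns the k largest values and their class indices, both (batch, k)
top5_probabilities, top5_class_indices = torch.topk(probabilities, k=5)

print(top5_probabilities.shape)   # torch.Size([1, 5])
print(top5_class_indices.shape)   # torch.Size([1, 5])
```

Because softmax normalizes each row to sum to 1, the percentage values across all 1000 classes sum to 100, and the top-5 entries are sorted in descending order.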

Feature Map Extraction

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'convnext_xxlarge.clip_laion2b_soup_ft_in12k',
    pretrained=True,
    features_only=True,
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

for o in output:
    # print shape of each feature map in output
    # e.g.:
    #  torch.Size([1, 384, 64, 64])
    #  torch.Size([1, 768, 32, 32])
    #  torch.Size([1, 1536, 16, 16])
    #  torch.Size([1, 3072, 8, 8])

    print(o.shape)
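The four spatial sizes follow directly from ConvNeXt's stage strides of 4, 8, 16, and 32 relative to the input. A quick sanity check, assuming the 256x256 input resolution that the example shapes above imply:

```python
# ConvNeXt stages downsample the input by 4, 8, 16, 32; the channel widths
# below are the convnext_xxlarge stage widths from the shapes printed above
input_size = 256
reductions = [4, 8, 16, 32]
channels = [384, 768, 1536, 3072]

expected = [(c, input_size // r, input_size // r) for c, r in zip(channels, reductions)]
for shape in expected:
    print(shape)
# (384, 64, 64)
# (768, 32, 32)
# (1536, 16, 16)
# (3072, 8, 8)
```

The same arithmetic predicts the feature-map sizes at other input resolutions, e.g. a 384x384 input yields 96, 48, 24, and 12 pixel grids.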

Image Embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'convnext_xxlarge.clip_laion2b_soup_ft_in12k',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 3072, 8, 8) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
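One common use of these pooled embeddings is image-to-image similarity. A minimal sketch, with random tensors standing in for two embeddings that would in practice come from `model(img)` with `num_classes=0`:

```python
import torch
import torch.nn.functional as F

# stand-in embeddings: convnext_xxlarge produces (batch, 3072) pooled features
emb_a = torch.randn(1, 3072)
emb_b = torch.randn(1, 3072)

# cosine similarity along the feature dimension, one score per batch element
similarity = F.cosine_similarity(emb_a, emb_b, dim=1)
print(similarity.shape)  # torch.Size([1])

# cosine similarity is bounded in [-1, 1]; higher means more similar
```

For retrieval over many images it is usually cheaper to L2-normalize the embeddings once and compare them with a single matrix multiply.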

Model Comparison

Explore the dataset and runtime metrics of this model in timm model results.

All timing numbers are from eager-mode PyTorch 1.13 on an RTX 3090 w/ AMP.

| model | top1 | top5 | img_size | param_count | gmacs | macts | samples_per_sec | batch_size |
|---|---|---|---|---|---|---|---|---|
| convnextv2_huge.fcmae_ft_in22k_in1k_512 | 88.848 | 98.742 | 512 | 660.29 | 600.81 | 413.07 | 28.58 | 48 |
| convnextv2_huge.fcmae_ft_in22k_in1k_384 | 88.668 | 98.738 | 384 | 660.29 | 337.96 | 232.35 | 50.56 | 64 |
| convnext_xxlarge.clip_laion2b_soup_ft_in1k | 88.612 | 98.704 | 256 | 846.47 | 198.09 | 124.45 | 122.45 | 256 |
| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384 | 88.312 | 98.578 | 384 | 200.13 | 101.11 | 126.74 | 196.84 | 256 |
| convnextv2_large.fcmae_ft_in22k_in1k_384 | 88.196 | 98.532 | 384 | 197.96 | 101.1 | 126.74 | 128.94 | 128 |
| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320 | 87.968 | 98.47 | 320 | 200.13 | 70.21 | 88.02 | 283.42 | 256 |
| convnext_xlarge.fb_in22k_ft_in1k_384 | 87.75 | 98.556 | 384 | 350.2 | 179.2 | 168.99 | 124.85 | 192 |
| convnextv2_base.fcmae_ft_in22k_in1k_384 | 87.646 | 98.422 | 384 | 88.72 | 45.21 | 84.49 | 209.51 | 256 |
| convnext_large.fb_in22k_ft_in1k_384 | 87.476 | 98.382 | 384 | 197.77 | 101.1 | 126.74 | 194.66 | 256 |
| convnext_large_mlp.clip_laion2b_augreg_ft_in1k | 87.344 | 98.218 | 256 | 200.13 | 44.94 | 56.33 | 438.08 | 256 |
| convnextv2_large.fcmae_ft_in22k_in1k | 87.26 | 98.248 | 224 | 197.96 | 34.4 | 43.13 | 376.84 | 256 |
| convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384 | 87.138 | 98.212 | 384 | 88.59 | 45.21 | 84.49 | 365.47 | 256 |
| convnext_xlarge.fb_in22k_ft_in1k | 87.002 | 98.208 | 224 | 350.2 | 60.98 | 57.5 | 368.01 | 256 |
| convnext_base.fb_in22k_ft_in1k_384 | 86.796 | 98.264 | 384 | 88.59 | 45.21 | 84.49 | 366.54 | 256 |
| convnextv2_base.fcmae_ft_in22k_in1k | 86.74 | 98.022 | 224 | 88.72 | 15.38 | 28.75 | 624.23 | 256 |
| convnext_large.fb_in22k_ft_in1k | 86.636 | 98.028 | 224 | 197.77 | 34.4 | 43.13 | 581.43 | 256 |
| convnext_base.clip_laiona_augreg_ft_in1k_384 | 86.504 | 97.97 | 384 | 88.59 | 45.21 | 84.49 | 368.14 | 256 |
| convnext_base.clip_laion2b_augreg_ft_in12k_in1k | 86.344 | 97.97 | 256 | 88.59 | 20.09 | 37.55 | 816.14 | 256 |
| convnextv2_huge.fcmae_ft_in1k | 86.256 | 97.75 | 224 | 660.29 | 115.0 | 79.07 | 154.72 | 256 |
| convnext_small.in12k_ft_in1k_384 | 86.182 | 97.92 | 384 | 50.22 | 25.58 | 63.37 | 516.19 | 256 |
| convnext_base.clip_laion2b_augreg_ft_in1k | 86.154 | 97.68 | 256 | 88.59 | 20.09 | 37.55 | 819.86 | 256 |
| convnext_base.fb_in22k_ft_in1k | 85.822 | 97.866 | 224 | 88.59 | 15.38 | 28.75 | 1037.66 | 256 |
| convnext_small.fb_in22k_ft_in1k_384 | 85.778 | 97.886 | 384 | 50.22 | 25.58 | 63.37 | 518.95 | 256 |
| convnextv2_large.fcmae_ft_in1k | 85.742 | 97.584 | 224 | 197.96 | 34.4 | 43.13 | 375.23 | 256 |
| convnext_small.in12k_ft_in1k | 85.174 | 97.506 | 224 | 50.22 | 8.71 | 21.56 | 1474.31 | 256 |
| convnext_tiny.in12k_ft_in1k_384 | 85.118 | 97.608 | 384 | 28.59 | 13.14 | 39.48 | 856.76 | 256 |
| convnextv2_tiny.fcmae_ft_in22k_in1k_384 | 85.112 | 97.63 | 384 | 28.64 | 13.14 | 39.48 | 491.32 | 256 |
| convnextv2_base.fcmae_ft_in1k | 84.874 | 97.09 | 224 | 88.72 | 15.38 | 28.75 | 625.33 | 256 |
| convnext_small.fb_in22k_ft_in1k | 84.562 | 97.394 | 224 | 50.22 | 8.71 | 21.56 | 1478.29 | 256 |
| convnext_large.fb_in1k | 84.282 | 96.892 | 224 | 197.77 | 34.4 | 43.13 | 584.28 | 256 |
| convnext_tiny.in12k_ft_in1k | 84.186 | 97.124 | 224 | 28.59 | 4.47 | 13.44 | 2433.7 | 256 |
| convnext_tiny.fb_in22k_ft_in1k_384 | 84.084 | 97.14 | 384 | 28.59 | 13.14 | 39.48 | 862.95 | 256 |
| convnextv2_tiny.fcmae_ft_in22k_in1k | 83.894 | 96.964 | 224 | 28.64 | 4.47 | 13.44 | 1452.72 | 256 |
| convnext_base.fb_in1k | 83.82 | 96.746 | 224 | 88.59 | 15.38 | 28.75 | 1054.0 | 256 |
| convnextv2_nano.fcmae_ft_in22k_in1k_384 | 83.37 | 96.742 | 384 | 15.62 | 7.22 | 24.61 | 801.72 | 256 |
| convnext_small.fb_in1k | 83.142 | 96.434 | 224 | 50.22 | 8.71 | 21.56 | 1464.0 | 256 |
| convnextv2_tiny.fcmae_ft_in1k | 82.92 | 96.284 | 224 | 28.64 | 4.47 | 13.44 | 1425.62 | 256 |
| convnext_tiny.fb_in22k_ft_in1k | 82.898 | 96.616 | 224 | 28.59 | 4.47 | 13.44 | 2480.88 | 256 |
| convnext_nano.in12k_ft_in1k | 82.282 | 96.344 | 224 | 15.59 | 2.46 | 8.37 | 3926.52 | 256 |
| convnext_tiny_hnf.a2h_in1k | 82.216 | 95.852 | 224 | 28.59 | 4.47 | 13.44 | 2529.75 | 256 |
| convnext_tiny.fb_in1k | 82.066 | 95.854 | 224 | 28.59 | 4.47 | 13.44 | 2346.26 | 256 |
| convnextv2_nano.fcmae_ft_in22k_in1k | 82.03 | 96.166 | 224 | 15.62 | 2.46 | 8.37 | 2300.18 | 256 |
| convnextv2_nano.fcmae_ft_in1k | 81.83 | 95.738 | 224 | 15.62 | 2.46 | 8.37 | 2321.48 | 256 |
| convnext_nano_ols.d1h_in1k | 80.866 | 95.246 | 224 | 15.65 | 2.65 | 9.38 | 3523.85 | 256 |
| convnext_nano.d1h_in1k | 80.768 | 95.334 | 224 | 15.59 | 2.46 | 8.37 | 3915.58 | 256 |
| convnextv2_pico.fcmae_ft_in1k | 80.304 | 95.072 | 224 | 9.07 | 1.37 | 6.1 | 3274.57 | 256 |
| convnext_pico.d1_in1k | 79.526 | 94.558 | 224 | 9.05 | 1.37 | 6.1 | 5686.88 | 256 |
| convnext_pico_ols.d1_in1k | 79.522 | 94.692 | 224 | 9.06 | 1.43 | 6.5 | 5422.46 | 256 |
| convnextv2_femto.fcmae_ft_in1k | 78.488 | 93.98 | 224 | 5.23 | 0.79 | 4.57 | 4264.2 | 256 |
| convnext_femto_ols.d1_in1k | 77.86 | 93.83 | 224 | 5.23 | 0.82 | 4.87 | 6910.6 | 256 |
| convnext_femto.d1_in1k | 77.454 | 93.68 | 224 | 5.22 | 0.79 | 4.57 | 7189.92 | 256 |
| convnextv2_atto.fcmae_ft_in1k | 76.664 | 93.044 | 224 | 3.71 | 0.55 | 3.81 | 4728.91 | 256 |
| convnext_atto_ols.a2_in1k | 75.88 | 92.846 | 224 | 3.7 | 0.58 | 4.11 | 7963.16 | 256 |
| convnext_atto.d2_in1k | 75.664 | 92.9 | 224 | 3.7 | 0.55 | 3.81 | 8439.22 | 256 |

Citation

@software{ilharco_gabriel_2021_5143773,
  author       = {Ilharco, Gabriel and
                  Wortsman, Mitchell and
                  Wightman, Ross and
                  Gordon, Cade and
                  Carlini, Nicholas and
                  Taori, Rohan and
                  Dave, Achal and
                  Shankar, Vaishaal and
                  Namkoong, Hongseok and
                  Miller, John and
                  Hajishirzi, Hannaneh and
                  Farhadi, Ali and
                  Schmidt, Ludwig},
  title        = {OpenCLIP},
  month        = jul,
  year         = 2021,
  note         = {If you use this software, please cite it as below.},
  publisher    = {Zenodo},
  version      = {0.1},
  doi          = {10.5281/zenodo.5143773},
  url          = {https://doi.org/10.5281/zenodo.5143773}
}

@inproceedings{schuhmann2022laionb,
  title={{LAION}-5B: An open large-scale dataset for training next generation image-text models},
  author={Christoph Schuhmann and
          Romain Beaumont and
          Richard Vencu and
          Cade W Gordon and
          Ross Wightman and
          Mehdi Cherti and
          Theo Coombes and
          Aarush Katta and
          Clayton Mullis and
          Mitchell Wortsman and
          Patrick Schramowski and
          Srivatsa R Kundurthy and
          Katherine Crowson and
          Ludwig Schmidt and
          Robert Kaczmarczyk and
          Jenia Jitsev},
  booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2022},
  url={https://openreview.net/forum?id=M3Y74vmsMcY}
}


@inproceedings{Radford2021LearningTV,
  title={Learning Transferable Visual Models From Natural Language Supervision},
  author={Alec Radford and Jong Wook Kim and Chris Hallacy and A. Ramesh and Gabriel Goh and Sandhini Agarwal and Girish Sastry and Amanda Askell and Pamela Mishkin and Jack Clark and Gretchen Krueger and Ilya Sutskever},
  booktitle={ICML}, 
  year={2021}
}

@article{liu2022convnet,
  author  = {Zhuang Liu and Hanzi Mao and Chao-Yuan Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie},
  title   = {A ConvNet for the 2020s},
  journal = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year    = {2022},
}

From: https://blog.csdn.net/sinat_37574187/article/details/142135489

    FaultTolerance-Raft容错模式我们已经学习了以下几种容错模式(fault-tolerancepattern):计算冗余:MapReduce,但是所有计算由单点Master进行调度。数据冗余:GFS,也是依赖单点Master来对多个副本进行选主。服务冗余:VMware-FT依赖单个TestAndSet操作可以看出他们都依赖单......