llama benchmarks

Date: 2023-12-24 16:00:12
Tags: shot, benchmarks, al, huggingface, llama, address, report, et

Introduction

Here we re-evaluate Llama 2 on its reported benchmarks to verify its performance.

Datasets

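In this post, we test on the datasets listed below.

[Two images listing the datasets]

The benchmark categories are quoted from the Llama 2 report, so "we" in those descriptions refers to the report's authors. Sources for each dataset are listed after the categories.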

  • Code. We report the average pass@1 scores of our models on HumanEval (Chen et al., 2021) and MBPP (Austin et al., 2021).
  • Commonsense Reasoning. We report the average of PIQA (Bisk et al., 2020), SIQA (Sap et al., 2019), HellaSwag (Zellers et al., 2019a), WinoGrande (Sakaguchi et al., 2021), ARC easy and challenge (Clark et al., 2018), OpenBookQA (Mihaylov et al., 2018), and CommonsenseQA (Talmor et al., 2018). We report 7-shot results for CommonSenseQA and 0-shot results for all other benchmarks.
  • World Knowledge. We evaluate the 5-shot performance on NaturalQuestions (Kwiatkowski et al., 2019) and TriviaQA (Joshi et al., 2017) and report the average.
  • Reading Comprehension. For reading comprehension, we report the 0-shot average on SQuAD (Rajpurkar et al., 2018), QuAC (Choi et al., 2018), and BoolQ (Clark et al., 2019).
  • MATH. We report the average of the GSM8K (8 shot) (Cobbe et al., 2021) and MATH (4 shot) (Hendrycks et al., 2021) benchmarks at top 1.
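For the Code category, pass@1 is the k = 1 case of the pass@k metric from Chen et al. (2021). The sketch below uses the unbiased estimator from that paper. This post does not state how many samples are drawn per problem, so the `n` and `c` values in the usage are illustrative only, and the function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass the unit tests
    k: number of attempts allowed
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every k-subset contains a passing sample.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples fail) = 1 - C(n - c, k) / C(n, k)
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Illustrative counts, not measured results.
    print(pass_at_k(n=10, c=3, k=1))
    print(mean_pass_at_k([(10, 3), (10, 5)], k=1))
```

With k = 1 the estimator reduces to c / n, the fraction of samples that pass.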

MMLU: address
TriviaQA: huggingface address1 huggingface address2
GSM8K: huggingface address
HumanEval: huggingface address
BIG-Bench Hard: huggingface address
HellaSwag: huggingface address
NQ (Natural Questions): github address huggingface address
MBPP: huggingface address
PIQA: huggingface address
SIQA: huggingface address
ARC: huggingface address
WinoGrande: huggingface address
OpenBookQA: huggingface address
CommonsenseQA: huggingface address
SQuAD: huggingface address SQuADv2
QuAC: huggingface address
BoolQ: huggingface address
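The shot settings quoted in the category descriptions can be kept in a small table and used to build k-shot prompts and category averages. This is a minimal sketch, not the evaluation harness used in the report. The prompt template, the unweighted averaging, the split of ARC into two entries, and all function names are assumptions. The Code category is left out because the text above gives no shot count for it.

```python
from statistics import mean

# Shot counts taken from the category descriptions above.
SHOTS = {
    "Commonsense Reasoning": {
        "PIQA": 0, "SIQA": 0, "HellaSwag": 0, "WinoGrande": 0,
        "ARC-Easy": 0, "ARC-Challenge": 0, "OpenBookQA": 0,
        "CommonsenseQA": 7,
    },
    "World Knowledge": {"NaturalQuestions": 5, "TriviaQA": 5},
    "Reading Comprehension": {"SQuAD": 0, "QuAC": 0, "BoolQ": 0},
    "MATH": {"GSM8K": 8, "MATH": 4},
}


def build_prompt(demos: list[tuple[str, str]], question: str, k: int) -> str:
    """Prepend the first k (question, answer) demonstrations to the query.

    The "Question:/Answer:" template is a placeholder assumption.
    """
    if k > len(demos):
        raise ValueError("not enough demonstrations for a k-shot prompt")
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in demos[:k]]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


def category_average(scores: dict[str, float], category: str) -> float:
    """Unweighted mean over the benchmarks of one category.

    Raises KeyError if a benchmark score is missing.
    """
    return mean(scores[name] for name in SHOTS[category])


if __name__ == "__main__":
    demos = [("1+1?", "2"), ("2+3?", "5")]  # toy demonstrations
    print(build_prompt(demos, "2+2?", k=1))
    # Placeholder scores, not measured results.
    print(category_average({"GSM8K": 10.0, "MATH": 4.0}, "MATH"))
```

Keeping the shot counts in one table makes it easier to check that each benchmark is run with the setting described above before the scores are averaged.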

From: https://www.cnblogs.com/ldzbky/p/17924469.html
