Instruction Cache Coherency

Date: 2024-03-27 15:22:20
Tags: instruction, cache, L2, L1i, coherency, data

N2 also gets optional hardware instruction cache coherency. ARM recommends enabling it on systems with many cores, because broadcasting software-issued instruction cache invalidates to every core does not scale. To implement instruction cache coherency, ARM makes the L2 cache inclusive of L1i contents. Then, I assume the L2 becomes exclusive of L1 data cache contents, so that a data write can never hit in L1d while the same address is also held in L1i. Finally, a read-for-ownership evicts the line from the L2 caches of all other cores, which automatically invalidates it in their L1i caches and prevents them from holding stale data.
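The mechanism described above can be sketched as a toy model. This is only an illustration of the idea, not ARM's implementation: the `Core` class, `read_for_ownership` function, and the line addresses are hypothetical, caches are modeled as plain sets with no capacity limit, and the writer's own caches are not modeled.

```python
class Core:
    """Toy core: an L1i and an L2, each modeled as a set of line addresses."""

    def __init__(self, name):
        self.name = name
        self.l1i = set()
        self.l2 = set()

    def fetch_code(self, line):
        # Inclusive policy: any line filled into L1i is also kept in L2.
        self.l2.add(line)
        self.l1i.add(line)

    def evict_from_l2(self, line):
        # Dropping a line from L2 back-invalidates L1i to preserve inclusion.
        self.l2.discard(line)
        self.l1i.discard(line)


def read_for_ownership(writer, cores, line):
    """The writer takes ownership; every other core loses the line from L2, and so from L1i."""
    for core in cores:
        if core is not writer:
            core.evict_from_l2(line)


cores = [Core("c0"), Core("c1"), Core("c2")]
cores[1].fetch_code(0x1000)
cores[2].fetch_code(0x1000)
cores[2].fetch_code(0x2000)

# c0 writes to the line that c1 and c2 have cached as code.
read_for_ownership(cores[0], cores, 0x1000)

stale_holders = [c.name for c in cores if 0x1000 in c.l1i]
print(stale_holders)
```

Because L1i contents are always a subset of L2 contents in this model, invalidating at the L2 level is enough; no separate instruction cache invalidate has to be broadcast.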

ARM recommends configuring the core with 1 MB of L2. A 512 KB L2 would spend 1/8 of its capacity duplicating L1i contents to ensure L1i coherency, and a 256 KB L2 would be a really bad idea (likely why ARM doesn’t even allow it as an option). Thankfully, going to 1 MB of L2 capacity doesn’t cost any extra latency. Just as with the A710’s 512 KB L2, getting data from L2 takes 13-14 cycles. Ampere Altra and Zen 4 have similar cycle counts for L2 accesses, though Zen 4 enjoys an advantage in actual latency thanks to its higher clock speeds.
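The capacity cost of inclusion is simple arithmetic, sketched below. The 64 KB L1i size is an assumption inferred from the 1/8 figure quoted for a 512 KB L2, and `l1i_share` is a hypothetical helper name. The result is an upper bound, since it assumes the L1i is completely full.

```python
from fractions import Fraction

L1I_KB = 64  # assumption: inferred from the 1/8 figure for a 512 KB L2


def l1i_share(l2_kb, l1i_kb=L1I_KB):
    """Largest fraction of an inclusive L2 that can be taken up by copies of L1i lines."""
    return Fraction(l1i_kb, l2_kb)


for l2_kb in (256, 512, 1024):
    print(f"{l2_kb} KB L2: up to {l1i_share(l2_kb)} of capacity mirrors L1i")
```

Under that assumption a 256 KB L2 could lose up to a quarter of its capacity to duplicated code lines, while a 1 MB L2 loses at most 1/16.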

Strangely, code fetch bandwidth from L2 is worse than from L3. I wonder if Loongson ran into some difficulties when implementing hardware instruction cache coherency. If done correctly, hardware instruction cache coherency can benefit JIT-ed code and enable better scaling to high core counts. However, it’s not easy. Loongson’s L2 is non-inclusive, which means it can’t act as a snoop filter. Maybe an L2 hit from the instruction side has to probe the L1D to ensure it gets up-to-date data. But an L3 hit might benefit from a separate coherency directory located in the L3 complex, which can indicate whether up-to-date data can be provided without snoops.
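Why a non-inclusive L2 cannot serve as a snoop filter can be shown with a minimal sketch. It says nothing about how Loongson's hardware actually behaves; `needs_l1_probe` and its parameters are hypothetical, and the L2 is again just a set of line addresses.

```python
def needs_l1_probe(line, l2_lines, l2_inclusive):
    """Return True if an L1 probe is needed to rule out an L1 copy of `line`."""
    if l2_inclusive and line not in l2_lines:
        # Inclusion guarantees that a line absent from L2 is absent from L1 too.
        return False
    # Either the line is in L2 (L1 may hold a newer copy), or the L2 is
    # non-inclusive and its contents say nothing about what L1 holds.
    return True


# An L2 miss filters the probe only when the L2 is inclusive.
print(needs_l1_probe(0x40, set(), l2_inclusive=True))
print(needs_l1_probe(0x40, set(), l2_inclusive=False))
```

A separate directory, like the one speculated about for the L3 complex, would play the role of the inclusive lookup here by tracking which lines may be held closer to the core.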

From: https://www.cnblogs.com/readdad/p/18099390/instruction-cache-consistency-nsubv
