Instruction Cache Coherency

N2 also gets optional hardware instruction cache coherency. ARM recommends enabling it on systems with many cores, because broadcasting software-issued instruction cache invalidates does not scale. To implement instruction cache coherency, ARM makes the L2 cache inclusive of L1i contents. I assume the L2 then becomes exclusive of L1 data cache contents, so that a data write can never hit in L1d while the same address is still held in L1i. Finally, a read-for-ownership evicts the line from the L2 caches of all other cores, which automatically invalidates the matching L1i copies and prevents L1i caches from holding stale data.
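The mechanism described above can be sketched as a toy model. This is only an illustration of the reasoning, not ARM's implementation: the `Core` class, the `write` helper, and the use of Python sets to stand in for caches are all assumptions made for the example, and the exclusive L2/L1d relationship is the author's guess rather than a documented fact.

```python
class Core:
    """Toy core: each cache is just a set of line addresses (hypothetical model)."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.l1i = set()
        self.l1d = set()
        self.l2 = set()

    def fetch_code(self, line):
        # L2 is inclusive of L1i: a line enters L1i only together with an L2 copy.
        # Simplification: any local L1d copy is dropped instead of written back.
        self.l1d.discard(line)
        self.l2.add(line)
        self.l1i.add(line)

    def evict_from_l2(self, line):
        # Because of inclusion, losing the L2 copy back-invalidates L1i.
        self.l2.discard(line)
        self.l1i.discard(line)

    def take_ownership(self, line):
        # Assumed exclusive L2/L1d: the line leaves L2 when it moves into L1d,
        # which also removes any stale copy from this core's own L1i.
        self.evict_from_l2(line)
        self.l1d.add(line)


def write(cores, writer_id, line):
    """Model a data write: read-for-ownership, then the writer holds the line in L1d."""
    for core in cores:
        if core.core_id != writer_id:
            core.evict_from_l2(line)   # RFO evicts the line from other L2s (and their L1i)
            core.l1d.discard(line)     # other data copies are invalidated too
    cores[writer_id].take_ownership(line)


cores = [Core(i) for i in range(3)]
cores[0].fetch_code(0x40)
cores[1].fetch_code(0x40)
write(cores, 1, 0x40)                  # e.g. a JIT on core 1 rewrites the code line
stale = [c.core_id for c in cores if 0x40 in c.l1i]
print("cores with a stale L1i copy:", stale)
```

In this sketch no explicit instruction cache invalidate is ever issued: stale L1i copies disappear purely as a side effect of L2 evictions, which is the point of making the L2 inclusive of L1i.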

ARM recommends configuring the core with 1 MB of L2. A 512 KB L2 would spend 1/8 of its capacity duplicating L1i contents to ensure L1i coherency, and a 256 KB L2 would spend twice that fraction, which is likely why ARM doesn't even allow it as an option. Thankfully, going to 1 MB of L2 capacity doesn't cost any extra latency. Just as with A710's 512 KB L2, getting data from L2 takes 13-14 cycles. Ampere Altra and Zen 4 have similar cycle counts for L2 accesses, though Zen 4 enjoys an advantage in absolute latency thanks to its higher clock speeds.
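The capacity and latency arithmetic can be checked with a few lines of Python. This is a rough sketch: the 64 KB L1i size is inferred from the 1/8 figure for a 512 KB L2, and the clock speeds passed to `cycles_to_ns` are hypothetical values chosen only to show how the same cycle count turns into different absolute latencies.

```python
from fractions import Fraction

L1I_KB = 64  # assumption: implied by "1/8 of a 512 KB L2"


def l1i_share_of_l2(l2_kb, l1i_kb=L1I_KB):
    """Fraction of an inclusive L2 spent duplicating L1i contents."""
    return Fraction(l1i_kb, l2_kb)


def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in core cycles to nanoseconds at a given clock speed."""
    return cycles / clock_ghz


for l2_kb in (256, 512, 1024):
    print(f"{l2_kb} KB L2: {l1i_share_of_l2(l2_kb)} of capacity duplicates L1i")

# Hypothetical clocks, for illustration only: same 14-cycle L2 hit, different time.
for clock_ghz in (3.0, 5.0):
    print(f"14 cycles at {clock_ghz} GHz = {cycles_to_ns(14, clock_ghz):.2f} ns")
```

With these assumptions, doubling the L2 from 512 KB to 1 MB halves the share of capacity lost to L1i duplication, while a higher clock shortens the wall-clock latency of an L2 hit even when the cycle count is the same.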


Strangely, code fetch bandwidth from L2 is worse than from L3. I wonder if Loongson ran into difficulties when implementing hardware instruction cache coherency. Done correctly, hardware instruction cache coherency can benefit JIT-compiled code and enable better scaling to high core counts. However, it's not easy. Loongson's L2 is non-inclusive, which means it can't act as a snoop filter. Perhaps an L2 hit from the instruction side has to probe the L1D to ensure it gets up-to-date data, whereas an L3 hit might benefit from a separate coherency directory located in the L3 complex, which can indicate whether up-to-date data can be provided without snoops.

Tags: instruction, cache, L2, L1i, coherency, data
From: https://www.cnblogs.com/readdad/p/18099390/instruction-cache-consistency-nsubv
