
[Repost] Evaluating Garnet's Performance Benefits

Posted: 2024-03-21 13:57:21

Source: Evaluating Garnet's Performance Benefits | Garnet (microsoft.github.io)

 

We have tested Garnet thoroughly in a variety of deployment modes:

  • Same local machine for client and server
  • Two local machines - one client and one server
  • Azure Windows machines
  • Azure Linux machines

Below, we focus on a selected few key results.

Setup

We provision two Azure Standard F72s v2 virtual machines (72 vcpus, 144 GiB memory each) running Linux (Ubuntu 20.04), with accelerated TCP enabled. The benefit of this SKU is that we are guaranteed not to be co-located with another VM, which yields more consistent performance. One machine runs the various cache-store servers, and the other is dedicated to issuing workloads. We use our benchmarking tool, Resp.benchmark, to generate all results. We compare Garnet to the latest open-source versions of Redis (v7.2), KeyDB (v6.3.4), and Dragonfly (v6.2.11) at the time of writing. We use a uniform random distribution of keys in these experiments (Garnet's shared memory design benefits even more with skewed workloads). All data fits in memory in these experiments. The baseline systems were tuned and optimized as much as possible based on available information. Below, we summarize the startup configuration used for each system in our experiments.
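The uniform random key distribution mentioned above can be sketched as follows. This is a generic illustration, not Resp.benchmark's actual key-generation code; the zero-padded key format and fixed seed are assumptions made for the example.

```python
import random

def uniform_keys(db_size: int, batch: int, key_len: int = 8, seed: int = 42):
    """Draw `batch` keys uniformly at random from a keyspace of `db_size`
    distinct keys, each rendered as a fixed-width string of key_len bytes."""
    rng = random.Random(seed)
    return [str(rng.randrange(db_size)).zfill(key_len) for _ in range(batch)]

# A batch of 4096 uniform requests over the small (1024-key) database:
keys = uniform_keys(db_size=1024, batch=4096)
assert all(len(k) == 8 for k in keys)
```

Under a uniform distribution every key is equally hot, so each request is equally likely to miss the processor cache when the keyspace is large; a skewed (e.g. Zipfian) distribution would concentrate accesses on a few cache-resident keys instead.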

Garnet
      dotnet run -c Release --framework=net8.0 --project Garnet/main/GarnetServer -- \
--bind $host \
--port $port \
--no-pubsub \
--no-obj \
--index 1g
  Redis 7.2
      ./redis-server \
--bind $host \
--port $port \
--logfile "" \
--save "" \
--appendonly no \
--protected-mode no \
--io-threads 32
  KeyDB 6.3.4
  Dragonfly 6.2.11

Basic Commands Performance

We measured throughput and latency for basic GET/SET operations by varying payload size, batch size, and number of client threads. For our throughput experiments, we preload a small DB (1024 keys) and a large DB (256M keys) into Garnet before running the actual workload. In contrast, our latency experiments were performed on an empty database and for a combined workload of GET/SET commands that operate on a small keyspace (1024 keys).

Throughput GET

For the experiment depicted in Figure 1, we used large batches of GET operations (4096 requests per batch) and small payloads (8-byte keys and values) to minimize network overhead. As we increase the number of client sessions, we observe that Garnet exhibits better scalability than Redis or KeyDB. Dragonfly exhibits similar scaling characteristics, though only up to 16 threads. Note also that Dragonfly is a pure in-memory system. Overall, Garnet's throughput relative to the other systems is consistently higher, even when the database size (i.e., the number of distinct keys pre-loaded) is larger (at 256 million keys) than the size of the processor cache.

Varying number of client sessions or batch size (GET)
tpt-get-threads.png
Figure 1: Throughput (log-scale), varying number of client sessions, for a database size of (a) 1024 keys, and (b) 256 million keys
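Large batches like these amount to pipelining many commands into a single network write. A minimal sketch of how a batch of GET commands is framed on the wire, using standard RESP2 encoding (each command is an array of two bulk strings); this illustrates the protocol, not Resp.benchmark's actual implementation:

```python
def encode_get_batch(keys):
    """Encode a list of keys as one pipelined buffer of RESP GET commands.
    Each command: *2 (array of 2), $3 GET, $<len> <key>, CRLF-terminated."""
    buf = bytearray()
    for k in keys:
        kb = k.encode()
        buf += b"*2\r\n$3\r\nGET\r\n$%d\r\n%s\r\n" % (len(kb), kb)
    return bytes(buf)

batch = encode_get_batch(["k1", "k2"])
assert batch == b"*2\r\n$3\r\nGET\r\n$2\r\nk1\r\n*2\r\n$3\r\nGET\r\n$2\r\nk2\r\n"
```

Sending 4096 such commands in one buffer amortizes the per-round-trip network cost across the whole batch, which is why large batches isolate server-side throughput from network overhead.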

Even for small batch sizes, Garnet outperforms the competing systems by attaining consistently higher throughput, as indicated by Figure 2. This holds irrespective of the actual database size.

tpt-get-batchsize.png
Figure 2: Throughput (log-scale), varying batch sizes, for a database size of (a) 1024 keys, and (b) 256 million keys

Latency GET/SET

We next measure client-side latency by issuing a mixture of 80% GET and 20% SET requests, and compare Garnet against the other systems. Since we care about latency, the DB size is kept small while we vary other workload parameters such as the number of client threads, batch size, and payload size.

Figure 3 shows that, as we increase the number of client sessions, Garnet's latency (measured in microseconds) is consistently lower and more stable across percentiles than that of the other systems. Note that this experiment does not use batching.

Latency benchmark varying client sessions or batch size (GET/SET)
    dotnet run -c Release --framework=net8.0 --project Garnet/benchmark/Resp.benchmark \
--host $host \
--port $port \
--batchsize 1 \
--threads $threads \
--client GarnetClientSession \
--runtime 35 \
--op-workload GET,SET \
--op-percent 80,20 \
--online \
--valuelength $valuelength \
--keylength $keylength \
--dbsize 1024 \
--itp $batchsize
 
lat-get-set-threads.png
Figure 3: Latency, varying number of client sessions, at (a) median, (b) 99th percentile, and (c) 99.9th percentile

Garnet’s latency is fine-tuned for adaptive client-side batching and efficiently handling multiple sessions querying the system. For our next set of experiments, we increase the batch sizes from 1 to 64 and plot latency at different percentiles below with 128 active client connections. As illustrated in Figure 4, Garnet maintains stability and achieves lower overall latency compared to other systems when the batch size is increased.

lat-get-set-batchsize.png
Figure 4: Latency, varying batch sizes, at (a) median, (b) 99th percentile, and (c) 99.9th percentile
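The median, 99th, and 99.9th percentiles plotted in Figures 3 and 4 can be computed from raw latency samples roughly as follows; this is a generic nearest-rank sketch, not the benchmark tool's implementation:

```python
def percentile(samples, p):
    """Return the p-th percentile (0 < p <= 100) of samples using the
    nearest-rank method: the value at rank ceil(p/100 * n) in sorted order."""
    s = sorted(samples)
    # ceil(p * n / 100) computed with integer arithmetic, then clamped
    idx = max(0, min(len(s) - 1, -(-p * len(s) // 100) - 1))
    return s[idx]

# Latency samples in microseconds; one outlier dominates the tail.
lat_us = [110, 95, 102, 99, 250, 101, 98, 97, 105, 100]
assert percentile(lat_us, 50) == 100   # median
assert percentile(lat_us, 99) == 250   # tail percentile captures the outlier
```

The example shows why tail percentiles matter: a single slow request barely moves the median but fully determines p99 on a small sample, so stable high percentiles indicate a system with few latency spikes.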

Complex Data Structures Performance

Garnet supports many different complex data structures, such as HyperLogLog, Bitmap, Sorted Set, and List. Below, we present performance metrics for a select few of them.

Hyperloglog

Garnet supports its own built-in HyperLogLog (HLL) data structure. It is implemented in C# and supports operations such as updating (PFADD), computing an estimate (PFCOUNT), and merging (PFMERGE) two or more distinct HLL structures. HLL data structures are often optimized in terms of their memory footprint, and our implementation is no different: it utilizes a sparse representation when the number of nonzero counts is low, and a dense representation beyond a fixed threshold at which the trade-off between memory savings and the additional decompression work is no longer attractive. Enabling efficient updates to the HLL structure is essential for a concurrent system such as Garnet. For this reason, our experiments focus specifically on the performance of PFADD and are deliberately designed to stress-test the system in the following scenarios:

  1. A large number of high-contention updates (i.e., batch size 4096, DB of 1024 keys) for an increasing number of threads or an increasing payload size. After a few insertions, the constructed HLL structures transition to the dense representation.
  2. A large number of low-contention updates (i.e., batch size 4096, DB of 256M keys) for an increasing number of threads or an increasing payload size. This increases the likelihood that the constructed HLL structures utilize the sparse representation. Consequently, our measurements account for the added overhead of working with compressed data or of incrementally allocating more space for nonzero values.
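The sparse-to-dense transition driving these two scenarios can be illustrated with a toy register store. This is a conceptual sketch only; the threshold, register count, and layout are illustrative and do not reflect Garnet's actual HLL implementation:

```python
class Registers:
    """Toy HLL register store: keeps a sparse dict of nonzero registers and
    promotes to a dense byte array once enough registers have been touched."""
    def __init__(self, m=16384, dense_threshold=3000):
        self.m = m                          # total number of registers
        self.dense_threshold = dense_threshold
        self.sparse = {}                    # register index -> rank value
        self.dense = None                   # bytearray once promoted

    def update(self, index, value):
        """PFADD-style update: keep the max rank seen for a register."""
        if self.dense is not None:
            self.dense[index] = max(self.dense[index], value)
            return
        self.sparse[index] = max(self.sparse.get(index, 0), value)
        if len(self.sparse) > self.dense_threshold:
            # Promote: one-time cost of materializing all m registers.
            self.dense = bytearray(self.m)
            for i, v in self.sparse.items():
                self.dense[i] = v
            self.sparse = None

reg = Registers(m=64, dense_threshold=4)
for i in range(6):
    reg.update(i, 1)
assert reg.dense is not None  # exceeded the threshold, now dense
```

With few distinct keys (scenario 1) updates concentrate on structures that promote quickly and then take the cheap dense path; with many keys (scenario 2) most structures stay sparse, so every update pays the dictionary-maintenance overhead.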

In Figure 5, we present the results for the first experimental scenario. Garnet scales very well under high contention and consistently outperforms every other system in terms of raw throughput for an increasing number of threads. Similarly, for increasing payload sizes, Garnet exhibits higher total throughput than the other systems. Across all tested systems, we observed a noticeable decrease in throughput as the payload size increased; this behavior is anticipated due to the inherent TCP network bottleneck.

Varying number of client sessions or payload size while operating on few keys
tpt-pfadd-few-keys.png
Figure 5: Throughput (log-scale), for (a) increasing number of client sessions, and (b) increasing payload size, for a database size of 1024 keys.

Figure 6 shows the results for the second experimental scenario described above. Even while operating on the HLL sparse representation, Garnet performs better than any other system, achieving consistently higher throughput while scaling very well for increasing numbers of client sessions. Similarly, for increasing payload sizes, Garnet outperforms the competition with higher overall throughput. Notice that in both cases throughput is lower than in the previous experiment, due to the overhead of operating on compressed data.

Varying number of client sessions or payload size while operating on many keys (PFADD)
tpt-pfadd-few-keys.png
Figure 6: Throughput (log-scale), for (a) increasing number of client sessions, and (b) increasing payload size, for a database size of 1M keys.

In Figure 7, we run the same type of experiment as before, fixing the number of client sessions to 64 and the payload to 128 bytes while increasing the batch size. Note that even for a batch size of 4, Garnet's throughput gains are noticeably higher than those of any other system we tested. This demonstrates that even for small batch sizes we still outperform the competing systems.

tpt-pfadd-batchsize.png
Figure 7: Throughput (log-scale), for increasing batch size with 64 client sessions, on a DB with (a) 1024 keys, (b) 1M keys.

Bitmap

Garnet supports a set of bit-oriented operators on string data types. These operators run in either constant time (e.g., GETBIT, SETBIT) or linear time (e.g., BITCOUNT, BITPOS, BITOP). To speed up processing of the linear-time operators, we use hardware SIMD instructions. Below, we present benchmark results for a subset of these operators, covering both complexity categories. As before, we use a small DB size (1024 keys) to evaluate each system under high contention, while avoiding having all the data resident in the CPU cache by increasing the payload size (1MB).
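A linear-time operator like BITCOUNT reduces to a population count over every byte of the value, which is why it benefits from SIMD: the same independent per-byte work is done across the whole 1MB payload. A scalar sketch of the operation (illustrative only, not Garnet's C# implementation):

```python
def bitcount(value: bytes) -> int:
    """Count the set bits across an entire value, one byte at a time.
    Real implementations vectorize this loop with SIMD popcount."""
    return sum(bin(b).count("1") for b in value)

assert bitcount(b"\xff\x0f") == 12        # 8 + 4 set bits
assert bitcount(b"\x00" * 1024) == 0      # work is O(n) even when zero
```

By contrast, GETBIT and SETBIT touch a single byte at a fixed offset, so their cost is independent of the 1MB value size; this is the constant-time versus linear-time split benchmarked below.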

In Figure 8, we present the performance metrics for the GETBIT and SETBIT commands. In both cases, Garnet consistently maintains higher throughput and better scalability as the number of client sessions increases.

Varying number of client sessions (GETBIT/SETBIT/BITOP_NOT/BITOP_AND)
tpt-getbit-setbit-threads.png
Figure 8: Throughput (log-scale), varying number of client sessions, for a database size of 1024 keys and 1MB payload.

In Figure 9, we evaluate the performance of BITOP NOT and BITOP AND (with two source keys) for an increasing number of threads and a payload size of 1MB. Garnet maintains higher overall throughput than every other system we tested as the number of client sessions increases. It also performs very well under high contention, given that our DB size is relatively small (i.e., only 1024 keys).

tpt-bitop-threads.png
Figure 9: Throughput (log-scale), varying number of client sessions, for a database size of 1024 keys and 1MB payload.
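The two BITOP variants benchmarked above are bytewise passes over entire values; a minimal sketch of their semantics (illustrative, not Garnet's implementation; the zero-padding of unequal-length operands follows documented Redis behavior):

```python
def bitop_not(a: bytes) -> bytes:
    """BITOP NOT: bytewise complement of a single source value."""
    return bytes(0xFF ^ x for x in a)

def bitop_and(a: bytes, b: bytes) -> bytes:
    """BITOP AND over two source values; the shorter operand is treated
    as zero-padded to the length of the longer one (Redis semantics)."""
    n = max(len(a), len(b))
    a = a.ljust(n, b"\x00")
    b = b.ljust(n, b"\x00")
    return bytes(x & y for x, y in zip(a, b))

assert bitop_not(b"\x00\xff") == b"\xff\x00"
assert bitop_and(b"\xf0\xff", b"\x0f") == b"\x00\x00"
```

With 1MB payloads, each BITOP request streams megabytes through these loops, so per-byte efficiency (and SIMD vectorization of exactly this kind of loop) dominates the measured throughput.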

As shown in Figures 10 and 11, even for small batch sizes Garnet attains higher throughput than any other system we tested on the associated bitmap operations. In fact, a noticeable difference emerges quickly: even at a batch size of 4, Garnet is significantly faster.

Varying batch size (GETBIT/SETBIT/BITOP_NOT/BITOP_AND)
tpt-bitop-batchsize.png
Figure 10: Throughput (log-scale), for increasing batch size with 64 client sessions, on a DB with 1024 keys and 1MB payload.
tpt-bitop-batchsize.png
Figure 11: Throughput (log-scale), for increasing batch size with 64 client sessions, on a DB with 1024 keys and 1MB payload.

From: https://www.cnblogs.com/jinanxiaolaohu/p/18087211
