
MongoDB compression: snappy and zlib block compression, B-tree index prefix compression


MongoDB 3.0 WiredTiger Compression and Performance

One of the most exciting developments over the lifetime of MongoDB must be the inclusion of the WiredTiger storage engine in MongoDB 3.0. Its design and core architecture are leagues ahead of the current MMAPv1 engine and comparable to most modern-day storage engines for various relational and non-relational stores. One of the most compelling features of the WiredTiger storage engine is compression. Let's talk a bit more about performance and compression.

Configuration

MongoDB 3.0 allows the user to configure different storage engines through the storage engine API. For the first time ever we have an amazing array of options for setting up MongoDB to match our workloads and use-cases. To run WiredTiger the version must be 3.0 or higher and the configuration file must call for WiredTiger. For example:

storage:
   dbPath: "/data/mongodb"
   journal:
       enabled: true
   engine: "wiredTiger"
   wiredTiger:
       engineConfig:
           cacheSizeGB: 99
           journalCompressor: none
           directoryForIndexes: "/indexes/mongodb/"
       collectionConfig:
           blockCompressor: snappy
       indexConfig:
           prefixCompression: true
systemLog:
   destination: file
   path: "/tmp/mongodb.log"
   logAppend: true
processManagement:
   fork: true
net:
   port: 9005
   unixDomainSocket:
       enabled : true

There are a lot of new configuration options in 3.0 so let's take the notable options one by one.

  • storage.engine. This selects the storage engine; set it to "wiredTiger" for WiredTiger or "mmapv1" for the legacy engine. MMAPv1 is the default in 3.0, but in a future release (potentially 3.1) the default may change to wiredTiger.
  • storage.wiredTiger.engineConfig.cacheSizeGB. This sets the size, in GB, of the page cache WiredTiger uses for frequently accessed data and index blocks. If not specified, MongoDB automatically assigns up to about 50% of total addressable memory.
  • storage.wiredTiger.engineConfig.directoryForIndexes. Yes! We can now store indexes on a separate block device. This should help DBAs size, capacity plan, and augment performance as needed.
  • storage.wiredTiger.collectionConfig.blockCompressor. This can be set to 'snappy' or 'zlib'; snappy offers higher performance but lower compression than zlib. More detail on the compression algorithms follows below.
  • storage.wiredTiger.indexConfig.prefixCompression. This setting enables prefix compression for indexes. Valid options are true|false and the default is true.
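For completeness, the same engine options can also be supplied as mongod command-line flags instead of a config file. A minimal sketch; the dbpath, cache size, and log path here are placeholders, not values from the article:

```shell
# Start mongod on WiredTiger with snappy block compression and index
# prefix compression (flag names per the MongoDB 3.0 mongod manual).
mongod --storageEngine wiredTiger \
       --dbpath /data/mongodb \
       --wiredTigerCacheSizeGB 8 \
       --wiredTigerCollectionBlockCompressor snappy \
       --wiredTigerIndexPrefixCompression true \
       --fork --logpath /tmp/mongodb.log
```

The config-file form is generally preferable for production since it can be versioned and reviewed, but the flags are handy for quick experiments.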

Let's talk performance

WiredTiger is going to be much faster than MMAPv1 for almost all workloads. Its real sweet spot is highly concurrent workloads and/or workloads with lots of updates. This may surprise some folks, because traditionally compression is a trade-off: add compression, lose performance. That is normally true, but a couple of things need to be considered here. One, we are comparing the MMAPv1 engine, with database-level locking, to WiredTiger, with document-level locking. Any reasonably concurrent workload is almost always bound by locking and seldom by pure system-level resources. Two, WiredTiger does page-level compression. More on this later.

A few things beyond its locking scope also make WiredTiger faster: it has a streamlined process for free-space lookup and management, and a proper cache with its own I/O components.

Because WiredTiger allows for compression, a common worry is the potential for overall performance impact. But as you can see, in a practical sense this worry is mostly unfounded.

A couple of graphs illustrate the relative performance difference for sysbench-mongodb, a per-test breakdown, and the relative CPU usage for each (figures not reproduced in this repost). It should be noted that WiredTiger is using defaults in this configuration, including snappy compression and index prefix compression.

Let's talk more about compression

Compressing data inside a database is tricky. WiredTiger does a great job at handling compression because of its sophisticated management approach:

The cache generally stores uncompressed changes (the exception is for very large documents). The default snappy compression is fairly straightforward: it gathers data up to a maximum of 32KB, compresses it, and if compression is successful, writes the block rounded up to the nearest 4KB.

The alternative zlib compression works a little differently: it will gather more data and compress enough to fill a 32KB block on disk. This is more CPU intensive but generally results in better compression ratios (independent of the inherent differences between snappy and zlib).
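The dataset-dependence of block compression is easy to demonstrate outside MongoDB. The sketch below uses gzip (which, like zlib, is deflate-based) purely as a stand-in for WiredTiger's compressors: it compresses one repetitive and one random 32KB block, then rounds the results up to 4KB boundaries the way snappy blocks are written to disk:

```shell
# Build a 32KB block of highly repetitive data and a 32KB block of
# random data, then compare their compressed sizes.
tmp=$(mktemp -d)
head -c 32768 /dev/zero    > "$tmp/repetitive.bin"
head -c 32768 /dev/urandom > "$tmp/random.bin"

size_rep=$(gzip -c "$tmp/repetitive.bin" | wc -c)
size_rand=$(gzip -c "$tmp/random.bin" | wc -c)

# Round each compressed size up to the nearest 4KB multiple,
# mimicking WiredTiger's on-disk block rounding.
ondisk_rep=$(( (size_rep + 4095) / 4096 * 4096 ))
ondisk_rand=$(( (size_rand + 4095) / 4096 * 4096 ))

echo "repetitive: $size_rep bytes compressed, $ondisk_rep on disk"
echo "random:     $size_rand bytes compressed, $ondisk_rand on disk"
```

Repetitive data collapses to a single 4KB write, while incompressible random data still occupies the full rounded-up block; the same spread shows up between datasets inside WiredTiger.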

—Michael Cahill

This approach is great for performance. But compression still has overhead and can vary in effectiveness. What this means for users is two-fold:

  • Not all data sets compress equally; it depends on the format of the data itself.
  • Compression is also temporal: the ratio can be better one day than another, depending on the specific workload.

One approach is to take a mongodump of the dataset in question, then mongorestore that data into a compressed WiredTiger database and measure the difference. This gives a rough measurement of the compression ratio one can expect. That said, as soon as the new compressed database starts taking load, the compression ratio may vary; probably not by a massive margin, however.
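A rough sketch of that measurement; the host names and database name below are placeholders, so adapt them to your deployment:

```shell
# Dump the existing dataset and restore it into a WiredTiger instance
# configured with block compression (hosts and db name are hypothetical).
mongodump --host old-mmapv1-host --out /tmp/dump
mongorestore --host new-wiredtiger-host /tmp/dump

# Compare the logical data size with the on-disk size on the new node.
mongo --host new-wiredtiger-host --quiet --eval '
    var s = db.getSiblingDB("mydb").stats();
    print("dataSize:    " + s.dataSize);
    print("storageSize: " + s.storageSize);
    print("approx. compression ratio: " + (s.dataSize / s.storageSize).toFixed(2));
'
```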

It should be noted that there are some tricky bits to consider when running a database using compression. Because WiredTiger compresses each page before it hits the disk, the in-memory region is uncompressed. This means that highly compressed data will have a large ratio between its footprint on disk and the cache that serves it; poorly compressed data, the opposite. The effect may be that the database becomes slow, and it will be hard to tell that the problem is a changed caching pattern caused by changed compression properties of the underlying data. Keeping good time-series data on cache utilization, and periodically checking the compression of the data by hand, can help the DBA understand these patterns better.
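To keep an eye on that disk-to-cache ratio, the relevant numbers are already in the stats MongoDB reports: `size` (uncompressed data size) and `storageSize` (on-disk size) from db.collection.stats(). A sketch of the arithmetic with made-up numbers:

```shell
# Suppose db.coll.stats() reported these values (hypothetical numbers):
size=1073741824          # uncompressed data size in bytes (1 GB)
storageSize=268435456    # compressed on-disk size in bytes (256 MB)

# Ratio between the cache footprint (uncompressed) and disk footprint.
ratio=$(awk -v s="$size" -v d="$storageSize" 'BEGIN { printf "%.1f", s / d }')
echo "cache-to-disk ratio: ${ratio}x"
```

A ratio that climbs over time means the same amount of disk is backing ever more uncompressed cache, which is one early warning that caching behavior may be about to change.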

For instance, note the different compression ratios of various datasets (figure not reproduced in this repost).

Takeaways

  • MongoDB 3.0 has a new storage engine API, and is delivered with the optional WiredTiger engine.
  • MongoDB 3.0 with WiredTiger is much faster than MMAPv1 mostly because of increased concurrency.
  • MongoDB 3.0 with WiredTiger is much faster than MMAPv1 even when compressing the data.

Lastly, remember, MongoDB 3.0 is a new piece of software. Test before moving production workloads to it. TEST TEST TEST.

If you would like to test MongoDB 3.0 with WiredTiger, ObjectRocket has it generally available, and it's simple and quick to set up. As with anything ObjectRocket, there is a team of DBAs and developers to help you with your projects. Don't be shy about hitting them up at [email protected] with questions, or email me directly.

Note: test configuration and details documented here.

From: https://blog.51cto.com/u_11908275/6387680

    有时候在系统中需要一次性下载多个文件,但逐个下载文件比较麻烦。这时候,最好的解决办法是将所有文件打包成一个压缩文件,然后下载这个压缩文件,这样就可以一次性获取所有所需的文件了。下面是一个名为CompressUtil的工具类的代码,它提供了一些方法来处理文件压缩和下载操作:importor......