One of Google's three foundational papers: notes on Bigtable (English version)


Date: 2023-10-16 20:00:43
Tags: memtable, google, tablet, distributed, SSTable, master, English version, Bigtable


Bigtable: A Distributed Storage System for Structured Data

Data model: not a relational data model

A Bigtable is a sparse, distributed, persistent multidimensional sorted map. (Section 2 of the paper)

How is the map indexed?

(row:string, column:string, time:int64) → string

It resembles nested JSON, for example:

  // ...
  "aaaaa" : {        // row
    "A:foo" : {      // column
      15 : "y",      // timestamp
      4 : "m"
    },
    "A:bar" : {      // column
      15 : "d"
    },
    "B:" : {         // column
      6 : "w",
      3 : "o",
      1 : "w"
    }
  },
  // ...
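To make the indexing concrete, here is a minimal in-memory sketch of the (row, column, timestamp) → value map. `ToyTable` and its method names are made up for illustration and are not Bigtable's API; a real table is distributed and persistent, which this toy ignores.

```python
from collections import defaultdict

class ToyTable:
    """Toy in-memory model of the (row, column, timestamp) -> value map."""

    def __init__(self):
        # row -> column -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row.

        Rows come out in lexicographic order, versions newest first.
        """
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                for column in sorted(self._rows[row]):
                    for ts in sorted(self._rows[row][column], reverse=True):
                        yield row, column, ts, self._rows[row][column][ts]

table = ToyTable()
table.put("aaaaa", "A:foo", 15, "y")
table.put("aaaaa", "A:foo", 4, "m")
table.put("aaaaa", "B:", 6, "w")
print(table.get("aaaaa", "A:foo"))      # y  (newest version)
print(table.get("aaaaa", "A:foo", 10))  # m  (newest version at or before 10)
```

The map is sparse: a row only pays for the columns it actually has, which is why the nested dictionaries are created on demand.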

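An example table: Webtable

  • row: reversed URL (a contiguous range of rows is called a tablet)

    concurrency: every read or write under a single row key is atomic
    order: rows are kept in lexicographic order by row key

  • column: grouped into column families, e.g. contents:

    access control and both disk and memory accounting are performed at the column-family level

  • timestamp

    avoid collisions: applications must generate unique timestamps; versions are stored in decreasing timestamp order, so the most recent is read first
    garbage collection: set per column family, e.g. keep only the last n versions or only versions that are recent enough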
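Two of the points above can be sketched in a few lines: building a reversed-URL row key so that pages of one domain sort next to each other, and a "keep the last n versions" garbage-collection rule. The helper names `reversed_url_key` and `keep_last_n` are hypothetical, and the exact key format is an assumption for illustration.

```python
from urllib.parse import urlsplit

def reversed_url_key(url):
    """Reverse the hostname components so pages of one domain sort together."""
    parts = urlsplit(url)
    host = ".".join(reversed(parts.hostname.split(".")))
    return host + parts.path

def keep_last_n(versions, n):
    """Keep only the n newest versions of a cell; `versions` maps timestamp -> value."""
    newest = sorted(versions, reverse=True)[:n]
    return {ts: versions[ts] for ts in newest}

urls = [
    "http://maps.google.com/index.html",
    "http://www.cnn.com/",
    "http://google.com/about",
]
# Lexicographic order now groups the two google.com pages together.
print(sorted(reversed_url_key(u) for u in urls))
print(keep_last_n({6: "w", 3: "o", 1: "w"}, 2))
```

Because rows are sorted by key, a scan over the prefix `com.google` touches a small number of adjacent tablets instead of the whole table.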


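API: C++ client code reads and writes tables

MapReduce + Bigtable: wrappers let a Bigtable serve as both an input source and an output target for MapReduce jobs

Building Blocks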

Google File System (GFS): a distributed file system that stores Bigtable's log and data files

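Google SSTable file format: stores Bigtable data

A persistent, ordered, immutable key-value map: look up a single key, or iterate over key/value pairs in a specified key range

  • a sequence of blocks (typically 64KB each, configurable)
  • a block index, loaded into memory when the SSTable is opened

disk seek or memory seek? A lookup needs one disk seek: a binary search in the in-memory index finds the block, which is then read from disk.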

Optionally, SSTable can be completely mapped into memory, which allows us to perform lookups and scans without touching disk.
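The lookup path can be sketched as follows. This is a toy model under stated assumptions: `ToySSTable` is a hypothetical name, blocks are Python lists rather than bytes on disk, and `block_reads` merely counts what would be disk seeks.

```python
import bisect

class ToySSTable:
    """Immutable sorted key-value file, modelled as blocks plus an in-memory block index."""

    def __init__(self, items, block_size=2):
        items = sorted(items)
        # Each block is a sorted list of (key, value) pairs.
        self.blocks = [items[i:i + block_size] for i in range(0, len(items), block_size)]
        # The index holds the last key of every block and stays in memory.
        self.index = [block[-1][0] for block in self.blocks]
        self.block_reads = 0  # counts simulated disk seeks

    def _read_block(self, i):
        self.block_reads += 1
        return self.blocks[i]

    def get(self, key):
        i = bisect.bisect_left(self.index, key)  # binary search, memory only
        if i == len(self.index):
            return None  # key is larger than every stored key
        for k, v in self._read_block(i):  # one simulated disk read
            if k == key:
                return v
        return None

    def scan(self, start, end):
        """Yield pairs with start <= key < end."""
        i = bisect.bisect_left(self.index, start)
        while i < len(self.blocks):
            for k, v in self._read_block(i):
                if k >= end:
                    return
                if k >= start:
                    yield k, v
            i += 1

sst = ToySSTable([("a", 1), ("c", 3), ("b", 2), ("d", 4), ("e", 5)])
print(sst.get("c"), sst.block_reads)
```

Mapping the whole SSTable into memory, as the note above mentions, would amount to making `_read_block` free.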

Chubby: distributed lock service

5 active replicas: one is elected master and actively serves requests

Paxos algorithm: keeps the replicas consistent in the face of failure

namespace: directories and small files; reads and writes to a file are atomic

session: when a client's session expires, it loses its locks and open handles


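Implementation: three major components

  1. a library that is linked into every client
  2. one master server (assigns tablets to tablet servers, balances load, garbage-collects files in GFS, handles schema changes)
  3. many tablet servers (each manages roughly ten to a thousand tablets)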

As with many single-master distributed storage systems, client data does not move through the master: clients communicate directly with tablet servers for reads and writes.

Tablet location: a three-level hierarchy analogous to a B+-tree

Chubby file -> Root tablet -> other METADATA tablets -> UserTables
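A sketch of walking that hierarchy, assuming each level is a sorted list keyed by a tablet's (table name, end row). All names and contents below (`find`, `locate`, the server names, the `LAST` sentinel) are hypothetical; real clients also cache locations, which is left out.

```python
import bisect

def find(level, key):
    """`level` is a sorted list of (end_key, target); return the target whose range covers `key`."""
    end_keys = [end for end, _ in level]
    i = bisect.bisect_left(end_keys, key)
    if i == len(level):
        raise KeyError(key)
    return level[i][1]

# Hypothetical contents. Keys are (table name, end row).
LAST = "\uffff"  # sentinel end row that sorts after every row used here
metadata_1 = [
    (("webtable", "com.cnn.www"), "tabletserver-7"),
    (("webtable", "com.google"), "tabletserver-2"),
]
metadata_2 = [(("webtable", LAST), "tabletserver-5")]
root_tablet = [
    (("webtable", "com.google"), metadata_1),
    (("webtable", LAST), metadata_2),
]
chubby_file = root_tablet  # the Chubby file points at the root tablet

def locate(table, row):
    root = chubby_file                    # 1. Chubby file -> root tablet
    metadata = find(root, (table, row))   # 2. root tablet -> METADATA tablet
    return find(metadata, (table, row))   # 3. METADATA tablet -> server of the user tablet

print(locate("webtable", "com.cnn.money"))
print(locate("webtable", "org.wikipedia"))
```

Each level is a range lookup on sorted keys, which is where the B+-tree analogy comes from.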

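METADATA: also stores secondary information, such as a log of events for each tablet, which helps with debugging and performance analysis

Tablet assignment: the master schedules and manages tablets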

Each tablet is assigned to one tablet server at a time. Bigtable uses Chubby to keep track of tablet servers. When a tablet server starts, it creates, and acquires an exclusive lock on, a uniquely-named file in a specific Chubby directory. The master monitors this directory (the servers directory) to discover tablet servers.
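The registration and monitoring flow can be mimicked with an in-memory stand-in. `ToyChubbyDir` and `ToyMaster` are hypothetical classes, not Chubby's or Bigtable's API, and the "fewest tablets" assignment policy is an assumption made only to have something to run. The toy master simply lists the directory, which is a simplification of the real failure detection.

```python
class ToyChubbyDir:
    """In-memory stand-in for a Chubby directory of exclusively locked files."""

    def __init__(self):
        self._locks = {}  # file name -> holder

    def create_and_lock(self, name, holder):
        if name in self._locks:
            return False  # exclusive: someone else already holds it
        self._locks[name] = holder
        return True

    def release(self, name):
        self._locks.pop(name, None)  # e.g. the holder's session expired

    def list_files(self):
        return set(self._locks)

class ToyMaster:
    def __init__(self, servers_dir):
        self.servers_dir = servers_dir
        self.assignment = {}  # tablet -> tablet server

    def live_servers(self):
        return self.servers_dir.list_files()

    def assign(self, tablet):
        # Pick the live server with the fewest tablets (hypothetical policy).
        live = sorted(self.live_servers())
        load = {s: 0 for s in live}
        for server in self.assignment.values():
            if server in load:
                load[server] += 1
        chosen = min(live, key=lambda s: load[s])
        self.assignment[tablet] = chosen
        return chosen

    def reassign_lost(self):
        """Reassign tablets whose server no longer holds its lock."""
        live = self.live_servers()
        lost = [t for t, s in self.assignment.items() if s not in live]
        for tablet in lost:
            self.assign(tablet)
        return lost

servers = ToyChubbyDir()
servers.create_and_lock("ts-1", "ts-1")
servers.create_and_lock("ts-2", "ts-2")
master = ToyMaster(servers)
master.assign("tablet-a")
master.assign("tablet-b")
servers.release("ts-1")        # ts-1 loses its lock
print(master.reassign_lost())  # tablets that had to move
print(master.assignment)
```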

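The essential mechanism here is the lock: a tablet server serves its tablets only while it still holds its exclusive Chubby lock.

Bigtable servers only coordinate operations; the actual data is persisted in GFS, as SSTables plus commit logs.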

Tablet Representation

memtable: the recently committed updates are stored in memory in a sorted buffer

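recovery: the tablet server reads the tablet's list of SSTables and its redo points from METADATA, loads the SSTable indices, and replays the commit log from the redo points to rebuild the memtable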


As write operations execute, the size of the memtable increases. When the memtable size reaches a threshold, the memtable is frozen, a new memtable is created, and the frozen memtable is converted to an SSTable and written to GFS.
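The write path described above, as a toy sketch. `ToyTablet` is a hypothetical name, the threshold is counted in entries rather than bytes, and Python lists stand in for files in GFS.

```python
class ToyTablet:
    """Write path sketch: commit log, memtable, and minor compaction."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # hypothetical threshold, in entries
        self.commit_log = []  # stands in for the redo log in GFS
        self.memtable = {}    # a real memtable is a sorted buffer
        self.sstables = []    # newest last; each is an immutable sorted tuple of pairs

    def write(self, key, value):
        self.commit_log.append((key, value))  # log first, then apply
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def minor_compaction(self):
        frozen, self.memtable = self.memtable, {}            # freeze, start a new memtable
        self.sstables.append(tuple(sorted(frozen.items())))  # "write to GFS" as an SSTable
        self.commit_log.clear()  # logged entries are now covered by the SSTable

    def read(self, key):
        if key in self.memtable:  # newest data first
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # then newest SSTable to oldest
            for k, v in sstable:
                if k == key:
                    return v
        return None

tablet = ToyTablet(memtable_limit=2)
tablet.write("b", 1)
tablet.write("a", 2)   # reaches the threshold, triggers a minor compaction
tablet.write("a", 3)   # lands in the fresh memtable
print(tablet.read("a"), tablet.read("b"), len(tablet.sstables))
```

Flushing shrinks the tablet server's memory use and also shortens recovery, since less of the commit log has to be replayed.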

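compactions: a minor compaction turns the frozen memtable into a new SSTable; a merging compaction combines a few SSTables and the memtable into one SSTable; a major compaction rewrites all SSTables into exactly one, which contains no deletion entries or deleted data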
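A major compaction can be sketched as a merge in which newer SSTables win and deleted data is dropped. `TOMBSTONE` and `major_compaction` are hypothetical names, and the merge is done with a dictionary for brevity rather than a streaming merge of sorted runs.

```python
TOMBSTONE = object()  # hypothetical marker for a deletion entry

def major_compaction(sstables):
    """Merge SSTables (oldest first) into one, dropping deleted keys.

    Each SSTable is a sorted list of (key, value) pairs.
    """
    merged = {}
    for sstable in sstables:  # later (newer) tables overwrite older ones
        for key, value in sstable:
            merged[key] = value
    # Suppressed deletions and their data disappear from the result.
    return sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)

old = [("a", 1), ("b", 2), ("c", 3)]
new = [("b", TOMBSTONE), ("c", 30), ("d", 4)]
print(major_compaction([old, new]))  # [('a', 1), ('c', 30), ('d', 4)]
```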


locality group

Clients can group multiple column families together into a locality group. A separate SSTable is generated for each locality group in each tablet.
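A small sketch of the idea: cells are routed by column family to a locality group, and each group gets its own sorted run (standing in for its own SSTable). The schema in `LOCALITY_GROUPS` and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical schema: column family -> locality group.
LOCALITY_GROUPS = {"language": "metadata", "checksum": "metadata", "contents": "content"}

def split_by_locality_group(cells):
    """Partition (row, column, value) cells into one sorted run per locality group."""
    groups = defaultdict(list)
    for row, column, value in cells:
        family = column.split(":", 1)[0]  # "language:" -> "language"
        groups[LOCALITY_GROUPS[family]].append((row, column, value))
    return {name: sorted(run) for name, run in groups.items()}

cells = [
    ("com.cnn.www", "contents:", "<html>...</html>"),
    ("com.cnn.www", "language:", "EN"),
    ("com.bbc.www", "language:", "EN"),
]
runs = split_by_locality_group(cells)
print(runs["metadata"])
```

A reader that only wants page metadata then never touches the run holding the much larger page contents.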

Locality groups are one of several refinements; the paper's Refinements section describes more of them in detail:

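in-memory locality groups: their SSTables are loaded into tablet server memory lazily

storage: clients choose whether, and in which format, the SSTables of a locality group are compressed

read performance: two levels of caching, a Scan Cache for key-value pairs and a Block Cache for SSTable blocks read from GFS

Bloom filters: let a read ask whether an SSTable might contain data for a given row/column pair, so most lookups for absent data need no disk access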
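A minimal Bloom filter sketch, to show why it can rule out an SSTable without reading it. The sizes, the SHA-256 based hashing, and the key format are assumptions for illustration, not what Bigtable uses.

```python
import hashlib

class BloomFilter:
    """Set membership with no false negatives and a small false-positive rate."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with the hash number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
bf.add("com.cnn.www/contents:")
print(bf.might_contain("com.cnn.www/contents:"))  # True
```

When `might_contain` returns False the tablet server can skip that SSTable entirely; a True answer still requires the real lookup.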


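Speeding up tablet recovery: before a tablet moves to another server, the source server runs minor compactions, so the new server can load the tablet without replaying log entries

Exploiting immutability: SSTables are immutable, so reading them needs no synchronization; only the memtable is read and written concurrently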

Performance Evaluation


Lessons

  1. large distributed systems are vulnerable to many types of failures
  2. it is important to delay adding new features until it is clear how the new features will be used
  3. proper system-level monitoring is important
  4. simple designs are valuable

From: https://www.cnblogs.com/kazusarua/p/17768224.html