HDFS Snapshots


Overview

HDFS Snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or the entire file system. Some common use cases of snapshots are data backup, protection against user errors and disaster recovery.

The implementation of HDFS Snapshots is efficient:

  • Snapshot creation is instantaneous: the cost is O(1) excluding the inode lookup time.
  • Additional memory is used only when modifications are made relative to a snapshot: memory usage is O(M), where M is the number of modified files/directories.
  • Blocks in datanodes are not copied: the snapshot files record the block list and the file size. There is no data copying.
  • Snapshots do not adversely affect regular HDFS operations: modifications are recorded in reverse chronological order so that the current data can be accessed directly. The snapshot data is computed by subtracting the modifications from the current data.


Snapshottable Directories

Snapshots can be taken on any directory once the directory has been set as snapshottable. A snapshottable directory is able to accommodate 65,536 simultaneous snapshots. There is no limit on the number of snapshottable directories. Administrators may set any directory to be snapshottable. If there are snapshots in a snapshottable directory, the directory can be neither deleted nor renamed before all the snapshots are deleted.

Nested snapshottable directories are currently not allowed. In other words, a directory cannot be set to snapshottable if one of its ancestors/descendants is a snapshottable directory.



Snapshot Paths

For a snapshottable directory, the path component ".snapshot" is used for accessing its snapshots. Suppose /foo is a snapshottable directory, /foo/bar is a file/directory in /foo, and /foo has a snapshot s0. Then, the path



/foo/.snapshot/s0/bar

refers to the snapshot copy of /foo/bar. The usual API and CLI can work with the ".snapshot" paths. The following are some examples.

  • Listing all the snapshots under a snapshottable directory:
    hdfs dfs -ls /foo/.snapshot
  • Listing the files in snapshot s0:
    hdfs dfs -ls /foo/.snapshot/s0
  • Copying a file from snapshot s0:
    hdfs dfs -cp -ptopax /foo/.snapshot/s0/bar /tmp
    Note that this example uses the preserve option to preserve timestamps, ownership, permission, ACLs and XAttrs.
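Because ".snapshot" behaves like an ordinary path component, snapshot contents can also be read through the normal Java FileSystem API. The following is a minimal sketch, reusing the /foo, s0 and bar names from the example above; any other snapshot path works the same way:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadFromSnapshot {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // ".snapshot" addresses the read-only point-in-time copy;
        // /foo, s0 and bar are the illustrative names from the text above.
        Path snapshotCopy = new Path("/foo/.snapshot/s0/bar");

        // Open and read the snapshot copy exactly like any other file.
        try (FSDataInputStream in = fs.open(snapshotCopy)) {
          System.out.println("first byte: " + in.read());
        }
      }
    }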



Upgrading to a version of HDFS with snapshots

The HDFS snapshot feature introduces a new reserved path name used to interact with snapshots: .snapshot. When upgrading from an older version of HDFS, existing paths named .snapshot need to first be renamed or deleted to avoid conflicting with the reserved path. See the upgrade section in the HDFS user guide for more information.



Snapshot Operations



Administrator Operations

The operations described in this section require superuser privilege.



Allow Snapshots

Allow snapshots of a directory to be created. If the operation completes successfully, the directory becomes snapshottable.

  • Command:
    hdfs dfsadmin -allowSnapshot <path>
  • Arguments:

path

The path of the snapshottable directory.

See also the corresponding Java API void allowSnapshot(Path path) in HdfsAdmin.
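The Java route goes through HdfsAdmin. The following is a minimal sketch, assuming superuser credentials and a hypothetical directory /data/projects:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.client.HdfsAdmin;

    public class AllowSnapshotExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // HdfsAdmin needs the namenode URI; fs.defaultFS is used here.
        HdfsAdmin admin = new HdfsAdmin(URI.create(conf.get("fs.defaultFS")), conf);

        // /data/projects is a hypothetical directory; superuser privilege is required.
        admin.allowSnapshot(new Path("/data/projects"));
      }
    }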



Disallow Snapshots

Disallow snapshots of a directory from being created. All snapshots of the directory must be deleted before disallowing snapshots.

  • Command:
    hdfs dfsadmin -disallowSnapshot <path>
  • Arguments:

path

The path of the snapshottable directory.

See also the corresponding Java API void disallowSnapshot(Path path) in HdfsAdmin.



User Operations

This section describes user operations. Note that the HDFS superuser can perform all of them without satisfying the permission requirements stated for the individual operations.



Create Snapshots

Create a snapshot of a snapshottable directory. This operation requires owner privilege of the snapshottable directory.

  • Command:
    hdfs dfs -createSnapshot <path> [<snapshotName>]
  • Arguments:

path

The path of the snapshottable directory.

snapshotName

"'s'yyyyMMdd-HHmmss.SSS", e.g. "s20130412-151029.033".

See also the corresponding Java API Path createSnapshot(Path path) and Path createSnapshot(Path path, String snapshotName) in FileSystem. The snapshot path is returned in these methods.
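The following is a minimal Java sketch of both createSnapshot overloads, assuming /foo is already snapshottable and the caller is its owner:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CreateSnapshotExample {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // With an explicit name; /foo must already be snapshottable.
        Path named = fs.createSnapshot(new Path("/foo"), "s0");

        // Without a name; a default timestamp-based name is generated.
        Path generated = fs.createSnapshot(new Path("/foo"));

        System.out.println(named);      // e.g. /foo/.snapshot/s0
        System.out.println(generated);  // e.g. /foo/.snapshot/s20130412-151029.033
      }
    }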



Delete Snapshots

Delete a snapshot from a snapshottable directory. This operation requires owner privilege of the snapshottable directory.

  • Command:
    hdfs dfs -deleteSnapshot <path> <snapshotName>
  • Arguments:

path

The path of the snapshottable directory.

snapshotName

The snapshot name.

See also the corresponding Java API void deleteSnapshot(Path path, String snapshotName) in FileSystem.



Rename Snapshots

Rename a snapshot. This operation requires owner privilege of the snapshottable directory.

  • Command:
    hdfs dfs -renameSnapshot <path> <oldName> <newName>
  • Arguments:

path

The path of the snapshottable directory.

oldName

The old snapshot name.

newName

The new snapshot name.

See also the corresponding Java API void renameSnapshot(Path path, String oldName, String newName) in FileSystem.
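Rename and delete follow the same FileSystem API pattern; the following short sketch assumes the /foo directory and a snapshot named s0 from the earlier examples:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ManageSnapshotsExample {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path dir = new Path("/foo");

        // Rename snapshot s0 to s0-backup (owner privilege required).
        fs.renameSnapshot(dir, "s0", "s0-backup");

        // Delete the renamed snapshot from the snapshottable directory.
        fs.deleteSnapshot(dir, "s0-backup");
      }
    }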



Get Snapshottable Directory Listing

Get all the snapshottable directories where the current user has permission to take snapshots.

  • Command:
    hdfs lsSnapshottableDir
  • Arguments: none

See also the corresponding Java API SnapshottableDirectoryStatus[] getSnapshottableDirListing() in DistributedFileSystem.
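A short Java sketch of that call, assuming fs.defaultFS points at an HDFS cluster so the FileSystem handle can be cast to DistributedFileSystem:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.SnapshottableDirectoryStatus;

    public class ListSnapshottableDirs {
      public static void main(String[] args) throws Exception {
        // Assumes the default filesystem is HDFS.
        DistributedFileSystem dfs =
            (DistributedFileSystem) FileSystem.get(new Configuration());

        // One entry per snapshottable directory visible to the current user.
        for (SnapshottableDirectoryStatus status : dfs.getSnapshottableDirListing()) {
          System.out.println(status.getFullPath());
        }
      }
    }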



Get Snapshots Difference Report

Get the differences between two snapshots. This operation requires read access privilege for all files/directories in both snapshots.

  • Command:
    hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>
  • Arguments:

path

The path of the snapshottable directory.

fromSnapshot

The name of the starting snapshot.

toSnapshot

The name of the ending snapshot.

  • Results:

+

The file/directory has been created.

-

The file/directory has been deleted.

M

The file/directory has been modified.

R

The file/directory has been renamed.

A RENAME entry indicates that a file/directory has been renamed but is still under the same snapshottable directory. A file/directory is reported as deleted if it was renamed to outside of the snapshottable directory. A file/directory renamed from outside of the snapshottable directory is reported as newly created.

The snapshot difference report does not guarantee to reflect the actual sequence of operations; it reports only the net difference between the two snapshots. For example, if we rename the directory "/foo" to "/foo2", and then append new data to the file "/foo2/bar", the difference report will be:



R. /foo -> /foo2
M. /foo/bar

I.e., the changes on the files/directories under a renamed directory are reported using the original path before the rename ("/foo/bar" in the above example).


See also the corresponding Java API SnapshotDiffReport getSnapshotDiffReport(Path path, String fromSnapshot, String toSnapshot) in DistributedFileSystem.
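A minimal Java sketch of the diff-report call, assuming the /foo directory with two hypothetical snapshots s0 and s1:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.SnapshotDiffReport;

    public class SnapshotDiffExample {
      public static void main(String[] args) throws Exception {
        // Assumes the default filesystem is HDFS.
        DistributedFileSystem dfs =
            (DistributedFileSystem) FileSystem.get(new Configuration());

        // s0 and s1 are hypothetical snapshot names of /foo.
        SnapshotDiffReport report =
            dfs.getSnapshotDiffReport(new Path("/foo"), "s0", "s1");

        // toString() prints one line per difference entry (+, -, M, R).
        System.out.println(report);
      }
    }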
