ViewFs Guide

Date: 2023-06-01 10:33:40
Tags: mount, port, ViewFs, cluster, file, namenode, foo, Guide


Introduction

The View File System (ViewFs) provides a way to manage multiple Hadoop file system namespaces (or namespace volumes). It is particularly useful for clusters having multiple namenodes, and hence multiple namespaces, in HDFS Federation. ViewFs is analogous to client side mount tables in some Unix/Linux systems. ViewFs can be used to create personalized namespace views and also per-cluster common views.

This guide is presented in the context of Hadoop systems that have several clusters, each of which may be federated into multiple namespaces. It also describes how to use ViewFs in federated HDFS to provide a per-cluster global namespace so that applications can operate in a way similar to the pre-federation world.


The Old World (Prior to Federation)



Single Namenode Clusters

In the old world prior to HDFS Federation, a cluster has a single namenode which provides a single file system namespace for that cluster. Suppose there are multiple clusters. The file system namespaces of each cluster are completely independent and disjoint. Furthermore, physical storage is NOT shared across clusters (i.e. the Datanodes are not shared across clusters.)

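The core-site.xml of each cluster has a configuration property that sets the default file system to the namenode of that cluster: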



<property>
  <name>fs.default.name</name>
  <value>hdfs://namenodeOfClusterX:port</value>
</property>



Such a configuration property allows one to use slash-relative names to resolve paths relative to the cluster namenode. For example, the path /foo/bar refers to hdfs://namenodeOfClusterX:port/foo/bar.

This configuration property is set on each gateway on the clusters and also on key services of that cluster such as the JobTracker and Oozie.
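
The way a slash-relative name is qualified against the default file system can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Hadoop's own code; the function name qualify and the port 8020 are assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical value of fs.default.name; the port is a placeholder.
DEFAULT_FS = "hdfs://namenodeOfClusterX:8020"


def qualify(path, default_fs=DEFAULT_FS):
    """Return a fully qualified URI for the given path.

    Paths that already carry a scheme are returned unchanged;
    slash-relative paths are resolved against the default file system.
    """
    if urlparse(path).scheme:
        return path
    if not path.startswith("/"):
        raise ValueError("expected an absolute path: " + path)
    return default_fs.rstrip("/") + path


print(qualify("/foo/bar"))
print(qualify("hdfs://namenodeOfClusterY:8020/foo/bar"))
```

The first call is resolved against the default file system, while the second is left alone because it already names a namenode.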



Pathnames Usage Patterns

Hence on Cluster X, where the core-site.xml is set as above, the typical pathnames are:

  1. /foo/bar
  • This is equivalent to hdfs://namenodeOfClusterX:port/foo/bar
  2. hdfs://namenodeOfClusterX:port/foo/bar
  • While this is a valid pathname, it is better to use /foo/bar
  3. hdfs://namenodeOfClusterY:port/foo/bar
  • This is a URI for referring to a pathname on another cluster such as Cluster Y. In particular, the command for copying files from Cluster Y to Cluster Z looks like:

distcp hdfs://namenodeClusterY:port/pathSrc hdfs://namenodeClusterZ:port/pathDest

  4. webhdfs://namenodeClusterX:http_port/foo/bar and hftp://namenodeClusterX:http_port/foo/bar
  • These are file system URIs for accessing files via the WebHDFS file system and the HFTP file system, respectively. Note that WebHDFS and HFTP use the HTTP port of the namenode, not the RPC port.
  5. http://namenodeClusterX:http_port/webhdfs/v1/foo/bar and http://proxyClusterX:http_port/foo/bar
  • These are HTTP URLs for accessing files via the WebHDFS REST API and the HDFS proxy, respectively.
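
As a rough illustration of the WebHDFS REST URL form in item 5, the sketch below assembles such a URL in Python. The helper name webhdfs_url and the port 50070 are assumptions for the example; the operation is passed in the op query parameter, and GETFILESTATUS is used here as one example operation.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, http_port, path, op, **params):
    """Build a WebHDFS REST URL for an absolute HDFS path.

    host and http_port identify the namenode's HTTP endpoint (not the
    RPC port); op is the WebHDFS operation name.
    """
    if not path.startswith("/"):
        raise ValueError("expected an absolute path: " + path)
    query = urlencode({"op": op, **params})
    return "http://%s:%d/webhdfs/v1%s?%s" % (host, http_port, quote(path), query)


# Placeholder host and port, for illustration only.
print(webhdfs_url("namenodeClusterX", 50070, "/foo/bar", "GETFILESTATUS"))
```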



Pathname Usage Best Practices

When one is within a cluster, it is recommended to use the pathname of type (1) above instead of a fully qualified URI like (2). Fully qualified URIs are similar to addresses and do not allow the application to move along with its data.


New World – Federation and ViewFs



How The Clusters Look

Suppose there are multiple clusters. Each cluster has one or more namenodes. Each namenode has its own namespace. A namenode belongs to one and only one cluster. The namenodes in the same cluster share the physical storage of that cluster. The namespaces across clusters are independent as before.

Operations decide what is stored on each namenode within a cluster based on the storage needs. For example, they may put all the user data (/user/<username>) in one namenode, all the feed-data (/data) in another namenode, all the projects (/projects) in yet another namenode, etc.



A Global Namespace Per Cluster Using ViewFs

In order to provide transparency with the old world, the ViewFs file system (i.e. client-side mount table) is used to create, for each cluster, an independent cluster namespace view that is similar to the namespace in the old world. The client-side mount tables are like the Unix mount tables, and they mount the new namespace volumes using the old naming convention. The following figure shows a mount table mounting four namespace volumes /user, /data, /projects, and /tmp:

[Figure: a ViewFs mount table mounting the four namespace volumes /user, /data, /projects, and /tmp]

ViewFs implements the Hadoop file system interface just like HDFS and the local file system. It is a trivial file system in the sense that it only allows linking to other file systems. Because ViewFs implements the Hadoop file system interface, it works transparently with Hadoop tools. For example, all the shell commands work with ViewFs just as they do with HDFS and the local file system.

The mount points of a mount table are specified in the standard Hadoop configuration files. In the configuration of each cluster, the default file system is set to the mount table for that cluster as shown below (compare it with the configuration in Single Namenode Clusters).



<property>
  <name>fs.default.name</name>
  <value>viewfs://clusterX</value>
</property>



The authority following the viewfs:// scheme in the URI is the mount table name.
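
The idea of a client-side mount table can be sketched as a small lookup: the URI authority selects a mount table, and the longest matching mount point selects the target namespace volume. The Python below is a simplified model under stated assumptions, not the ViewFs implementation; the namenode host names, the port 8020, and the function name resolve are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical mount tables keyed by mount table name. The namenode hosts
# and the port are placeholders, not values from a real cluster.
MOUNT_TABLES = {
    "clusterX": {
        "/user": "hdfs://nn-user.example.com:8020/user",
        "/data": "hdfs://nn-data.example.com:8020/data",
        "/projects": "hdfs://nn-projects.example.com:8020/projects",
        "/tmp": "hdfs://nn-user.example.com:8020/tmp",
    }
}


def resolve(viewfs_uri, tables=MOUNT_TABLES):
    """Map a viewfs:// URI to the URI of the underlying file system."""
    parsed = urlparse(viewfs_uri)
    if parsed.scheme != "viewfs":
        raise ValueError("not a viewfs URI: " + viewfs_uri)
    links = tables[parsed.netloc]  # the authority names the mount table
    path = parsed.path
    # Try longer mount points first and match on whole path components.
    for mount in sorted(links, key=len, reverse=True):
        if path == mount or path.startswith(mount + "/"):
            return links[mount] + path[len(mount):]
    raise FileNotFoundError("no mount point for " + path)


print(resolve("viewfs://clusterX/user/joe/foo"))
print(resolve("viewfs://clusterX/data/feeds"))
```

Matching on whole path components keeps a path such as /username from being mistaken for something under the /user mount point.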



Pathname Usage Patterns

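Hence on Cluster X, where the core-site.xml is set to make the default file system use the mount table of that cluster, the typical pathnames are: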

  1. /foo/bar
  • This is equivalent to viewfs://clusterX/foo/bar. If such a pathname is used in the old non-federated world, then the transition to the federated world is transparent.
  2. viewfs://clusterX/foo/bar
  • While this is a valid pathname, it is better to use /foo/bar
  3. viewfs://clusterY/foo/bar
  • This is a URI for referring to a pathname on another cluster such as Cluster Y. In particular, the command for copying files from Cluster Y to Cluster Z looks like:

distcp viewfs://clusterY/pathSrc viewfs://clusterZ/pathDest

  4. viewfs://clusterX-webhdfs/foo/bar and viewfs://clusterX-hftp/foo/bar
  • These are URIs for accessing files via the WebHDFS file system and the HFTP file system, respectively.
  5. http://namenodeClusterX:http_port/webhdfs/v1/foo/bar and http://proxyClusterX:http_port/foo/bar
  • These are HTTP URLs for accessing files via the WebHDFS REST API and the HDFS proxy, respectively. Note that they are the same as before.



Pathname Usage Best Practices

When one is within a cluster, it is recommended to use the pathname of type (1) above instead of a fully qualified URI like (2). Further, applications should not use knowledge of the mount points to build a path like hdfs://namenodeContainingUserDirs:port/joe/foo/bar that refers to a file on a particular namenode. One should use /user/joe/foo/bar instead.



Renaming Pathnames Across Namespaces

Recall that one cannot rename files or directories across namenodes or clusters in the old world. The same is true in the new world but with an additional twist. For example, in the old world one can perform the command below.



rename /user/joe/myStuff /data/foo/bar



This will NOT work in the new world if /user and /data are actually stored on different namenodes within a cluster.
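
The constraint can be illustrated with a small check that compares the namenodes behind the source and destination paths. This is a simplified model of the rule stated above, and an actual ViewFs deployment may apply stricter rules; the mount table contents, host names, and function names are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical mount table: /user and /tmp share a namenode, /data does not.
LINKS = {
    "/user": "hdfs://nn1.example.com:8020/user",
    "/data": "hdfs://nn2.example.com:8020/data",
    "/tmp": "hdfs://nn1.example.com:8020/tmp",
}


def target_namenode(path, links=LINKS):
    """Return the host:port of the namenode that stores the given path."""
    for mount in sorted(links, key=len, reverse=True):
        if path == mount or path.startswith(mount + "/"):
            return urlparse(links[mount]).netloc
    raise FileNotFoundError("no mount point for " + path)


def can_rename(src, dst, links=LINKS):
    """A rename is only possible when both paths live on one namenode."""
    return target_namenode(src, links) == target_namenode(dst, links)


# /user and /data are on different namenodes in this table.
print(can_rename("/user/joe/myStuff", "/data/foo/bar"))
# Both paths are under /user, hence on the same namenode.
print(can_rename("/user/joe/myStuff", "/user/joe/archive"))
```

An application that needs to move data between volumes on different namenodes has to copy and delete instead of renaming.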



FAQ

  1. As I move from the non-federated world to the federated world, I will have to keep track of namenodes for different volumes; how do I do that?
    No, you won’t. See the examples above: you are either using a relative name and taking advantage of the default file system, or changing your path from hdfs://namenodeClusterX/foo/bar to viewfs://clusterX/foo/bar.
  2. What happens if Operations moves some files from one namenode to another namenode within a cluster?
    Operations may move files from one namenode to another in order to deal with storage capacity issues. They will do this in a way that keeps applications from breaking. Here are two examples.
  • Example 1: /user and /data were on one namenode and later need to be on separate namenodes to deal with capacity issues. Operations would have created separate mount points for /user and /data from the start. Prior to the change, the mounts for /user and /data would have pointed to the same namenode, say namenodeContainingUserAndData. Operations will update the mount tables so that the mount points are changed to namenodeContainingUser and namenodeContainingData, respectively.
  • Example 2: All projects fit on one namenode at first, but later they need two or more namenodes. ViewFs allows mounts like /project/foo and /project/bar. This allows mount tables to be updated to point to the corresponding namenode.
  3. Is the mount table in each core-site.xml or in a separate file of its own?
    The plan is to keep the mount tables in separate files and have the core-site.xml include them via XInclude. While one can keep these files on each machine locally, it is better to use HTTP to access them from a central location.
  4. Should the configuration have the mount table definitions for only one cluster or all clusters?
    The configuration should have the mount definitions for all clusters since one needs to have access to data in other clusters such as with distcp.
  5. When is the mount table actually read given that Operations may change a mount table over time?
    The mount table is read when the job is submitted to the cluster. The XInclude in core-site.xml is expanded at job submission time.
  6. Will JobTracker (or YARN’s ResourceManager) itself use the ViewFs?
    No, it does not need to. Neither does the NodeManager.
  7. Does ViewFs allow only mounts at the top level?
    No; it is more general. For example, one can mount /user/joe and /user/jane. In this case, an internal read-only directory is created for /user in the mount table. All operations on /user are valid except that /user is read-only.
  8. An application works across the clusters and needs to persistently store file paths. Which paths should it store?
    You should store pathnames of the form viewfs://cluster/path.
  9. What about delegation tokens?
    Delegation tokens for the cluster to which you are submitting the job (including all mounted volumes for that cluster’s mount table), and for input and output paths to your map-reduce job (including all volumes mounted via mount tables for the specified input and output paths) are all handled automatically. In addition, there is a way to add additional delegation tokens to the base cluster configuration for special circumstances.
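
Example 1 from the FAQ can be modelled with two versions of a mount table. The Python sketch below is an illustration under assumptions: the port 8020, the application path /data/feeds/today, and the function name resolve are made up for the example, while the namenode names follow the FAQ text.

```python
def resolve(path, links):
    """Map a cluster-relative path to a target URI by longest-prefix match."""
    for mount in sorted(links, key=len, reverse=True):
        if path == mount or path.startswith(mount + "/"):
            return links[mount] + path[len(mount):]
    raise FileNotFoundError("no mount point for " + path)


# Before the move: both mount points target the same namenode.
before = {
    "/user": "hdfs://namenodeContainingUserAndData:8020/user",
    "/data": "hdfs://namenodeContainingUserAndData:8020/data",
}

# After the move: Operations updated only the mount table targets.
after = {
    "/user": "hdfs://namenodeContainingUser:8020/user",
    "/data": "hdfs://namenodeContainingData:8020/data",
}

app_path = "/data/feeds/today"  # hypothetical path stored by an application
print(resolve(app_path, before))
print(resolve(app_path, after))
```

The application keeps using the same path; only the target that the mount table resolves it to changes.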


Appendix: A Mount Table Configuration Example

Generally, users do not have to define mount tables or the core-site.xml to use the mount table. This is done by operations, and the correct configuration is set on the right gateway machines as is done for core-site.xml.

The mount tables can be described in core-site.xml but it is better to use indirection in core-site.xml to reference a separate configuration file, say mountTable.xml. Add the following configuration element to core-site.xml for referencing mountTable.xml:



<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="mountTable.xml" />
</configuration>



In the file mountTable.xml, there is a definition of the mount table “ClusterX” for the hypothetical cluster that is a federation of the three namespace volumes managed by the three namenodes:

  1. nn1-clusterx.example.com:8020,
  2. nn2-clusterx.example.com:8020, and
  3. nn3-clusterx.example.com:8020.

Here /home and /tmp are in the namespace managed by namenode nn1-clusterx.example.com:8020, and projects /foo and /bar are hosted on the other namenodes of the federated cluster. The home directory base path is set to /home so that each user can access their home directory using the getHomeDirectory() method defined in FileSystem/FileContext.



<configuration>
  <property>
    <name>fs.viewfs.mounttable.ClusterX.homedir</name>
    <value>/home</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.ClusterX.link./home</name>
    <value>hdfs://nn1-clusterx.example.com:8020/home</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.ClusterX.link./tmp</name>
    <value>hdfs://nn1-clusterx.example.com:8020/tmp</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.ClusterX.link./projects/foo</name>
    <value>hdfs://nn2-clusterx.example.com:8020/projects/foo</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.ClusterX.link./projects/bar</name>
    <value>hdfs://nn3-clusterx.example.com:8020/projects/bar</value>
  </property>
</configuration>
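
To see how the property names encode the mount table, the sketch below parses a shortened copy of the configuration above with Python's standard XML library and collects the link entries for one mount table. It only illustrates the naming convention and is not how Hadoop loads its configuration; the function name load_links is an assumption.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the mountTable.xml shown above.
MOUNT_TABLE_XML = """<configuration>
  <property>
    <name>fs.viewfs.mounttable.ClusterX.homedir</name>
    <value>/home</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.ClusterX.link./home</name>
    <value>hdfs://nn1-clusterx.example.com:8020/home</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.ClusterX.link./projects/foo</name>
    <value>hdfs://nn2-clusterx.example.com:8020/projects/foo</value>
  </property>
</configuration>"""


def load_links(xml_text, table):
    """Return {mount point: target URI} for the named mount table."""
    prefix = "fs.viewfs.mounttable." + table + ".link."
    links = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name", default="")
        if name.startswith(prefix):
            # The part of the property name after ".link." is the mount point.
            links[name[len(prefix):]] = prop.findtext("value", default="")
    return links


for mount, target in sorted(load_links(MOUNT_TABLE_XML, "ClusterX").items()):
    print(mount, "->", target)
```

The homedir property is skipped because it does not carry the .link. prefix.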

From: https://blog.51cto.com/u_11860992/6392721
