
Debug: tf-distribute-strategy-worker: json.decoder.JSONDecodeError: Expecting property name enclosed

Date: 2024-02-14 11:55:05
Tags: dist name double strat worker quotes tensorflow line example

[ERROR: json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 182]

# in file pipeline.yaml
        - name: TF_CONFIG
          value: "{
  \"cluster\": {
    \"worker\": [\"dist-strat-example-worker-0:5000\",\"dist-strat-example-worker-1:5000\"], 
    \"ps\": [\"dist-strat-example-ps-0:5000\"], 
### The ',' in '],}' on the next line is a trailing comma; JSON does not allow it, so it must be removed
    \"chief\": [\"dist-strat-example-chief:5000\"],},    
  \"task\": {
    \"type\": \"worker\",
    \"index\": \"0\"
  }
}"
(base) maye@maye-Inspiron-5547:~/github_repository/tensorflow_ecosystem/distribution_strategy$ kubectl logs dist-strat-example-worker-0-7mqqg 
2024-02-13 16:32:23.777522: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/tf_std_server.py", line 67, in <module>
    main()
  File "/tf_std_server.py", line 41, in main
    if cluster_resolver.task_type in ("worker", "ps"):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/distribute/cluster_resolver/tfconfig_cluster_resolver.py", line 104, in task_type
    task_info = _get_value_in_tfconfig(_TASK_KEY, {})
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/distribute/cluster_resolver/tfconfig_cluster_resolver.py", line 43, in _get_value_in_tfconfig
    tf_config = _load_tf_config()
                ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/distribute/cluster_resolver/tfconfig_cluster_resolver.py", line 39, in _load_tf_config
    return json.loads(os.environ.get(_TF_CONFIG_ENV, '{}'))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
               ^^^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 182 (char 181)
(base) maye@maye-Inspiron-5547:~/github_repository/tensorflow_ecosystem/distribution_strategy$ 
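
The failure can be reproduced outside the cluster by feeding the same string to json.loads. The following is a minimal sketch, assuming the TF_CONFIG value is copied verbatim from the pod environment; the helper name locate_json_error and the context parameter are hypothetical and exist only for this illustration. The exact message and column depend on the Python version (the log above comes from Python 3.11; newer releases may word the message differently or point at the comma rather than the brace).

```python
import json

# TF_CONFIG as it appears in the pod environment, including the
# trailing comma after the "chief" list.
BAD_TF_CONFIG = (
    '{ "cluster": { '
    '"worker": ["dist-strat-example-worker-0:5000","dist-strat-example-worker-1:5000"], '
    '"ps": ["dist-strat-example-ps-0:5000"], '
    '"chief": ["dist-strat-example-chief:5000"],}, '
    '"task": { "type": "worker", "index": "0" } }'
)


def locate_json_error(text, context=12):
    """Hypothetical helper: return (message, column, snippet) for the
    first JSON syntax error in text, or None if the text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the 0-based offset of the error; err.colno is 1-based.
        start = max(err.pos - context, 0)
        return err.msg, err.colno, text[start:err.pos + 1]
    return None


result = locate_json_error(BAD_TF_CONFIG)
print(result)
```

The snippet returned by the helper ends at the offending position, which makes it easy to see that the parser stopped at the "],}" sequence.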

(base) maye@maye-Inspiron-5547:~/github_repository/tensorflow_ecosystem/distribution_strategy$ kubectl describe pod dist-strat-example-worker-0-7mqqg
Name:             dist-strat-example-worker-0-7mqqg
Namespace:        default
Priority:         0
Service Account:  default
Node:             maye-inspiron-5547/192.168.0.104
Start Time:       Wed, 14 Feb 2024 00:32:12 +0800
Labels:           job=worker
                  name=dist-strat-example
                  task=0
Annotations:      <none>
Status:           Running
IP:               10.244.0.179
IPs:
  IP:           10.244.0.179
Controlled By:  ReplicationController/dist-strat-example-worker-0
Containers:
  tensorflow:
    Container ID:  containerd://613170ce2079886726f3984679a45f8387b0bd9d49b3fe7b97c48b08db905655
    Image:         tf_std_server:v1
    Image ID:      sha256:117ff425f04f86b62e85a1a7ca654d0c36e9c8ac3bcc78f413984e5cbddb8421
    Port:          5000/TCP
    Host Port:     0/TCP
    Command:
      /usr/bin/python
      /tf_std_server.py
      
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 14 Feb 2024 00:33:08 +0800
      Finished:     Wed, 14 Feb 2024 00:33:11 +0800
    Ready:          False
    Restart Count:  3
    Environment:
      TF_CONFIG:  { "cluster": { "worker": ["dist-strat-example-worker-0:5000","dist-strat-example-worker-1:5000"], "ps": ["dist-strat-example-ps-0:5000"], "chief": ["dist-strat-example-chief:5000"],}, "task": { "type": "worker", "index": "0" } }
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-khlnz (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-khlnz:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  86s                default-scheduler  Successfully assigned default/dist-strat-example-worker-0-7mqqg to maye-inspiron-5547
  Normal   Pulled     30s (x4 over 85s)  kubelet            Container image "tf_std_server:v1" already present on machine
  Normal   Created    30s (x4 over 83s)  kubelet            Created container tensorflow
  Normal   Started    30s (x4 over 83s)  kubelet            Started container tensorflow
  Warning  BackOff    0s (x6 over 71s)   kubelet            Back-off restarting failed container tensorflow in pod dist-strat-example-worker-0-7mqqg_default(5a1300ff-81b5-44bf-9232-f5d335b9e6fc)
(base) maye@maye-Inspiron-5547:~/github_repository/tensorflow_ecosystem/distribution_strategy$ 

[SOLUTION]

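This error occurs because the TF_CONFIG value in "pipeline.yaml" contains a trailing comma: in "],}" (after the "chief" list) there is a "," directly before "}". JSON does not allow a trailing comma, so after reading the "," the parser expects another property name and fails when it finds "}" instead, at line 1 column 182 (char 181). Removing the comma so that the text reads "]}" fixes the error.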
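
One way to avoid this class of mistake is to generate the JSON with json.dumps instead of writing it by hand, then paste the result into the YAML. The following is a sketch under that assumption; the function name build_tf_config is hypothetical, the host names are the ones from the pipeline.yaml above, and the index is kept as the string "0" only to match the original file.

```python
import json


def build_tf_config(task_type, task_index):
    """Hypothetical helper: serialize a TF_CONFIG value for one task."""
    cluster = {
        "worker": [
            "dist-strat-example-worker-0:5000",
            "dist-strat-example-worker-1:5000",
        ],
        "ps": ["dist-strat-example-ps-0:5000"],
        "chief": ["dist-strat-example-chief:5000"],
    }
    config = {
        "cluster": cluster,
        "task": {"type": task_type, "index": task_index},
    }
    # json.dumps never emits a trailing comma, so the output always parses.
    return json.dumps(config)


tf_config = build_tf_config("worker", "0")
parsed = json.loads(tf_config)  # round-trip check before deploying
print(tf_config)
```

Because the string comes from a serializer, the round trip through json.loads succeeds by construction, and a trailing comma can only appear if the text is edited by hand afterwards.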

From: https://www.cnblogs.com/zhenxia-jiuyou/p/18015112
