首页 > 其他分享 >proto 3支持的基础类型

proto 3支持的基础类型

时间:2024-04-23 09:04:24浏览次数:32  
标签:proto 支持 field value 类型 message type your

This guide describes how to use the protocol buffer language to structure your protocol buffer data, including .proto file syntax and how to generate data access classes from your .proto files. It covers the proto3 version of the protocol buffers language: for information on the proto2 syntax, see the Proto2 Language Guide.

This is a reference guide – for a step by step example that uses many of the features described in this document, see the tutorial for your chosen language.

Defining A Message Type

First let’s look at a very simple example. Let’s say you want to define a search request message format, where each search request has a query string, the particular page of results you are interested in, and a number of results per page. Here’s the .proto file you use to define the message type.

syntax = "proto3";

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 results_per_page = 3;
}
  • The first line of the file specifies that you’re using proto3 syntax: if you don’t do this the protocol buffer compiler will assume you are using proto2. This must be the first non-empty, non-comment line of the file.
  • The SearchRequest message definition specifies three fields (name/value pairs), one for each piece of data that you want to include in this type of message. Each field has a name and a type.

Specifying Field Types

In the earlier example, all the fields are scalar types: two integers (page_number and results_per_page) and a string (query). You can also specify enumerations and composite types like other message types for your field.

Assigning Field Numbers

You must give each field in your message definition a number between 1 and 536,870,911 with the following restrictions:

  • The given number must be unique among all fields for that message.
  • Field numbers 19,000 to 19,999 are reserved for the Protocol Buffers implementation. The protocol buffer compiler will complain if you use one of these reserved field numbers in your message.
  • You cannot use any previously reserved field numbers or any field numbers that have been allocated to extensions.

This number cannot be changed once your message type is in use because it identifies the field in the message wire format. “Changing” a field number is equivalent to deleting that field and creating a new field with the same type but a new number. See Deleting Fields for how to do this properly.

Field numbers should never be reused. Never take a field number out of the reserved list for reuse with a new field definition. See Consequences of Reusing Field Numbers.

You should use the field numbers 1 through 15 for the most-frequently-set fields. Lower field number values take less space in the wire format. For example, field numbers in the range 1 through 15 take one byte to encode. Field numbers in the range 16 through 2047 take two bytes. You can find out more about this in Protocol Buffer Encoding.

Consequences of Reusing Field Numbers

Reusing a field number makes decoding wire-format messages ambiguous.

The protobuf wire format is lean and doesn’t provide a way to detect fields encoded using one definition and decoded using another.

Encoding a field using one definition and then decoding that same field with a different definition can lead to:

  • Developer time lost to debugging
  • A parse/merge error (best case scenario)
  • Leaked PII/SPII
  • Data corruption

Common causes of field number reuse:

  • renumbering fields (sometimes done to achieve a more aesthetically pleasing number order for fields). Renumbering effectively deletes and re-adds all the fields involved in the renumbering, resulting in incompatible wire-format changes.
  • deleting a field and not reserving the number to prevent future reuse.

The max field is 29 bits instead of the more-typical 32 bits because three lower bits are used for the wire format. For more on this, see the Encoding topic.

 

Specifying Field Labels

Message fields can be one of the following:

  • optional: An optional field is in one of two possible states:

    • the field is set, and contains a value that was explicitly set or parsed from the wire. It will be serialized to the wire.
    • the field is unset, and will return the default value. It will not be serialized to the wire.

    You can check to see if the value was explicitly set.

  • repeated: this field type can be repeated zero or more times in a well-formed message. The order of the repeated values will be preserved.

  • map: this is a paired key/value field type. See Maps for more on this field type.

  • If no explicit field label is applied, the default field label, called “implicit field presence,” is assumed. (You cannot explicitly set a field to this state.) A well-formed message can have zero or one of this field (but not more than one). You also cannot determine whether a field of this type was parsed from the wire. An implicit presence field will be serialized to the wire unless it is the default value. For more on this subject, see Field Presence.

In proto3, repeated fields of scalar numeric types use packed encoding by default. You can find out more about packed encoding in Protocol Buffer Encoding.

Well-formed Messages

The term “well-formed,” when applied to protobuf messages, refers to the bytes serialized/deserialized. The protoc parser validates that a given proto definition file is parseable.

In the case of optional fields that have more than one value, the protoc parser will accept the input, but only uses the last field. So, the “bytes” may not be “well-formed” but the resulting message would have only one and would be “well-formed” (but would not roundtrip the same).

Adding More Message Types

Multiple message types can be defined in a single .proto file. This is useful if you are defining multiple related messages – so, for example, if you wanted to define the reply message format that corresponds to your SearchResponse message type, you could add it to the same .proto:

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 results_per_page = 3;
}

message SearchResponse {
 ...
}

Combining Messages leads to bloat While multiple message types (such as message, enum, and service) can be defined in a single .proto file, it can also lead to dependency bloat when large numbers of messages with varying dependencies are defined in a single file. It’s recommended to include as few message types per .proto file as possible.

Adding Comments

To add comments to your .proto files, use C/C++-style // and /* ... */ syntax.

/* SearchRequest represents a search query, with pagination options to
 * indicate which results to include in the response. */

message SearchRequest {
  string query = 1;
  int32 page_number = 2;  // Which page number do we want?
  int32 results_per_page = 3;  // Number of results to return per page.
}

Deleting Fields

Deleting fields can cause serious problems if not done properly.

When you no longer need a field and all references have been deleted from client code, you may delete the field definition from the message. However, you must reserve the deleted field number. If you do not reserve the field number, it is possible for a developer to reuse that number in the future.

You should also reserve the field name to allow JSON and TextFormat encodings of your message to continue to parse.

Reserved Fields

If you update a message type by entirely deleting a field, or commenting it out, future developers can reuse the field number when making their own updates to the type. This can cause severe issues, as described in Consequences of Reusing Field Numbers.

To make sure this doesn’t happen, add your deleted field number to the reserved list. To make sure JSON and TextFormat instances of your message can still be parsed, also add the deleted field name to a reserved list.

The protocol buffer compiler will complain if any future developers try to use these reserved field numbers or names.

message Foo {
  reserved 2, 15, 9 to 11;
  reserved "foo", "bar";
}

Reserved field number ranges are inclusive (9 to 11 is the same as 9, 10, 11). Note that you can’t mix field names and field numbers in the same reserved statement.

What’s Generated from Your .proto?

When you run the protocol buffer compiler on a .proto, the compiler generates the code in your chosen language you’ll need to work with the message types you’ve described in the file, including getting and setting field values, serializing your messages to an output stream, and parsing your messages from an input stream.

  • For C++, the compiler generates a .h and .cc file from each .proto, with a class for each message type described in your file.
  • For Java, the compiler generates a .java file with a class for each message type, as well as a special Builder class for creating message class instances.
  • For Kotlin, in addition to the Java generated code, the compiler generates a .kt file for each message type with an improved Kotlin API. This includes a DSL that simplifies creating message instances, a nullable field accessor, and a copy function.
  • Python is a little different — the Python compiler generates a module with a static descriptor of each message type in your .proto, which is then used with a metaclass to create the necessary Python data access class at runtime.
  • For Go, the compiler generates a .pb.go file with a type for each message type in your file.
  • For Ruby, the compiler generates a .rb file with a Ruby module containing your message types.
  • For Objective-C, the compiler generates a pbobjc.h and pbobjc.m file from each .proto, with a class for each message type described in your file.
  • For C#, the compiler generates a .cs file from each .proto, with a class for each message type described in your file.
  • For PHP, the compiler generates a .php message file for each message type described in your file, and a .php metadata file for each .proto file you compile. The metadata file is used to load the valid message types into the descriptor pool.
  • For Dart, the compiler generates a .pb.dart file with a class for each message type in your file.

You can find out more about using the APIs for each language by following the tutorial for your chosen language. For even more API details, see the relevant API reference.

Scalar Value Types

A scalar message field can have one of the following types – the table shows the type specified in the .proto file, and the corresponding type in the automatically generated class:

.proto TypeNotesC++ TypeJava/Kotlin Type[1]Python Type[3]Go TypeRuby TypeC# TypePHP TypeDart Type
double   double double float float64 Float double float double
float   float float float float32 Float float float double
int32 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. int32 int int int32 Fixnum or Bignum (as required) int integer int
int64 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. int64 long int/long[4] int64 Bignum long integer/string[6] Int64
uint32 Uses variable-length encoding. uint32 int[2] int/long[4] uint32 Fixnum or Bignum (as required) uint integer int
uint64 Uses variable-length encoding. uint64 long[2] int/long[4] uint64 Bignum ulong integer/string[6] Int64
sint32 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. int32 int int int32 Fixnum or Bignum (as required) int integer int
sint64 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. int64 long int/long[4] int64 Bignum long integer/string[6] Int64
fixed32 Always four bytes. More efficient than uint32 if values are often greater than 228. uint32 int[2] int/long[4] uint32 Fixnum or Bignum (as required) uint integer int
fixed64 Always eight bytes. More efficient than uint64 if values are often greater than 256. uint64 long[2] int/long[4] uint64 Bignum ulong integer/string[6] Int64
sfixed32 Always four bytes. int32 int int int32 Fixnum or Bignum (as required) int integer int
sfixed64 Always eight bytes. int64 long int/long[4] int64 Bignum long integer/string[6] Int64
bool   bool boolean bool bool TrueClass/FalseClass bool boolean bool
string A string must always contain UTF-8 encoded or 7-bit ASCII text, and cannot be longer than 232. string String str/unicode[5] string String (UTF-8) string string String
bytes May contain any arbitrary sequence of bytes no longer than 232. string ByteString str (Python 2)
bytes (Python 3)
[]byte String (ASCII-8BIT) ByteString string List

[1] Kotlin uses the corresponding types from Java, even for unsigned types, to ensure compatibility in mixed Java/Kotlin codebases.

[2] In Java, unsigned 32-bit and 64-bit integers are represented using their signed counterparts, with the top bit simply being stored in the sign bit.

[3] In all cases, setting values to a field will perform type checking to make sure it is valid.

[4] 64-bit or unsigned 32-bit integers are always represented as long when decoded, but can be an int if an int is given when setting the field. In all cases, the value must fit in the type represented when set. See [2].

[5] Python strings are represented as unicode on decode but can be str if an ASCII string is given (this is subject to change).

[6] Integer is used on 64-bit machines and string is used on 32-bit machines.

You can find out more about how these types are encoded when you serialize your message in Protocol Buffer Encoding.

Default Values

When a message is parsed, if the encoded message does not contain a particular implicit presence element, accessing the corresponding field in the parsed object returns the default value for that field. These defaults are type-specific:

  • For strings, the default value is the empty string.
  • For bytes, the default value is empty bytes.
  • For bools, the default value is false.
  • For numeric types, the default value is zero.
  • For enums, the default value is the first defined enum value, which must be 0.
  • For message fields, the field is not set. Its exact value is language-dependent. See the generated code guide for details.

The default value for repeated fields is empty (generally an empty list in the appropriate language).

Note that for scalar message fields, once a message is parsed there’s no way of telling whether a field was explicitly set to the default value (for example whether a boolean was set to false) or just not set at all: you should bear this in mind when defining your message types. For example, don’t have a boolean that switches on some behavior when set to false if you don’t want that behavior to also happen by default. Also note that if a scalar message field is set to its default, the value will not be serialized on the wire. If a float or double value is set to +0 it will not be serialized, but -0 is considered distinct and will be serialized.

See the generated code guide for your chosen language for more details about how defaults work in generated code.

Enumerations

When you’re defining a message type, you might want one of its fields to only have one of a predefined list of values. For example, let’s say you want to add a corpus field for each SearchRequest, where the corpus can be UNIVERSALWEBIMAGESLOCALNEWSPRODUCTS or VIDEO. You can do this very simply by adding an enum to your message definition with a constant for each possible value.

In the following example we’ve added an enum called Corpus with all the possible values, and a field of type Corpus:

enum Corpus {
  CORPUS_UNSPECIFIED = 0;
  CORPUS_UNIVERSAL = 1;
  CORPUS_WEB = 2;
  CORPUS_IMAGES = 3;
  CORPUS_LOCAL = 4;
  CORPUS_NEWS = 5;
  CORPUS_PRODUCTS = 6;
  CORPUS_VIDEO = 7;
}

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 results_per_page = 3;
  Corpus corpus = 4;
}

As you can see, the Corpus enum’s first constant maps to zero: every enum definition must contain a constant that maps to zero as its first element. This is because:

  • There must be a zero value, so that we can use 0 as a numeric default value.
  • The zero value needs to be the first element, for compatibility with the proto2 semantics where the first enum value is the default unless a different value is explicitly specified.

You can define aliases by assigning the same value to different enum constants. To do this you need to set the allow_alias option to true. Otherwise, the protocol buffer compiler generates a warning message when aliases are found. Though all alias values are valid during deserialization, the first value is always used when serializing.

enum EnumAllowingAlias {
  option allow_alias = true;
  EAA_UNSPECIFIED = 0;
  EAA_STARTED = 1;
  EAA_RUNNING = 1;
  EAA_FINISHED = 2;
}

enum EnumNotAllowingAlias {
  ENAA_UNSPECIFIED = 0;
  ENAA_STARTED = 1;
  // ENAA_RUNNING = 1;  // Uncommenting this line will cause a warning message.
  ENAA_FINISHED = 2;
}

Enumerator constants must be in the range of a 32-bit integer. Since enum values use varint encoding on the wire, negative values are inefficient and thus not recommended. You can define enums within a message definition, as in the earlier example, or outside – these enums can be reused in any message definition in your .proto file. You can also use an enum type declared in one message as the type of a field in a different message, using the syntax _MessageType_._EnumType_.

When you run the protocol buffer compiler on a .proto that uses an enum, the generated code will have a corresponding enum for Java, Kotlin, or C++, or a special EnumDescriptor class for Python that’s used to create a set of symbolic constants with integer values in the runtime-generated class.

Important

The generated code may be subject to language-specific limitations on the number of enumerators (low thousands for one language). Review the limitations for the languages you plan to use.

During deserialization, unrecognized enum values will be preserved in the message, though how this is represented when the message is deserialized is language-dependent. In languages that support open enum types with values outside the range of specified symbols, such as C++ and Go, the unknown enum value is simply stored as its underlying integer representation. In languages with closed enum types such as Java, a case in the enum is used to represent an unrecognized value, and the underlying integer can be accessed with special accessors. In either case, if the message is serialized the unrecognized value will still be serialized with the message.

Important

For information on how enums should work contrasted with how they currently work in different languages, see Enum Behavior.

For more information about how to work with message enums in your applications, see the generated code guide for your chosen language.

Reserved Values

If you update an enum type by entirely removing an enum entry, or commenting it out, future users can reuse the numeric value when making their own updates to the type. This can cause severe issues if they later load old versions of the same .proto, including data corruption, privacy bugs, and so on. One way to make sure this doesn’t happen is to specify that the numeric values (and/or names, which can also cause issues for JSON serialization) of your deleted entries are reserved. The protocol buffer compiler will complain if any future users try to use these identifiers. You can specify that your reserved numeric value range goes up to the maximum possible value using the max keyword.

enum Foo {
  reserved 2, 15, 9 to 11, 40 to max;
  reserved "FOO", "BAR";
}

Note that you can’t mix field names and numeric values in the same reserved statement.

Using Other Message Types

You can use other message types as field types. For example, let’s say you wanted to include Result messages in each SearchResponse message – to do this, you can define a Result message type in the same .proto and then specify a field of type Result in SearchResponse:

message SearchResponse {
  repeated Result results = 1;
}

message Result {
  string url = 1;
  string title = 2;
  repeated string snippets = 3;
}

Importing Definitions

In the earlier example, the Result message type is defined in the same file as SearchResponse – what if the message type you want to use as a field type is already defined in another .proto file?

You can use definitions from other .proto files by importing them. To import another .proto’s definitions, you add an import statement to the top of your file:

import "myproject/other_protos.proto";

By default, you can use definitions only from directly imported .proto files. However, sometimes you may need to move a .proto file to a new location. Instead of moving the .proto file directly and updating all the call sites in a single change, you can put a placeholder .proto file in the old location to forward all the imports to the new location using the import public notion.

Note that the public import functionality is not available in Java.

import public dependencies can be transitively relied upon by any code importing the proto containing the import public statement. For example:

// new.proto
// All definitions are moved here
// old.proto
// This is the proto that all clients are importing.
import public "new.proto";
import "other.proto";
// client.proto
import "old.proto";
// You use definitions from old.proto and new.proto, but not other.proto

The protocol compiler searches for imported files in a set of directories specified on the protocol compiler command line using the -I/--proto_path flag. If no flag was given, it looks in the directory in which the compiler was invoked. In general you should set the --proto_path flag to the root of your project and use fully qualified names for all imports.

Using proto2 Message Types

It’s possible to import proto2 message types and use them in your proto3 messages, and vice versa. However, proto2 enums cannot be used directly in proto3 syntax (it’s okay if an imported proto2 message uses them).

Nested Types

You can define and use message types inside other message types, as in the following example – here the Result message is defined inside the SearchResponse message:

message SearchResponse {
  message Result {
    string url = 1;
    string title = 2;
    repeated string snippets = 3;
  }
  repeated Result results = 1;
}

If you want to reuse this message type outside its parent message type, you refer to it as _Parent_._Type_:

message SomeOtherMessage {
  SearchResponse.Result result = 1;
}

You can nest messages as deeply as you like. In the example below, note that the two nested types named Inner are entirely independent, since they are defined within different messages:

message Outer {       // Level 0
  message MiddleAA {  // Level 1
    message Inner {   // Level 2
      int64 ival = 1;
      bool  booly = 2;
    }
  }
  message MiddleBB {  // Level 1
    message Inner {   // Level 2
      int32 ival = 1;
      bool  booly = 2;
    }
  }
}

Updating A Message Type

If an existing message type no longer meets all your needs – for example, you’d like the message format to have an extra field – but you’d still like to use code created with the old format, don’t worry! It’s very simple to update message types without breaking any of your existing code when you use the binary wire format.

标签:proto,支持,field,value,类型,message,type,your
From: https://www.cnblogs.com/siyunianhua/p/18152013

相关文章

  • 高通将支持 Meta Llama 3 在骁龙终端运行;特斯拉中国全系车型降价 1.4 万元丨 RTE 开发
       开发者朋友们大家好: 这里是「RTE开发者日报」,每天和大家一起看新闻、聊八卦。我们的社区编辑团队会整理分享RTE(RealTimeEngagement)领域内「有话题的新闻」、「有态度的观点」、「有意思的数据」、「有思考的文章」、「有看点的会议」,但内容仅代表编辑的个人观点......
  • 新品NAS主板,支持vPro,类似ipmi可以小试牛刀
    畅网推出全新的Q670-NAS8盘位主板企业级规格,请看下图。这款主板支持vPro,可以远程管理bios安装系统,家用机器服务器级别的待遇享受。vPro平台最为“离谱”的功能:远程管理,别误会,这里的远程管理并非远程管理软件那样的功能,通过英特尔的AMT主动管理技术和英特尔端点管理助手,即可远程......
  • Serverless 成本再优化:Knative 支持抢占式实例
    作者:元毅、向先Knative是一款云原生、跨平台的开源Serverless应用编排框架,而抢占式实例是公有云中性价比较高的资源。Knative与抢占式实例的结合可以进一步降低用户资源使用成本。本文介绍如何在Knative中使用抢占式实例。背景信息抢占式实例是一种低成本竞价型实例,您可......
  • 电动车摩托车灯DC-DC降压恒流芯片AP5170支持线性调光95%高效率IC
    产品描述AP5170是一款效率高,稳定可靠的LED灯恒流驱动控制芯片,内置高精度比较器,固定关断时间控制电路,恒流驱动电路等,特别适合大功率LED恒流驱动。AP5170采用ESOP8封装,散热片内置接SW脚,通过调节外置电流检测的电阻值来设置流过LED灯的电流,支持外加电压线性调光,最大电......
  • python 二进制序列类型 bytes 和 bytearray
    bytesbytes定义bytes是一个不可变序列,用于存储字节数据。bytes对象包含范围在0到255之间的整数序列,通常用于处理二进制数据、文本数据的字节表示、以及网络通信中的原始数据传输。创建bytes对象使用b'...'表示字节字符串,各个字符以ASCII对应的单字节值表示。使用byte......
  • TS — 类型推论和兼容性
    一、类型推论的原理1.基于初始化值:TypeScript编译器会根据变量的初始化值推断其类型:letx=10;//推断x的类型为numberlety='hello';//推断y的类型为string2.基于上下文:如果无法根据初始化值推断类型,编译器会根据变量的使用方式和上下文推断类型:letz=x......
  • 前端【TS】02-typescript【基础】【搭建Vite+Vue3+TS项目】【为ref标注类型】
    前置基于Vite创建Vue3+TS环境vite官方文档:https://cn.vitejs.dev/guide/vite除了支持基础阶段的纯TS环境之外,还支持Vue+TS开发环境的快速创建,命令如下:1npmcreatevite@latestvue-ts-project----templatevue-ts23//说明:41.npmcreatevite@lates......
  • NanoPi-NEO 全志H3移植Ubuntu 22.04 LTS、u-boot、Linux内核/内核树、mt7601u USB-Wi-
    前言想在NanoPi-NEO上开发屏幕驱动,但是看了下文件目录发现没有内核树,导致最基础的file_operations结构体都无法使用,于是寻找内核树安装方法。但官方提供的内核为4.14太旧了apt找不到对应的linux-source版本(其实后面发现不需要用apt,可以在kernel.org上下载,但反正都装了那就当学习......
  • C基本数据类型小结
    在C语言中,基本数据类型用于定义变量的类型和存储数据的方式。C语言提供了几种常见的基本数据类型,包括以下几种:整型(Integer): 整型用于表示整数值。在C语言中,整型可以分为不同的大小和范围,取决于具体的实现。常见的整型类型有:int:表示整数,通常为机器字长大小,常见的取值范围是......
  • vue中ts引入组件,无法找到模块xxx的声明文件。xxx隐式拥有 "any" 类型。
    原因说明简单来说就是ts不认识.vue这个类型,需要定义声明。我刚学ts不是很懂为什么vite官方内写了那么多类型声明就是不写.vue。解决方法在项目根目录下找到env.d.ts文件,这个文件定义类型声明,简单地说就是让ts认识各种类型,尤其是文件。那么解决方法显而易见,我们自定义vue的......