AnyMind Group


[Tech Blog] A quick guide to Protobuf

Hello! I am Alexander, a back-end engineer on the AnyTag project. In this quick guide I want to introduce you to Protocol Buffers, aka Protobuf, explain when to use it, and give an example of how to use it in Kotlin and Python. If you have never heard of Protobuf or have had no time to try it, this tour is for you.

What is it?

Protobuf is a binary format for exchanging messages between network services. It was created by Google and is open source. Its main goal was a format that is simpler, faster, and smaller than the existing ones. Protobuf is supported by Google and the community, so it can be used with most common programming languages.

When to use

  • Fast serialization/deserialization: Protobuf is a binary format, so messages carry compact numeric field tags instead of the full field names that JSON repeats in every message. Depending on the benchmark, different sources report Protobuf to be roughly 6 to 10 times faster than JSON;
  • Type safety: all types are described in a schema, so both sides agree on the type of every field, which helps messages deserialize reliably;
  • Schema-driven: a shared schema helps when you struggle to keep data structures in sync between services, especially services written in different programming languages;
  • Compiler: once the schema is defined, you write less code by hand, because the compiler generates classes from the proto files for each target language;
  • RPC: because Protobuf is schema-driven, it is easy to use in an RPC environment.
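To make the size argument concrete, here is a minimal sketch that hand-encodes one small message in the Protobuf wire format and compares it with the equivalent JSON. It deliberately does not use the Protobuf library; the helper names encode_varint and encode_country_count and the sample values are made up for illustration, and the encoder only handles a string field numbered 1 and a non-negative integer field numbered 2.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def encode_country_count(country: str, count: int) -> bytes:
    """Hand-encode a message with `string country = 1` and `int32 count = 2`.

    Illustrative only: real code should use the generated classes.
    """
    name = country.encode("utf-8")
    # A field starts with a tag: (field_number << 3) | wire_type.
    # Wire type 2 means length-delimited (used for strings).
    field_country = encode_varint((1 << 3) | 2) + encode_varint(len(name)) + name
    # Wire type 0 means varint; this sketch assumes count >= 0.
    field_count = encode_varint((2 << 3) | 0) + encode_varint(count)
    return field_country + field_count


as_protobuf = encode_country_count("JP", 150)
as_json = json.dumps({"country": "JP", "count": 150}, separators=(",", ":")).encode("utf-8")

print(as_protobuf.hex(" "), "->", len(as_protobuf), "bytes")
print(as_json.decode("utf-8"), "->", len(as_json), "bytes")
```

For this sample the binary form takes 7 bytes and the compact JSON takes 28, mostly because JSON spells out the field names. The ratio depends heavily on the data, so treat it as an illustration rather than a benchmark.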

When not to use

  • You need messages to be human-readable: Protobuf is a binary format
  • The data you send is consumed by a web browser: JSON is a better fit there, as the native data format of the JavaScript world
  • Your back-end services are written in JavaScript
  • You don’t need a data schema
  • You have no time to adopt new tools

Installation and requirements

Google provides a compiler for proto files (protoc), and you can run it from the command line. I prefer the Gradle plugin, which makes it easy to configure the output languages and the source directories for the classes generated from proto files.

In the next step I will show how to use Protobuf from Gradle to generate Kotlin and Python code.

...
plugins {
    ...
    id "java" // needed because the Kotlin code generation relies on the generated Java code
    id "idea" // automatically registers the proto files and generated code as sources
    id "com.google.protobuf" version "0.8.17" // protobuf plugin
}

dependencies {
    ...
    implementation 'com.google.protobuf:protobuf-kotlin:3.18.0' // the generated Kotlin code requires classes from this library
}

protobuf {
    protoc {
        artifact = 'com.google.protobuf:protoc:3.18.0'
    } // bundled compiler for proto files

    generateProtoTasks {
        all().each { task ->
            task.builtins {
                python { }
                kotlin { }
            }
        }
    } // generate Python and Kotlin code; Java is generated by default and could be removed, but it is required here because of Kotlin
}
...

As you can see, the setup is pretty simple: the only things to configure are the plugins, one dependency for the generated Kotlin code, the compiler artifact, and the output languages.

How to use

Proto file

First, take a look at the proto file. The example describes the country breakdown of an Instagram profile’s followers.

syntax = "proto3";

option java_package = "com.casting_asia.account_management.application.infrastructure.gcp";

message IgFollowerCountriesCount {
    string country = 1;
    int32 count = 2;
}

message IgFollowerCountry {
    int64 account_id = 1;
    repeated IgFollowerCountriesCount countries = 2;
}

It’s important to specify the syntax version; in the example it is proto3. It can also be proto2, and the syntax differs between the two versions.

The java_package option specifies the Java package for the generated code.

The message keyword declares a message type, for which a class will be generated. Message names should be written in CamelCase, following the Google style guide.

Typically, a field declaration contains a type, a name, and a number. Sometimes it also contains a rule. Let’s look at the details:

  • the type can be a scalar type (such as int32, int64, or string), an enumeration, or another message type;
  • the name should be written in snake_case;
  • the number is a required, unique identifier of the field in the binary message format; you should not change it once the message type is in use. Note that numbers 1 through 15 are best reserved for very frequently occurring fields, because the field number and wire type are then encoded in one byte, whereas numbers 16 through 2047 take two bytes;
  • a declaration can also contain the rule repeated, which means the field holds a list of values; the default rule is singular, which means a message contains zero or one value for this field.
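The one-byte versus two-byte rule for field numbers follows from how a field tag is encoded: the field number is shifted left by three bits, combined with a three-bit wire type, and written as a varint that holds seven payload bits per byte. The sketch below reproduces that arithmetic; the helper names encode_varint and tag_size are made up for illustration and are not part of any Protobuf API.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def tag_size(field_number: int, wire_type: int = 0) -> int:
    """Return how many bytes the tag of a field occupies on the wire."""
    tag = (field_number << 3) | wire_type
    return len(encode_varint(tag))


# Field numbers around the boundaries mentioned in the text.
for number in (1, 15, 16, 2047, 2048):
    print(f"field number {number}: tag takes {tag_size(number)} byte(s)")
```

Because three of the seven payload bits in the first byte go to the wire type, only four bits remain for the field number, which is why the one-byte range ends at 15.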

Generate and use

Kotlin

Now we can generate the classes by running the Gradle tasks clean build. The compiler generates the Java classes PubSub.IgFollowerCountry and PubSub.IgFollowerCountriesCount, where the outer class name PubSub is derived from the name of the proto file. It also generates the Kotlin functions igFollowerCountry and igFollowerCountriesCount for creating instances of these classes. The one thing that remains is to set up your framework or library to receive and convert Protobuf messages.

Python

On the other side, we have a service written in Python that sends Protobuf messages. Here is a small snippet that shows how to work with the generated classes in Python.

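# IgFollowerCountry and IgFollowerCountriesCount are imported from the generated *_pb2 module.
# account_id is the profile id; data maps a country to its follower count.
countries = IgFollowerCountry()
countries.account_id = account_id
for country_id, count in data.items():
    country_data = IgFollowerCountriesCount()
    country_data.country = country_id
    country_data.count = count
    countries.countries.append(country_data)  # repeated fields are already initialized

payload = countries.SerializeToString()  # the serialized message as bytes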

As you can see, all you need to do is create an instance of the outer class, assign values, and append to the lists, which are already initialized. The last step is to serialize the result. The generated classes already have a method for that, SerializeToString, which despite its name returns bytes in Python 3; ParseFromString does the opposite and parses a message from bytes.

The generated classes also stop you from sending data with a wrongly typed value or a misspelled field name: such an assignment raises an error instead of silently producing a bad message.

Conclusion

I hope this quick guide was helpful. As I demonstrated, Protobuf comes to the rescue when services are implemented in different languages: it reduces the amount of code you write and prevents mistakes in field names and types. Protobuf also has other great features, such as fast performance and gRPC integration.

Links

Beating JSON performance with Protobuf

Protobuf. What is it, why you should care, and when should you use it?

Protobuf gradle plugin GitHub page
