


An unexpected discovery: what looked like a bug is actually a feature of Protobuf's design
Hello everyone, I'm Amazing.
Recently, in one of our projects, we used the protobuf format as the carrier for stored data. In doing so, I accidentally dug a big hole for myself, and it took me a long time to notice.
Introduction to protobuf
protobuf's full name is Protocol Buffers. Developed by Google, it is a language-neutral, platform-neutral, extensible mechanism for serializing structured data. It is similar to XML, but smaller, faster, and simpler. You define once how you want your data to be structured, then use protobuf's code-generation tools to produce source code with the serialization and deserialization operations built in. Structured data can then be easily written to and read from a variety of data streams, in a variety of programming languages.
The proto2 version supports code generation in Java, Python, Objective-C, and C++. With the newer proto3 language version, you can also use Kotlin, Dart, Go, Ruby, PHP, C#, and many more languages.
How did I find it?
In our new project, we record the data produced while the project runs in protobuf format, so that during debugging we can replay the data recorded on site and debug locally.
```proto
message ImageData {
  int64 timestamp = 1;  // ms
  int32 id = 2;
  Data mat = 3;
}

message PointCloud {
  int64 timestamp = 1;  // ms
  int32 id = 2;
  PointData pointcloud = 3;
}

message State {
  int64 timestamp = 1;  // ms
  string direction = 2;
}

message Sensor {
  repeated PointCloud point_data = 1;
  repeated ImageData image_data = 2;
  repeated State vehicle_data = 3;
}
```
We defined this set of messages. Because the three data sources inside Sensor run at different frame rates, each Sensor we actually store contains only one kind of data; the other two fields are left empty.
We didn't run into any problems while everything was recorded into a single pack. But a single pack can't keep growing forever, so eventually we needed a way to split recordings into multiple packs.
At the time I thought this would be simple: once a pack reaches 500 MB, subsequent data goes into a new pack. I wrote it quickly and deployed it on site to record data. After a while we brought the packs back and ran our new program against them in simulation, and found that some packs failed to parse: the program would get stuck partway through. After many tests, we confirmed that only some of the packs had this problem.
Our first suspicion was that the method we used to check the file size was wrong and was corrupting the split, because checking the size involved opening the file. But even after switching to several other approaches that determine the size without opening the file, some of the recorded packs still had problems.
Only then did I suspect that protobuf has special requirements for storing data. I read some articles and learned that protobuf needs delimiters in order to store multiple messages in one file. Otherwise, when parsing the file back, protobuf has no way of knowing where one message ends, and the data parses incorrectly.
And here is the pit: we had stored a whole series of messages into a single pack without any separators. When protobuf parses that file, it treats the entire contents as one single Sensor. That Sensor contains all of the data, because protobuf actively merges every concatenated message into one.
Only at this point did I realize that the single packs I had recorded before parsed correctly purely by luck: protobuf's merging behavior just happened to produce data we could use.
How to solve it?
Now that we knew protobuf behaves this way, we only needed to find out how to delimit the messages. The answer was genuinely hard to find, because few people use protobuf the way we do; searching in Chinese turned up nothing at all. Perhaps most people don't use protobuf to store data, only for communication between services.
I finally found the answer in some Stack Overflow posts. From the answers, I learned that this feature was only officially merged into protobuf in version 3.3, which suggests it really is rarely used.
```cpp
bool SerializeDelimitedToOstream(const MessageLite& message,
                                 std::ostream* output);
bool ParseDelimitedFromZeroCopyStream(MessageLite* message,
                                      io::ZeroCopyInputStream* input,
                                      bool* clean_eof);
```
With this pair of functions, messages can be written to and read from a file one by one as a stream, with no more worries about the data being merged on read.
Of course, data stored this way cannot be parsed by the original method; the storage format changes completely. Each record is written as the size of the serialized message first, followed by the message bytes themselves.
Conclusion
After a lot of back and forth, I finally got past this splitting pit. The use case is probably niche, which is why so little information could be found; in the end I confirmed the behavior by reading the source code myself. Protobuf's C++ source is genuinely hard to read, with many template functions and template classes where it's easy to miss details. I finally read the C# implementation to confirm it.
