Some time ago, while writing an RPC framework, I used three serialization methods: Kryo, Hessian, and Protostuff. At the time I was eager to get the feature working, so I only looked briefly at how to use them and did not dig into their characteristics, advantages, and disadvantages. Now that the RPC framework is finished, I have time to sit down and compare and summarize the three.
Kryo, Hessian, and Protostuff are all third-party open source serialization/deserialization frameworks. To understand their characteristics, we first need to know what serialization and deserialization are:
Serialization: the process of converting objects into byte sequences.
Deserialization: the process of converting byte sequences back into objects.
Serialization converts an object into a format that is convenient for transmission. Common serialization formats include binary formats, byte arrays, JSON strings, and XML strings.
Deserialization restores the serialized data to objects.
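To make the two definitions concrete, here is a minimal round-trip sketch that uses only the JDK's built-in serialization rather than any of the three frameworks. The `RoundTripDemo` and `Point` names are made up for this illustration and are not part of the project in this article.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of the article's project.
class RoundTripDemo {

    // A small serializable value type used only for this illustration.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Serialization: object -> byte sequence.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserialization: byte sequence -> object.
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Point(3, 4));
        Point copy = (Point) deserialize(data);
        System.out.println(data.length + " bytes, x=" + copy.x + ", y=" + copy.y);
    }
}
```

The frameworks compared below follow the same object-to-bytes-to-object shape; they differ in how the bytes are laid out and in what they require from the class.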
If these concepts are unfamiliar, see the Meituan technical team's article on serialization and deserialization.
First, create a new Maven project.
Then add the dependencies:
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter-api</artifactId>
        <version>5.8.2</version>
        <scope>test</scope>
    </dependency>
    <!-- reduce boilerplate -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.20</version>
    </dependency>
    <!--kryo-->
    <dependency>
        <groupId>com.esotericsoftware</groupId>
        <artifactId>kryo-shaded</artifactId>
        <version>4.0.2</version>
    </dependency>
    <dependency>
        <groupId>commons-codec</groupId>
        <artifactId>commons-codec</artifactId>
        <version>1.10</version>
    </dependency>
    <!--protostuff-->
    <dependency>
        <groupId>io.protostuff</groupId>
        <artifactId>protostuff-core</artifactId>
        <version>1.7.2</version>
    </dependency>
    <dependency>
        <groupId>io.protostuff</groupId>
        <artifactId>protostuff-runtime</artifactId>
        <version>1.7.2</version>
    </dependency>
    <!--hessian2-->
    <dependency>
        <groupId>com.caucho</groupId>
        <artifactId>hessian</artifactId>
        <version>4.0.62</version>
    </dependency>
Utility classes:
Kryo
    package cuit.pymjl.utils;

    import com.esotericsoftware.kryo.Kryo;
    import com.esotericsoftware.kryo.io.Input;
    import com.esotericsoftware.kryo.io.Output;
    import org.apache.commons.codec.binary.Base64;
    import org.objenesis.strategy.StdInstantiatorStrategy;

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.UnsupportedEncodingException;

    /**
     * @author Pymjl
     * @version 1.0
     * @date 2022/4/18 20:07
     **/
    @SuppressWarnings("all")
    public class KryoUtils {
        private static final String DEFAULT_ENCODING = "UTF-8";

        // One Kryo instance per thread
        private static final ThreadLocal<Kryo> KRYO_LOCAL = new ThreadLocal<Kryo>() {
            @Override
            protected Kryo initialValue() {
                Kryo kryo = new Kryo();
                /*
                 * Do not change this configuration lightly! Changing it changes the
                 * serialized format, so every cache entry in Redis must be cleared at
                 * release time; otherwise deserializing the old entries will fail.
                 */
                // Support circular object references (otherwise the stack overflows).
                // true is already the default; this line reminds maintainers not to change it.
                kryo.setReferences(true);
                // Do not require class registration (registration cannot guarantee that a class
                // gets the same ID in every JVM, and a business system has too many classes to
                // register one by one).
                // false is already the default; this line reminds maintainers not to change it.
                kryo.setRegistrationRequired(false);
                // Fix the NPE bug when deserializing Collections.
                ((Kryo.DefaultInstantiatorStrategy) kryo.getInstantiatorStrategy())
                        .setFallbackInstantiatorStrategy(new StdInstantiatorStrategy());
                return kryo;
            }
        };

        /**
         * Returns the Kryo instance of the current thread.
         *
         * @return the Kryo instance of the current thread
         */
        public static Kryo getInstance() {
            return KRYO_LOCAL.get();
        }

        //-----------------------------------------------
        // Serialize/deserialize the object together with its type.
        // The serialized result contains the type information,
        // so no type has to be supplied when deserializing.
        //-----------------------------------------------

        /**
         * Serializes an object (and its type) to a byte array.
         *
         * @param obj any object
         * @param <T> type of the object
         * @return the serialized byte array
         */
        public static <T> byte[] writeToByteArray(T obj) {
            ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
            Output output = new Output(byteArrayOutputStream);
            Kryo kryo = getInstance();
            kryo.writeClassAndObject(output, obj);
            output.flush();
            return byteArrayOutputStream.toByteArray();
        }

        /**
         * Serializes an object (and its type) to a Base64-encoded String.
         *
         * @param obj any object
         * @param <T> type of the object
         * @return the serialized string
         */
        public static <T> String writeToString(T obj) {
            try {
                return new String(Base64.encodeBase64(writeToByteArray(obj)), DEFAULT_ENCODING);
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException(e);
            }
        }

        /**
         * Deserializes a byte array back into the original object.
         *
         * @param byteArray byte array produced by writeToByteArray
         * @param <T>       type of the original object
         * @return the original object
         */
        @SuppressWarnings("unchecked")
        public static <T> T readFromByteArray(byte[] byteArray) {
            ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(byteArray);
            Input input = new Input(byteArrayInputStream);
            Kryo kryo = getInstance();
            return (T) kryo.readClassAndObject(input);
        }

        /**
         * Deserializes a Base64-encoded String back into the original object.
         *
         * @param str string produced by writeToString
         * @param <T> type of the original object
         * @return the original object
         */
        public static <T> T readFromString(String str) {
            try {
                return readFromByteArray(Base64.decodeBase64(str.getBytes(DEFAULT_ENCODING)));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException(e);
            }
        }

        //-----------------------------------------------
        // Serialize/deserialize the object only.
        // The serialized result does not contain type information.
        //-----------------------------------------------

        /**
         * Serializes an object to a byte array.
         *
         * @param obj any object
         * @param <T> type of the object
         * @return the serialized byte array
         */
        public static <T> byte[] writeObjectToByteArray(T obj) {
            ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
            Output output = new Output(byteArrayOutputStream);
            Kryo kryo = getInstance();
            kryo.writeObject(output, obj);
            output.flush();
            return byteArrayOutputStream.toByteArray();
        }

        /**
         * Serializes an object to a Base64-encoded String.
         *
         * @param obj any object
         * @param <T> type of the object
         * @return the serialized string
         */
        public static <T> String writeObjectToString(T obj) {
            try {
                return new String(Base64.encodeBase64(writeObjectToByteArray(obj)), DEFAULT_ENCODING);
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException(e);
            }
        }

        /**
         * Deserializes a byte array back into the original object.
         *
         * @param byteArray byte array produced by writeObjectToByteArray
         * @param clazz     Class of the original object
         * @param <T>       type of the original object
         * @return the original object
         */
        @SuppressWarnings("unchecked")
        public static <T> T readObjectFromByteArray(byte[] byteArray, Class<T> clazz) {
            ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(byteArray);
            Input input = new Input(byteArrayInputStream);
            Kryo kryo = getInstance();
            return kryo.readObject(input, clazz);
        }

        /**
         * Deserializes a Base64-encoded String back into the original object.
         *
         * @param str   string produced by writeObjectToString
         * @param clazz Class of the original object
         * @param <T>   type of the original object
         * @return the original object
         */
        public static <T> T readObjectFromString(String str, Class<T> clazz) {
            try {
                return readObjectFromByteArray(Base64.decodeBase64(str.getBytes(DEFAULT_ENCODING)), clazz);
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException(e);
            }
        }
    }
Hessian
    package cuit.pymjl.utils;

    import com.caucho.hessian.io.Hessian2Input;
    import com.caucho.hessian.io.Hessian2Output;

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    /**
     * @author Pymjl
     * @version 1.0
     * @date 2022/7/2 12:39
     **/
    public class HessianUtils {
        /**
         * Serializes an object.
         *
         * @param obj the object to serialize
         * @return {@code byte[]}
         */
        public static byte[] serialize(Object obj) {
            Hessian2Output ho = null;
            ByteArrayOutputStream baos = null;
            try {
                baos = new ByteArrayOutputStream();
                ho = new Hessian2Output(baos);
                ho.writeObject(obj);
                ho.flush();
                return baos.toByteArray();
            } catch (Exception ex) {
                // Keep the original exception as the cause so the failure can be diagnosed
                throw new RuntimeException("serialize failed", ex);
            } finally {
                if (null != ho) {
                    try {
                        ho.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
                if (null != baos) {
                    try {
                        baos.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }

        /**
         * Deserializes a byte array.
         *
         * @param bytes the serialized bytes
         * @param clazz the expected class
         * @return {@code T}
         */
        public static <T> T deserialize(byte[] bytes, Class<T> clazz) {
            Hessian2Input hi = null;
            ByteArrayInputStream bais = null;
            try {
                bais = new ByteArrayInputStream(bytes);
                hi = new Hessian2Input(bais);
                Object o = hi.readObject();
                return clazz.cast(o);
            } catch (Exception ex) {
                // Keep the original exception as the cause so the failure can be diagnosed
                throw new RuntimeException("deserialize failed", ex);
            } finally {
                if (null != hi) {
                    try {
                        hi.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
                if (null != bais) {
                    try {
                        bais.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }
    }
Protostuff
    package cuit.pymjl.utils;

    import io.protostuff.LinkedBuffer;
    import io.protostuff.ProtostuffIOUtil;
    import io.protostuff.Schema;
    import io.protostuff.runtime.RuntimeSchema;

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    /**
     * @author Pymjl
     * @version 1.0
     * @date 2022/6/28 21:00
     **/
    public class ProtostuffUtils {
        /**
         * Avoids allocating a new buffer for every serialization.
         * LinkedBuffer.DEFAULT_BUFFER_SIZE allocates the default 512 bytes;
         * MIN_BUFFER_SIZE (256 bytes) could be used instead.
         * A LinkedBuffer is mutable, so each thread gets its own instance
         * rather than sharing a single static buffer.
         */
        private static final ThreadLocal<LinkedBuffer> BUFFER =
                ThreadLocal.withInitial(() -> LinkedBuffer.allocate(LinkedBuffer.DEFAULT_BUFFER_SIZE));

        /**
         * Schema cache.
         * A Schema describes the structure of the serialized object, much like
         * tables and views describe the structure of data in a database.
         */
        private static final Map<Class<?>, Schema<?>> SCHEMA_CACHE = new ConcurrentHashMap<>();

        /**
         * Serializes the given object to a byte array.
         *
         * @param obj the object
         * @return byte[]
         */
        @SuppressWarnings("unchecked")
        public static <T> byte[] serialize(T obj) {
            Class<T> clazz = (Class<T>) obj.getClass();
            Schema<T> schema = getSchema(clazz);
            LinkedBuffer buffer = BUFFER.get();
            byte[] data;
            try {
                data = ProtostuffIOUtil.toByteArray(obj, schema, buffer);
            } finally {
                // The buffer must be cleared before it can be reused
                buffer.clear();
            }
            return data;
        }

        /**
         * Deserializes a byte array into an instance of the given class.
         *
         * @param data  the byte array
         * @param clazz the target class
         * @return the deserialized object
         */
        public static <T> T deserialize(byte[] data, Class<T> clazz) {
            Schema<T> schema = getSchema(clazz);
            T obj = schema.newMessage();
            ProtostuffIOUtil.mergeFrom(data, obj, schema);
            return obj;
        }

        @SuppressWarnings("unchecked")
        private static <T> Schema<T> getSchema(Class<T> clazz) {
            Schema<T> schema = (Schema<T>) SCHEMA_CACHE.get(clazz);
            if (schema == null) {
                schema = RuntimeSchema.getSchema(clazz);
                // Cache the schema once it has been created
                // (the original check was inverted and never cached anything)
                if (schema != null) {
                    SCHEMA_CACHE.put(clazz, schema);
                }
            }
            return schema;
        }
    }
Create an entity class for testing:
    package cuit.pymjl.entity;

    import lombok.AllArgsConstructor;
    import lombok.Data;
    import lombok.NoArgsConstructor;

    import java.io.Serial;
    import java.io.Serializable;

    /**
     * @author Pymjl
     * @version 1.0
     * @date 2022/7/2 12:32
     **/
    @Data
    @AllArgsConstructor
    @NoArgsConstructor
    public class Student implements Serializable {
        @Serial
        private static final long serialVersionUID = -91809837793898L;
        private String name;
        private String password;
        private int age;
        private String address;
        private String phone;
    }
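Write the test class:
    import cuit.pymjl.entity.Student;
    import cuit.pymjl.utils.HessianUtils;
    import cuit.pymjl.utils.KryoUtils;
    import cuit.pymjl.utils.ProtostuffUtils;
    import org.junit.jupiter.api.Test;

    public class MainTest {
        @Test
        void testLength() {
            Student student = new Student("pymjl", "123456", 18, "北京", "123456789");
            // Serialize the same object with each framework and compare the sizes
            int kryoLength = KryoUtils.writeObjectToByteArray(student).length;
            int hessianLength = HessianUtils.serialize(student).length;
            int protostuffLength = ProtostuffUtils.serialize(student).length;
            System.out.println("kryoLength: " + kryoLength);
            System.out.println("hessianLength: " + hessianLength);
            System.out.println("protostuffLength: " + protostuffLength);
        }
    }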
Run screenshot:
As the figure shows, the byte array produced by Hessian is significantly larger than those produced by the other two methods.
The original Hessian protocol stores ints and longs at fixed lengths (Hessian 2 adds shorter forms for small values), while Kryo uses variable-length ints and longs so that these primitive types take as little space as possible after serialization. Large values are comparatively rare in real applications, so variable-length encoding usually saves space.
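The difference between fixed-length and variable-length integers can be sketched in a few lines. This is a generic 7-bits-per-byte varint written for illustration; it is not Kryo's or Hessian's actual encoder, and the `VarIntDemo` name is made up.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical demo class: a generic varint, not the encoder of any framework.
class VarIntDemo {

    // Fixed-length encoding: always 4 bytes, big-endian.
    static byte[] writeFixedInt(int value) {
        return new byte[] {
            (byte) (value >>> 24), (byte) (value >>> 16), (byte) (value >>> 8), (byte) value
        };
    }

    // Variable-length encoding: 7 payload bits per byte,
    // the high bit of a byte means "more bytes follow".
    static byte[] writeVarInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Reads back a value written by writeVarInt.
    static int readVarInt(byte[] data) {
        int result = 0;
        int shift = 0;
        for (byte b : data) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        int age = 18;
        System.out.println("fixed:  " + writeFixedInt(age).length + " bytes");
        System.out.println("varint: " + writeVarInt(age).length + " bytes");
    }
}
```

Small values such as an age fit in one byte, while values near the top of the int range need five bytes, one more than the fixed form. That trade-off is why the scheme pays off only when small values dominate.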
When Kryo serializes, it either writes the fully qualified class name or, if the class was registered in advance with register(), only the int ID associated with that class, so the serialized output is smaller. Hessian instead puts all of the class's field information into the serialized byte array, so the byte array alone is enough for deserialization and nothing else is needed. Because more data is stored and processed, Hessian is slower.
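A toy model shows why a registered ID is so much cheaper than a class name. The `ClassIdDemo` registry below is hypothetical and only mimics the idea; it does not reproduce Kryo's real wire format.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy registry, not Kryo's implementation or wire format.
class ClassIdDemo {
    private final Map<Class<?>, Integer> idsByClass = new HashMap<>();
    private int nextId = 0;

    // Assigns the next free ID to a class; IDs depend on registration order.
    void register(Class<?> type) {
        if (!idsByClass.containsKey(type)) {
            idsByClass.put(type, nextId++);
        }
    }

    // Bytes needed to identify the class in the output:
    // marker 1 + one ID byte if registered (toy limit: IDs below 128),
    // marker 0 + the UTF-8 class name otherwise.
    byte[] classHeader(Class<?> type) {
        Integer id = idsByClass.get(type);
        if (id != null) {
            return new byte[] {1, id.byteValue()};
        }
        byte[] name = type.getName().getBytes(StandardCharsets.UTF_8);
        byte[] header = new byte[name.length + 1];
        header[0] = 0;
        System.arraycopy(name, 0, header, 1, name.length);
        return header;
    }

    public static void main(String[] args) {
        ClassIdDemo registry = new ClassIdDemo();
        System.out.println("unregistered: " + registry.classHeader(String.class).length + " bytes");
        registry.register(String.class);
        System.out.println("registered:   " + registry.classHeader(String.class).length + " bytes");
    }
}
```

Because the ID here depends on registration order, writer and reader must register the same classes in the same order. That is the cross-JVM concern mentioned in the comment in KryoUtils.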
With Kryo, data classes do not need to implement the Serializable interface, while with Hessian they do.
Adding or removing fields in a Kryo data class makes serialization and deserialization incompatible, while Hessian remains compatible. Protostuff remains compatible only when new fields are added at the end.
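One way to see why only appended fields stay compatible is to assume that fields are numbered 1, 2, 3, ... in declaration order, which to my understanding is what protostuff-runtime does when no explicit tags are given. The sketch below is a toy model of that numbering, with made-up names, and does not call Protostuff.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model: fields are numbered by declaration order (assumption).
class FieldNumberDemo {

    // Assigns 1, 2, 3, ... to the fields in the order they are declared.
    static Map<String, Integer> numberFields(List<String> fieldNames) {
        Map<String, Integer> numbers = new LinkedHashMap<>();
        for (int i = 0; i < fieldNames.size(); i++) {
            numbers.put(fieldNames.get(i), i + 1);
        }
        return numbers;
    }

    // True if every field of the old layout keeps its number in the new layout,
    // which is what old serialized data relies on.
    static boolean keepsOldNumbers(List<String> oldFields, List<String> newFields) {
        Map<String, Integer> oldNumbers = numberFields(oldFields);
        Map<String, Integer> newNumbers = numberFields(newFields);
        for (Map.Entry<String, Integer> entry : oldNumbers.entrySet()) {
            if (!entry.getValue().equals(newNumbers.get(entry.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> v1 = Arrays.asList("name", "age");
        System.out.println("append phone: " + keepsOldNumbers(v1, Arrays.asList("name", "age", "phone")));
        System.out.println("insert phone: " + keepsOldNumbers(v1, Arrays.asList("name", "phone", "age")));
    }
}
```

Appending leaves every existing number unchanged, while inserting a field in the middle shifts the numbers of everything after it, so old bytes would be read into the wrong fields.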
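Kryo and Hessian require the data classes involved to have a no-argument constructor (for Kryo, unless a fallback instantiator strategy such as StdInstantiatorStrategy is configured, as in KryoUtils above).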
Hessian will store all the properties of the complex object in a Map for serialization. Therefore, when there are member variables with the same name in the parent class and subclass, during Hessian serialization, the subclass is serialized first, and then the parent class is serialized. Therefore, the deserialization result will cause the member variable with the same name in the subclass to be overwritten by the value of the parent class.
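The overwrite can be reproduced with a small simulation of the behavior described above: walk the class hierarchy from subclass to parent and put every field into a map keyed by field name. This is only a model of the description, not Hessian's code, and the class names are hypothetical.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical simulation of the described behavior, not Hessian's implementation.
class ShadowedFieldDemo {

    static class Parent {
        String name;
    }

    static class Child extends Parent {
        String name; // same name as the field in Parent

        Child(String parentName, String childName) {
            super.name = parentName;
            this.name = childName;
        }
    }

    // Collects fields subclass first, then parent, keyed by name only.
    // A later entry with the same key replaces the earlier one.
    static Map<String, Object> toFieldMap(Object obj) {
        Map<String, Object> values = new LinkedHashMap<>();
        for (Class<?> type = obj.getClass(); type != Object.class; type = type.getSuperclass()) {
            for (Field field : type.getDeclaredFields()) {
                try {
                    field.setAccessible(true);
                    values.put(field.getName(), field.get(obj));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Child child = new Child("from-parent", "from-child");
        System.out.println(toFieldMap(child));
    }
}
```

In this model the map ends up with a single `name` entry holding the parent's value, so the subclass value is lost. Giving the two fields different names avoids the collision.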
Kryo is not thread-safe, so you need a ThreadLocal or a pool of Kryo instances to use it safely from multiple threads. Protostuff is thread-safe, although a LinkedBuffer must not be shared between threads.
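The ThreadLocal approach is the same one KryoUtils uses. Stripped of Kryo, the pattern looks like the sketch below, where `Encoder` is a made-up stand-in for any object that keeps mutable state and therefore must not be shared between threads.

```java
// Hypothetical demo class; Encoder stands in for a non-thread-safe serializer.
class PerThreadInstanceDemo {

    // Keeps mutable scratch state, so concurrent use of one instance would be unsafe.
    static class Encoder {
        private final StringBuilder scratch = new StringBuilder();

        String encode(String value) {
            scratch.setLength(0);
            scratch.append('[').append(value).append(']');
            return scratch.toString();
        }
    }

    // Each thread lazily creates and then reuses its own Encoder.
    private static final ThreadLocal<Encoder> ENCODER = ThreadLocal.withInitial(Encoder::new);

    static String encode(String value) {
        return ENCODER.get().encode(value);
    }

    // The instance bound to the calling thread.
    static Encoder current() {
        return ENCODER.get();
    }

    // The instance that a freshly started thread gets.
    static Encoder currentOnNewThread() {
        Encoder[] seen = new Encoder[1];
        Thread thread = new Thread(() -> seen[0] = ENCODER.get());
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(encode("pymjl"));
        System.out.println("same thread reuses instance: " + (current() == current()));
        System.out.println("new thread gets its own:     " + (current() != currentOnNewThread()));
    }
}
```

A pool serves the same purpose when threads are short-lived or very numerous, at the cost of explicit borrow and release calls.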
The serialization formats of Protostuff and Kryo are similar. Both use a tag to record the field type, so the serialized output is relatively small.
Framework | Advantages | Disadvantages |
---|---|---|
Kryo | Fast speed, small size after serialization | Cross-language support is complicated |
Hessian | Default cross-language support | Slower |
Protostuff | Fast speed, based on protobuf | Requires static compilation |
Protostuff-Runtime | No static compilation needed, but a Schema must be passed in before serialization | Does not support classes without a default constructor. During deserialization the user has to instantiate the object; Protostuff only populates its fields |
Java | Easy to use, can serialize all classes | Slow and takes up space |