Tuesday 15 February 2011

Java serialization - how many bytes for a character?




I have String objects in Java that I am serializing. I am wondering about the size of each serialized character in the string.

Is it true that standard English letters (e.g. 'a' or 'g') need 1 or 2 bytes, and that special symbols like a comma or an exclamation mark need 8 bytes?

And how many bytes does a digit (0 - 9) need in a serialized string?

Edit: the serialization is done in the following way:

socket = new Socket(host, port);
ObjectOutputStream outputStream = new ObjectOutputStream(new BufferedOutputStream(socket.getOutputStream()));
outputStream.writeObject(request);
outputStream.flush();

The deserialization is done in a similar way using ObjectInputStream.

The object to serialize (request) contains a field of type String that can be e.g. "AAAA" or "aaaa" or "a0a3a5" etc. (i.e. upper- and lowercase letters and numbers).

You are using Java serialization, which complies with http://docs.oracle.com/javase/6/docs/platform/serialization/spec/protocol.html.

The representation of String objects consists of length information followed by the contents of the string encoded in modified UTF-8. The modified UTF-8 encoding is the same as used in the Java Virtual Machine and in the java.io.DataInput and DataOutput interfaces; it differs from standard UTF-8 in the representation of supplementary characters and of the null character. The form of the length information depends on the length of the string in modified UTF-8 encoding. If the modified UTF-8 encoding of the given String is less than 65536 bytes in length, the length is written as 2 bytes representing an unsigned 16-bit integer. Starting with the Java 2 platform, Standard Edition, v1.3, if the length of the string in modified UTF-8 encoding is 65536 bytes or more, the length is written in 8 bytes representing a signed 64-bit integer. The typecode preceding the String in the serialization stream indicates which format was used to write the String.
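The short-string form described in the quoted passage (a 2-byte length followed by modified UTF-8 content) is the same layout that DataOutputStream.writeUTF produces, so it can be measured directly. The following is only a minimal sketch: the class name ModifiedUtf8Size and the helper writeUtfSize are made up for illustration and are not part of the question or the specification.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper class, for illustration only.
public class ModifiedUtf8Size {

    // Number of bytes writeUTF emits for s:
    // a 2-byte unsigned length prefix plus the modified UTF-8 content.
    public static int writeUtfSize(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        }
        return buffer.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeUtfSize("a"));       // 3 = 2 + 1 (ASCII letter)
        System.out.println(writeUtfSize("a0a3a5"));  // 8 = 2 + 6 (letters and digits)
        System.out.println(writeUtfSize(",!"));      // 4 = 2 + 2 (ASCII punctuation)
        System.out.println(writeUtfSize("\u00e9"));  // 4 = 2 + 2 (e with acute accent)
        System.out.println(writeUtfSize("\u20ac"));  // 5 = 2 + 3 (euro sign)
        System.out.println(writeUtfSize("\u0000"));  // 4 = 2 + 2 (null char takes two bytes here)
    }
}
```

The last line shows one of the differences from standard UTF-8 mentioned in the quote: the null character is written as two bytes rather than one.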

A String is serialized as (modified) UTF-8, so ASCII characters are encoded as 1 byte each. Digits are ASCII as well, so yes, they are also encoded as 1 byte each.

See http://en.wikipedia.org/wiki/utf-8 for further information.
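One way to check the 1-byte claim against ObjectOutputStream itself is to serialize two strings that differ by a single character and compare the stream sizes. This is a rough sketch under assumptions: it serializes a bare String rather than the request object from the question, and the names SerializedStringSize, serializedSize and extraBytes are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

// Hypothetical helper class, for illustration only.
public class SerializedStringSize {

    // Total size of the serialization stream produced for a single String.
    public static int serializedSize(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(s);
        }
        return buffer.size();
    }

    // Extra bytes in the stream caused by appending suffix to base.
    public static int extraBytes(String base, String suffix) throws IOException {
        return serializedSize(base + suffix) - serializedSize(base);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(extraBytes("aaaa", "a")); // 1 (lowercase letter)
        System.out.println(extraBytes("aaaa", "G")); // 1 (uppercase letter)
        System.out.println(extraBytes("aaaa", "7")); // 1 (digit)
        System.out.println(extraBytes("aaaa", ",")); // 1 (comma)
        System.out.println(extraBytes("aaaa", "!")); // 1 (exclamation mark)
    }
}
```

The fixed overhead (stream header, typecode, length prefix, and for a custom class like the request object its class description) is separate from the per-character cost, which is why the sketch compares differences rather than absolute sizes.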
