Java - Why doesn't new String(a).getBytes() give back the same bytes as a?
For example, take this array:
byte[] arr = {37, 80, 68, 70, 45, 49, 46, 53, 13, 37, -30, -29, -49, -45, -121, -104 };
and this code:

import java.nio.charset.Charset;
import java.util.Arrays;

String a = new String(arr, Charset.forName("US-ASCII"));
System.out.println(Arrays.toString(arr));
System.out.println(Arrays.toString(a.getBytes(Charset.forName("US-ASCII"))));
System.out.println(Arrays.equals(arr, a.getBytes(Charset.forName("US-ASCII"))));
The result, with the charset name in the code changed for each run, is:

With "windows-1251":
[37, 80, 68, 70, 45, 49, 46, 53, 13, 37, -30, -29, -49, -45, -121, -104]
[37, 80, 68, 70, 45, 49, 46, 53, 13, 37, -30, -29, -49, -45, -121, 63]
false

With "US-ASCII":
[37, 80, 68, 70, 45, 49, 46, 53, 13, 37, -30, -29, -49, -45, -121, -104]
[37, 80, 68, 70, 45, 49, 46, 53, 13, 37, 63, 63, 63, 63, 63, 63]
false

With "UTF-8":
[37, 80, 68, 70, 45, 49, 46, 53, 13, 37, -30, -29, -49, -45, -121, -104]
[37, 80, 68, 70, 45, 49, 46, 53, 13, 37, -17, -65, -67, -17, -65, -67, -17, -65, -67, -45, -121, -17, -65, -67]
false
I have tested various cases and found that the arrays differ whenever there are negative numbers. I also tried "windows-1251", as shown above, and the arrays are still different. My question is: why? And how can I fix it?

Additional info:
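I'm using JRE 8 on Windows 8.1.

Resolution: use the ISO-8859-1 charset instead. It maps each of the 256 byte values to the character with the same code point (U+0000 to U+00FF), so decoding never has to substitute anything and the bytes survive the round trip. Thanks to SLaks for the explanation and to JB Nizet for pointing out ISO-8859-1: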
String a = new String(arr, Charset.forName("ISO-8859-1"));
System.out.println(Arrays.toString(arr));
System.out.println(Arrays.toString(a.getBytes(Charset.forName("ISO-8859-1"))));
System.out.println(Arrays.equals(arr, a.getBytes(Charset.forName("ISO-8859-1"))));

Result:
[37, 80, 68, 70, 45, 49, 46, 53, 13, 37, -30, -29, -49, -45, -121, -104]
[37, 80, 68, 70, 45, 49, 46, 53, 13, 37, -30, -29, -49, -45, -121, -104]
true
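To check that ISO-8859-1 preserves every byte, and not only the sixteen in the sample, one can round-trip all 256 possible byte values. This is a minimal sketch: the class and method names (Latin1RoundTrip, allByteValues, roundTrip) are made up for illustration, and StandardCharsets.ISO_8859_1 and StandardCharsets.US_ASCII stand in for the equivalent Charset.forName(...) calls.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // Builds an array holding every possible byte value.
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i; // 128..255 wrap around to -128..-1
        }
        return all;
    }

    // Decodes with the given charset, then encodes with the same charset.
    static byte[] roundTrip(byte[] bytes, Charset cs) {
        return new String(bytes, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        // Every byte value has a character in ISO-8859-1, so nothing is replaced.
        System.out.println(Arrays.equals(all, roundTrip(all, StandardCharsets.ISO_8859_1))); // true
        // US-ASCII only covers 0..127, so the upper half comes back as '?'.
        System.out.println(Arrays.equals(all, roundTrip(all, StandardCharsets.US_ASCII)));   // false
    }
}
```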
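63 is the code point of '?'. When decoding, every byte that is not valid in the charset is replaced by U+FFFD, the Unicode replacement character. When the string is encoded back into a charset that cannot represent U+FFFD (such as US-ASCII or windows-1251), the encoder writes '?' instead.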
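The two steps can be observed separately with the question's array. A minimal sketch, assuming Java 8 or later (for String.chars()); ReplacementDemo and countReplacements are illustrative names, not part of any API:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReplacementDemo {
    static final byte[] ARR = {37, 80, 68, 70, 45, 49, 46, 53, 13, 37, -30, -29, -49, -45, -121, -104};

    // Counts the replacement characters (U+FFFD) in a decoded string.
    static long countReplacements(String s) {
        return s.chars().filter(c -> c == '\uFFFD').count();
    }

    public static void main(String[] args) {
        // Step 1, decoding: each invalid byte becomes U+FFFD, not '?'.
        String a = new String(ARR, StandardCharsets.US_ASCII);
        System.out.println(countReplacements(a)); // 6

        // Step 2, encoding: US-ASCII has no U+FFFD, so the encoder
        // writes its replacement byte '?' (63) for each one.
        byte[] back = a.getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(back));
        // prints: [37, 80, 68, 70, 45, 49, 46, 53, 13, 37, 63, 63, 63, 63, 63, 63]
    }
}
```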
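For US-ASCII, that includes every byte above 127, which in Java means every negative byte. For windows-1251, it is only 0x98 (-104), the one byte value that has no character assigned in that charset.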
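The "above 127" rule is the same thing as the "negative numbers" observed in the question, because Java's byte type is signed: the values 128 to 255 are stored as -128 to -1. The sketch below prints the unsigned values and, as one possible way to catch the problem early instead of getting silent '?' substitutions, decodes with a CharsetDecoder whose error actions are set to CodingErrorAction.REPORT. StrictAsciiCheck, unsigned and isStrictAscii are hypothetical names for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictAsciiCheck {
    // Java's byte is signed; masking with 0xFF gives the unsigned value 0..255.
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    // Decodes strictly: instead of substituting a replacement character,
    // the decoder throws when it meets a byte that is not valid US-ASCII.
    static boolean isStrictAscii(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] arr = {37, 80, 68, 70, 45, 49, 46, 53, 13, 37, -30, -29, -49, -45, -121, -104};
        for (byte b : arr) {
            if (b < 0) {
                System.out.println(b + " -> " + unsigned(b)); // e.g. -30 -> 226
            }
        }
        System.out.println(isStrictAscii(arr)); // false
    }
}
```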
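For UTF-8, it includes every byte above 127 that is not part of a well-formed UTF-8 sequence. Each resulting U+FFFD is then encoded back as the three bytes -17, -65, -67 (0xEF 0xBF 0xBD), which is why the UTF-8 result is longer than the input.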
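Decoding the question's array as UTF-8 and printing the code points shows which bytes were rejected. A minimal sketch (Utf8Demo and codePoints are illustrative names); the first 10 bytes are the ASCII text "%PDF-1.5", CR, "%", so only the characters after them are printed:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Demo {
    static final byte[] ARR = {37, 80, 68, 70, 45, 49, 46, 53, 13, 37, -30, -29, -49, -45, -121, -104};

    // Formats each code point of s as "U+XXXX", separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String a = new String(ARR, StandardCharsets.UTF_8);
        // Skip the 10-character ASCII prefix.
        System.out.println(codePoints(a.substring(10)));
        // prints: U+FFFD U+FFFD U+FFFD U+04C7 U+FFFD

        // Each replacement character costs three bytes in UTF-8.
        System.out.println(Arrays.toString("\uFFFD".getBytes(StandardCharsets.UTF_8)));
        // prints: [-17, -65, -67]

        System.out.println(a.getBytes(StandardCharsets.UTF_8).length); // 24, not 16
    }
}
```

Only the pair -45, -121 (0xD3 0x87) forms a valid two-byte sequence, U+04C7, which is why those two bytes survive unchanged in the question's UTF-8 output.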