Monday 15 August 2011

eTag algorithm for multipart S3 uploads in Java?

I understand, in theory, the algorithm for generating an S3 multipart upload ETag, but I'm not getting the expected results. Can anyone help?

The theory of multipart upload ETags (at least as I understand it):

Take the MD5 of each uploaded part and concatenate them. Then take the MD5 of the concatenated MD5s. Finally, append a "-" and the number of parts uploaded.

Note: the example below uses made-up MD5 values. The resulting MD5 is not the actual MD5 of the part MD5s.

e.g.

283771245d05b26c35768d1f182fbac0 - file part 1's MD5
673c3f1ad03d60ea0f64315095ad7131 - file part 2's MD5
11c68be603cbe39357a0f93be6ab9e2c - file part 3's MD5

Concatenated MD5s: 283771245d05b26c35768d1f182fbac0673c3f1ad03d60ea0f64315095ad713111c68be603cbe39357a0f93be6ab9e2c

The MD5 of the concatenated string above, plus a dash and the number of file parts: 115671880dfdfe8860d6aabd09139708-3
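The combination step described above can be sketched in Java. Note one assumption that differs from the illustrative hex example: the real S3 algorithm concatenates the raw 16-byte binary digests of the parts, not their hex string representations, before taking the final MD5.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MultipartEtag {
    // Sketch: combine per-part MD5 digests into an S3-style multipart ETag.
    // Concatenates the raw 16-byte digests (not hex strings), MD5s the
    // concatenation, then appends "-" plus the part count.
    public static String combine(byte[][] partDigests) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        for (byte[] digest : partDigests) {
            md.update(digest);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex + "-" + partDigests.length;
    }
}
```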

To do this in Java I've tried two methods, neither of which returns the right ETag value:

```java
int mb = 1048576;
int bufferSize = 5 * mb;
byte[] buffer = new byte[bufferSize];

try { // string method
    FileInputStream fis = new FileInputStream(new File(filename));
    int bytesRead;
    String md5s = "";
    do {
        bytesRead = fis.read(buffer);
        String md5 = org.apache.commons.codec.digest.DigestUtils.md5Hex(new String(buffer));
        md5s += md5;
    } while (bytesRead == bufferSize);
    System.out.println(org.apache.commons.codec.digest.DigestUtils.md5Hex(md5s));
    fis.close();
} catch (Exception e) {
    System.out.println(e);
}

try { // byte array method
    FileInputStream fis = new FileInputStream(new File(filename));
    int bytesRead;
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    do {
        bytesRead = fis.read(buffer);
        byteArrayOutputStream.write(org.apache.commons.codec.digest.DigestUtils.md5(buffer));
    } while (bytesRead == bufferSize);
    System.out.println(org.apache.commons.codec.digest.DigestUtils.md5Hex(byteArrayOutputStream.toByteArray()));
    fis.close();
} catch (Exception e) {
    System.out.println(e);
}
```

Can anyone spot why neither algorithm is working?

You should use the byte-oriented method.

But it fails because of:

```java
} while ( bytesRead == bufferSize );
```

which breaks when the file consists of exactly x full parts: the last successful read fills the buffer, so the loop runs one extra time, `read` returns -1, and the unchanged buffer is hashed again as a bogus extra part.

It also fails at:

```java
bytearrayoutputstream.write( org.apache.commons.codec.digest.DigestUtils.md5( buffer ) );
```

which hashes the whole buffer even when the final read only partially fills it, i.e. when the file does not consist of exactly x full parts.

In other words, it fails either way.
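A corrected byte-oriented loop, as a sketch under the assumptions above: it hashes only the bytes actually read and stops at end of stream, handling both a partial final part and files that are an exact multiple of the part size. (The result matches S3's ETag only when the upload used the same part size.)

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class MultipartEtagFixed {
    public static String etagFor(String filename, int partSize)
            throws IOException, NoSuchAlgorithmException {
        byte[] buffer = new byte[partSize];
        ByteArrayOutputStream digests = new ByteArrayOutputStream();
        int parts = 0;
        try (FileInputStream fis = new FileInputStream(filename)) {
            int bytesRead;
            // Stop when read() signals end of stream, and hash only the
            // bytes actually read, so a short final part is digested correctly
            // and no extra bogus part is appended.
            while ((bytesRead = fis.read(buffer)) > 0) {
                MessageDigest md = MessageDigest.getInstance("MD5");
                md.update(buffer, 0, bytesRead);
                digests.write(md.digest());
                parts++;
            }
        }
        // MD5 of the concatenated raw part digests, plus "-" and the part count.
        MessageDigest md = MessageDigest.getInstance("MD5");
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(digests.toByteArray())) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex + "-" + parts;
    }
}
```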

java algorithm amazon-web-services hash amazon-s3
