Wednesday 15 February 2012

Counting value based on values in mysql table using Java -



Counting value based on values in mysql table using Java -

i have created programme follows:

i have 22000 authors separated 22000 files. each file contains set of keywords (can 1 or hundreds). illustration author-22945.txt consists of keywords follows:

algorithm problem computational solution general application base of operations

and then, have 100 tables in mysql (topic0 until topic99). each table consists of keywords , value. each topic table contains 20 lines. example:

keywords | value 0.021263 base of operations 0.019618 within 0.014545 new 0.010844 apply 0.010296 process 0.010158 propose 0.009199 provide 0.008239 analysis 0.007005 3 0.006594 approach 0.006183 domain 0.005634 structural 0.004126 adapt 0.003989 aspect 0.003989 product 0.003441 efficient 0.003441 info 0.003167 associate 0.003167 class 0.00303

i tried create programme produce set of value each author taken 100 topics. want average value of author topic0 topic99. if keyword author file exists in topic0, take values , sum it. , on until topic99. therefore, author can have output consists of sum of each topic [t0, t1, t2, ..., t99] illustration [0, 0.3452, 0.2343, .... , 0.08342].

currently programme quiet messy way. when tested 1 author data, worked , produced right number. when input 22000 authors produce output, took more 5 days that. , result wrong when took samples , checked them randomly.

i want inquire right algorithm or tools or logical way produce accurate output , not long compiling time it? fyi, have authors info , keywords each author in mysql table. improve using mysql innerjoin or java in situation?

thank much.

fyi, current messy code follows:

bundle counttopicscore; import java.sql.*; import java.io.*; import java.util.arrays; public class counttopicscore{ public static void main(string[] args) { try{ string mydriver = "org.gjt.mm.mysql.driver"; string myurl = "jdbc:mysql://localhost/arnetminer"; class.forname(mydriver); connection conn = drivermanager.getconnection(myurl, "root", "1234"); string query0 = "select distinct author_key authorkeywords"; statement st0 = conn.createstatement(); resultset rs0 = st0.executequery(query0); while(rs0.next()) { string authorkey = rs0.getstring("author_key"); string query = "select keywords authorkeywords author_key ='"+ authorkey +"'"+";"; statement st = conn.createstatement(); resultset rs = st.executequery(query); double value0, value1, value2, value3, value4, value5, value6, value7, value8, value9, value10, value11, value12, value13, value14, value15, value16, value17, value18, value19, value20, value21, value22, value23, value24, value25, value26, value27, value28, value29, value30, value31, value32, value33, value34, value35, value36, value37, value38, value39, value40, value41, value42, value43, value44, value45, value46, value47, value48, value49, value50, value51, value52, value53, value54, value55, value56, value57, value58, value59, value60, value61, value62, value63, value64, value65, value66, value67, value68, value69, value70, value71, value72, value73, value74, value75, value76, value77, value78, value79, value80, value81, value82, value83, value84, value85, value86, value87, value88, value89, value90, value91, value92, value93, value94, value95, value96, value97, value98, value99; double topic0value = 0.000000, topic1value = 0.000000, topic2value = 0.000000, topic3value = 0.000000, topic4value = 0.000000, topic5value = 0.000000, topic6value = 0.000000, topic7value = 0.000000, topic8value = 0.000000, topic9value = 0.000000, topic10value = 0.000000, topic11value = 0.000000, topic12value = 0.000000, topic13value = 0.000000; double topic14value = 0.000000; double topic15value = 0.000000; double topic16value = 0.000000; double topic17value = 0.000000; double topic18value = 0.000000; double topic19value = 0.000000; double topic20value = 0.000000; double topic21value = 0.000000; double topic22value = 0.000000; double topic23value = 0.000000; double topic24value = 0.000000; double topic25value = 0.000000; double topic26value = 0.000000; double topic27value = 0.000000; double topic28value = 0.000000; double topic29value = 0.000000; double topic30value = 0.000000; double topic31value = 0.000000; double topic32value = 0.000000; double topic33value = 0.000000; double topic34value = 0.000000; double topic35value = 0.000000; double topic36value = 0.000000; double topic37value = 0.000000; double topic38value = 0.000000; double topic39value = 0.000000; double topic40value = 0.000000; double topic41value = 0.000000; double topic42value = 0.000000; double topic43value = 0.000000; double topic44value = 0.000000; double topic45value = 0.000000; double topic46value = 0.000000; double topic47value = 0.000000; double topic48value = 0.000000; double topic49value = 0.000000; double topic50value = 0.000000; double topic51value = 0.000000; double topic52value = 0.000000; double topic53value = 0.000000; double topic54value = 0.000000; double topic55value = 0.000000; double topic56value = 0.000000; double topic57value = 0.000000; double topic58value = 0.000000; double topic59value = 0.000000; double topic60value = 0.000000; double topic61value = 0.000000; double topic62value = 0.000000; double topic63value = 0.000000; double topic64value = 0.000000; double topic65value = 0.000000; double topic66value = 0.000000; double topic67value = 0.000000; double topic68value = 0.000000; double topic69value = 0.000000; double topic70value = 0.000000; double topic71value = 0.000000; double topic72value = 0.000000; double topic73value = 0.000000; double topic74value = 0.000000; double topic75value = 0.000000; double topic76value = 0.000000; double topic77value = 0.000000; double topic78value = 0.000000; double topic79value = 0.000000; double topic80value = 0.000000; double topic81value = 0.000000; double topic82value = 0.000000; double topic83value = 0.000000; double topic84value = 0.000000; double topic85value = 0.000000; double topic86value = 0.000000; double topic87value = 0.000000; double topic88value = 0.000000; double topic89value = 0.000000; double topic90value = 0.000000; double topic91value = 0.000000; double topic92value = 0.000000; double topic93value = 0.000000; double topic94value = 0.000000; double topic95value = 0.000000; double topic96value = 0.000000; double topic97value = 0.000000; double topic98value = 0.000000; double topic99value = 0.000000; while (rs.next()) { string keyword = rs.getstring("keywords"); value0 = findtopic0(keyword); value1 = findtopic1(keyword); value2 = findtopic2(keyword); value3 = findtopic3(keyword); value4 = findtopic4(keyword); value5 = findtopic5(keyword); value6 = findtopic6(keyword); value7 = findtopic7(keyword); value8 = findtopic8(keyword); value9 = findtopic9(keyword); value10 = findtopic10(keyword); value11 = findtopic11(keyword); value12 = findtopic12(keyword); value13 = findtopic13(keyword); value14 = findtopic14(keyword); value15 = findtopic15(keyword); value16 = findtopic16(keyword); value17 = findtopic17(keyword); value18 = findtopic18(keyword); value19 = findtopic19(keyword); value20 = findtopic20(keyword); value21 = findtopic21(keyword); value22 = findtopic22(keyword); value23 = findtopic23(keyword); value24 = findtopic24(keyword); value25 = findtopic25(keyword); value26 = findtopic26(keyword); value27 = findtopic27(keyword); value28 = findtopic28(keyword); value29 = findtopic29(keyword); value30 = findtopic30(keyword); value31 = findtopic31(keyword); value32 = findtopic32(keyword); value33 = findtopic33(keyword); value34 = findtopic34(keyword); value35 = findtopic35(keyword); value36 = findtopic36(keyword); value37 = findtopic37(keyword); value38 = findtopic38(keyword); value39 = findtopic39(keyword); value40 = findtopic40(keyword); value41 = findtopic41(keyword); value42 = findtopic42(keyword); value43 = findtopic43(keyword); value44 = findtopic44(keyword); value45 = findtopic45(keyword); value46 = findtopic46(keyword); value47 = findtopic47(keyword); value48 = findtopic48(keyword); value49 = findtopic49(keyword); value50 = findtopic50(keyword); value51 = findtopic51(keyword); value52 = findtopic52(keyword); value53 = findtopic53(keyword); value54 = findtopic54(keyword); value55 = findtopic55(keyword); value56 = findtopic56(keyword); value57 = findtopic57(keyword); value58 = findtopic58(keyword); value59 = findtopic59(keyword); value60 = findtopic60(keyword); value61 = findtopic61(keyword); value62 = findtopic62(keyword); value63 = findtopic63(keyword); value64 = findtopic64(keyword); value65 = findtopic65(keyword); value66 = findtopic66(keyword); value67 = findtopic67(keyword); value68 = findtopic68(keyword); value69 = findtopic69(keyword); value70 = findtopic70(keyword); value71 = findtopic71(keyword); value72 = findtopic72(keyword); value73 = findtopic73(keyword); value74 = findtopic74(keyword); value75 = findtopic75(keyword); value76 = findtopic76(keyword); value77 = findtopic77(keyword); value78 = findtopic78(keyword); value79 = findtopic79(keyword); value80 = findtopic80(keyword); value81 = findtopic81(keyword); value82 = findtopic82(keyword); value83 = findtopic83(keyword); value84 = findtopic84(keyword); value85 = findtopic85(keyword); value86 = findtopic86(keyword); value87 = findtopic87(keyword); value88 = findtopic88(keyword); value89 = findtopic89(keyword); value90 = findtopic90(keyword); value91 = findtopic91(keyword); value92 = findtopic92(keyword); value93 = findtopic93(keyword); value94 = findtopic94(keyword); value95 = findtopic95(keyword); value96 = findtopic96(keyword); value97 = findtopic97(keyword); value98 = findtopic98(keyword); value99 = findtopic99(keyword); topic0value = topic0value + value0; topic1value = topic1value + value1; topic2value = topic2value + value2; topic3value = topic3value + value3; topic4value = topic4value + value4; topic5value = topic5value + value5; topic6value = topic6value + value6; topic7value = topic7value + value7; topic8value = topic8value + value8; topic9value = topic9value + value9; topic10value = topic10value + value10; topic11value = topic11value + value11; topic12value = topic12value + value12; topic13value = topic13value + value13; topic14value = topic14value + value14; topic15value = topic15value + value15; topic16value = topic16value + value16; topic17value = topic17value + value17; topic18value = topic18value + value18; topic19value = topic19value + value19; topic20value = topic20value + value20; topic21value = topic21value + value21; topic22value = topic22value + value22; topic23value = topic23value + value23; topic24value = topic24value + value24; topic25value = topic25value + value25; topic26value = topic26value + value26; topic27value = topic27value + value27; topic28value = topic28value + value28; topic29value = topic29value + value29; topic30value = topic30value + value30; topic31value = topic31value + value31; topic32value = topic32value + value32; topic33value = topic33value + value33; topic34value = topic34value + value34; topic35value = topic35value + value35; topic36value = topic36value + value36; topic37value = topic37value + value37; topic38value = topic38value + value38; topic39value = topic39value + value39; topic40value = topic40value + value40; topic41value = topic41value + value41; topic42value = topic42value + value42; topic43value = topic43value + value43; topic44value = topic44value + value44; topic45value = topic45value + value45; topic46value = topic46value + value46; topic47value = topic47value + value47; topic48value = topic48value + value48; topic49value = topic49value + value49; topic50value = topic50value + value50; topic51value = topic51value + value51; topic52value = topic52value + value52; topic53value = topic53value + value53; topic54value = topic54value + value54; topic55value = topic55value + value55; topic56value = topic56value + value56; topic57value = topic57value + value57; topic58value = topic58value + value58; topic59value = topic59value + value59; topic60value = topic60value + value60; topic61value = topic61value + value61; topic62value = topic62value + value62; topic63value = topic63value + value63; topic64value = topic64value + value64; topic65value = topic65value + value65; topic66value = topic66value + value66; topic67value = topic67value + value67; topic68value = topic68value + value68; topic69value = topic69value + value69; topic70value = topic70value + value70; topic71value = topic71value + value71; topic72value = topic72value + value72; topic73value = topic73value + value73; topic74value = topic74value + value74; topic75value = topic75value + value75; topic76value = topic76value + value76; topic77value = topic77value + value77; topic78value = topic78value + value78; topic79value = topic79value + value79; topic80value = topic80value + value80; topic81value = topic81value + value81; topic82value = topic82value + value82; topic83value = topic83value + value83; topic84value = topic84value + value84; topic85value = topic85value + value85; topic86value = topic86value + value86; topic87value = topic87value + value87; topic88value = topic88value + value88; topic89value = topic89value + value89; topic90value = topic90value + value90; topic91value = topic91value + value91; topic92value = topic92value + value92; topic93value = topic93value + value93; topic94value = topic94value + value94; topic95value = topic95value + value95; topic96value = topic96value + value96; topic97value = topic97value + value97; topic98value = topic98value + value98; topic99value = topic99value + value99; } st.close(); double[] score = new double[100]; score[0]=topic0value; score[1]=topic1value; score[2]=topic2value; score[3]=topic3value; score[4]=topic4value; score[5]=topic5value; score[6]=topic6value; score[7]=topic7value; score[8]=topic8value; score[9]=topic9value; score[10]=topic10value; score[11]=topic11value; score[12]=topic12value; score[13]=topic13value; score[14]=topic14value; score[15]=topic15value; score[16]=topic16value; score[17]=topic17value; score[18]=topic18value; score[19]=topic19value; score[20]=topic20value; score[21]=topic21value; score[22]=topic22value; score[23]=topic23value; score[24]=topic24value; score[25]=topic25value; score[26]=topic26value; score[27]=topic27value; score[28]=topic28value; score[29]=topic29value; score[30]=topic30value; score[31]=topic31value; score[32]=topic32value; score[33]=topic33value; score[34]=topic34value; score[35]=topic35value; score[36]=topic36value; score[37]=topic37value; score[38]=topic38value; score[39]=topic39value; score[40]=topic40value; score[41]=topic41value; score[42]=topic42value; score[43]=topic43value; score[44]=topic44value; score[45]=topic45value; score[46]=topic46value; score[47]=topic47value; score[48]=topic48value; score[49]=topic49value; score[50]=topic50value; score[51]=topic51value; score[52]=topic52value; score[53]=topic53value; score[54]=topic54value; score[55]=topic55value; score[56]=topic56value; score[57]=topic57value; score[58]=topic58value; score[59]=topic59value; score[60]=topic60value; score[61]=topic61value; score[62]=topic62value; score[63]=topic63value; score[64]=topic64value; score[65]=topic65value; score[66]=topic66value; score[67]=topic67value; score[68]=topic68value; score[69]=topic69value; score[70]=topic70value; score[71]=topic71value; score[72]=topic72value; score[73]=topic73value; score[74]=topic74value; score[75]=topic75value; score[76]=topic76value; score[77]=topic77value; score[78]=topic78value; score[79]=topic79value; score[80]=topic80value; score[81]=topic81value; score[82]=topic82value; score[83]=topic83value; score[84]=topic84value; score[85]=topic85value; score[86]=topic86value; score[87]=topic87value; score[88]=topic88value; score[89]=topic89value; score[90]=topic90value; score[91]=topic91value; score[92]=topic92value; score[93]=topic93value; score[94]=topic94value; score[95]=topic95value; score[96]=topic96value; score[97]=topic97value; score[98]=topic98value; score[99]=topic99value; printwriter pr = new printwriter("e:/scoring/"+authorkey+".txt"); (int j=0; j<100 ; j++) { pr.println(score[j]); } pr.close(); } } grab (exception e){ system.err.println("got exception! "); system.err.println(e.getmessage()); } } private static double findtopic0(string keyword) { double value = 0; try{ string mydriver = "org.gjt.mm.mysql.driver"; string myurl = "jdbc:mysql://localhost/arnetminer"; class.forname(mydriver); connection conn = drivermanager.getconnection(myurl, "root", "1234"); statement st = conn.createstatement(); string querytotopictable = ("select value topic0 word="+"'"+keyword+"'"+";"); resultset valueresult = st.executequery(querytotopictable); while (valueresult.next()) { value = valueresult.getdouble("value"); } st.close(); } grab (exception e){ system.err.println("exception"); system.err.println(e.getmessage()); } homecoming value; } private static double findtopic1(string keyword) { double value = 0; try{ string mydriver = "org.gjt.mm.mysql.driver"; string myurl = "jdbc:mysql://localhost/arnetminer"; class.forname(mydriver); connection conn = drivermanager.getconnection(myurl, "root", "1234"); statement st = conn.createstatement(); string querytotopictable = ("select value topic1 word="+"'"+keyword+"'"+";"); resultset valueresult = st.executequery(querytotopictable); while (valueresult.next()) { value = valueresult.getdouble("value"); } st.close(); } grab (exception e){ system.err.println("exception"); system.err.println(e.getmessage()); } homecoming value; } private static double findtopic2(string keyword) { double value = 0; try{ string mydriver = "org.gjt.mm.mysql.driver"; string myurl = "jdbc:mysql://localhost/arnetminer"; class.forname(mydriver); connection conn = drivermanager.getconnection(myurl, "root", "1234"); statement st = conn.createstatement(); string querytotopictable = ("select value topic2 word="+"'"+keyword+"'"+";"); resultset valueresult = st.executequery(querytotopictable); while (valueresult.next()) { value = valueresult.getdouble("value"); } st.close(); } grab (exception e){ system.err.println("exception"); system.err.println(e.getmessage()); } homecoming value; } private static double findtopic3(string keyword) { double value = 0; try{ string mydriver = "org.gjt.mm.mysql.driver"; string myurl = "jdbc:mysql://localhost/arnetminer"; class.forname(mydriver); connection conn = drivermanager.getconnection(myurl, "root", "1234"); statement st = conn.createstatement(); string querytotopictable = ("select value topic3 word="+"'"+keyword+"'"+";"); resultset valueresult = st.executequery(querytotopictable); while (valueresult.next()) { value = valueresult.getdouble("value"); } st.close(); } grab (exception e){ system.err.println("exception"); system.err.println(e.getmessage()); } homecoming value; } // .... until topic99 }

the best way this, maintain info in database , calculate sql. should think table construction again.

you have 1 table store topic, keyword , value combination (you name values, since topic , keyword might foreign keys other tables later on). have table link authors keywords (e.g. author_keyword).

additional tables might hold info topics (name, ...), keywords , authors , referenced respective primary keys in above tables values , author_keyword. don't need want right now, useful in future.

you should read little bit 'database normalization', sense how setup database structure. can start wikipedia.

if setup table structure, able query database efficiently joining 2 tables, grouping , summing up.

(from understand, got author , author_keyword table. need set topicx tables 1 table.)

your query this:

select author_keyword.author, values.topic, sum(values.value) author_keyword left bring together values on author_keyword.keyword = values.keyword grouping author_keyword.author, values.topic

this should lot more efficient, tried far.

java mysql vector io

No comments:

Post a Comment