Sunday 15 March 2015

mysql - How to optimize a query with subqueries, WHERE IN and varchar comparison fields?




I am working on a scraping project that crawls items and their view counts on different schedules. A schedule is a user-defined period (date) on which the script is intended to run.

The table structure is as follows:

create table if not exists `stats` (
  `id` int(11) not null auto_increment,
  `schedule_id` smallint(11) not null,
  `type` smallint(11) not null,
  `name` varchar(250) collate utf8_unicode_ci not null,
  `views` int(11) not null,
  `updated_time` timestamp not null default current_timestamp on update current_timestamp,
  primary key (`id`)
) engine=myisam default charset=utf8 collate=utf8_unicode_ci;

All data is stored in the stats table and is later analyzed to see the type-wise growth in views.

The data looks like this:

[Image: sample set]

The scraping is done periodically, and each schedule is expected to have about 20k entries. The schedules run on a daily or weekly basis, so the data will grow to around 2-3 million rows in 5-6 months.

Over this data I need to run queries that aggregate the same names that appear across a selected range of schedules.

For example:

I need to aggregate the same items (name) that appear in multiple schedules. If schedules 1 and 2 are selected, only the items that come under both of those schedules are selected; here, that is itema and itemb.

The type-wise sum of views should then be calculated.

Hence, for schedule 1 (updated):

select count(t.`type`) as count, sum(t.views) as view_count
from `stats` t
inner join (
    select name, count(name) as c
    from `stats`
    where `schedule_id` in (1,2)
    group by name
    having c = 2
) t2 on t2.`name` = t.`name`
where `schedule_id` = 2
group by type

This gives the expected result.
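To make the behaviour of that join concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MySQL, and the sample rows, the helper name `type_totals`, and the `cnt` alias are all assumptions made for illustration. The subquery finds names present in every selected schedule, and the outer query sums views per type for one target schedule.

```python
import sqlite3

# Hypothetical rows: (schedule_id, type, name, views). Not the asker's real data.
ROWS = [
    (1, 1, "itema", 10),
    (1, 2, "itemb", 20),
    (1, 1, "itemc", 30),
    (2, 1, "itema", 15),
    (2, 2, "itemb", 25),
    (2, 1, "itemd", 35),
]


def type_totals(rows, schedule_ids, target_schedule):
    """Per-type row count and view sum for names found in all schedule_ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table stats (id integer primary key, schedule_id integer not null,"
        " type integer not null, name text not null, views integer not null)"
    )
    conn.executemany(
        "insert into stats (schedule_id, type, name, views) values (?, ?, ?, ?)",
        rows,
    )
    # One placeholder per selected schedule, so the IN list stays parameterized.
    placeholders = ", ".join("?" for _ in schedule_ids)
    sql = f"""
        select t.type, count(t.type) as cnt, sum(t.views) as view_count
        from stats t
        inner join (
            select name
            from stats
            where schedule_id in ({placeholders})
            group by name
            having count(name) = ?
        ) t2 on t2.name = t.name
        where t.schedule_id = ?
        group by t.type
        order by t.type
    """
    params = [*schedule_ids, len(schedule_ids), target_schedule]
    result = conn.execute(sql, params).fetchall()
    conn.close()
    return result


result = type_totals(ROWS, [1, 2], 2)
```

With these rows only itema and itemb appear in both schedules, so itemc and itemd are left out of the totals.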

But I have read that using subqueries, WHERE IN, and varchar comparison fields won't help in having an optimized query. How can this be optimized to improve performance?

The rules for the same-type aggregator are as follows:

1. Under a schedule id, there can be the same names with different type values. The combination of schedule_id, name, and type won't be duplicated.

2. A type-wise aggregate, which sums the values under each type, is made.
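Rule 1 could be enforced by the database itself with a unique index on (schedule_id, name, type). The sketch below is an assumption, not part of the original schema: it uses SQLite through Python's sqlite3 module, and the index name `uq_schedule_name_type` and the helper names are hypothetical. In MySQL the equivalent would be a unique key on the same three columns.

```python
import sqlite3


def build_stats_table():
    """Create an in-memory stats table with the uniqueness rule as an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table stats (id integer primary key, schedule_id integer not null,"
        " type integer not null, name text not null, views integer not null)"
    )
    # Hypothetical index name; rejects a repeated (schedule_id, name, type).
    conn.execute(
        "create unique index uq_schedule_name_type"
        " on stats (schedule_id, name, type)"
    )
    return conn


def try_insert(conn, schedule_id, type_, name, views):
    """Return True if the row was stored, False if it broke the unique rule."""
    try:
        conn.execute(
            "insert into stats (schedule_id, type, name, views)"
            " values (?, ?, ?, ?)",
            (schedule_id, type_, name, views),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_stats_table()
first = try_insert(conn, 1, 1, "itema", 10)       # new combination
other_type = try_insert(conn, 1, 2, "itema", 5)   # same name, different type
duplicate = try_insert(conn, 1, 1, "itema", 99)   # exact repeat of the first
```

The same name is accepted under a second type, while an exact repeat of schedule, name, and type is rejected.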

I am doing this project in Python with MySQL for the scraping and PHP for listing the results. I would like to know how to organize the table and the query to improve performance. Please advise.

varchar column

As I said in the comment, it is good practice to store varchars in a dictionary table. Why? They require more space than, for example, an int4, and a table that keeps growing will take more and more space, while each name can be stored just once in a dictionary table.
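A dictionary table along those lines might look like the sketch below. The table name `names`, the column `name_id`, and the helper functions are assumptions for illustration, and SQLite (via Python's sqlite3) stands in for MySQL; `insert or ignore` is SQLite syntax, whereas MySQL spells it `insert ignore`.

```python
import sqlite3


def open_db():
    """Create a names dictionary table plus a stats table that references it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table names (
            id integer primary key,
            name text not null unique
        );
        create table stats (
            id integer primary key,
            schedule_id integer not null,
            type integer not null,
            name_id integer not null references names (id),
            views integer not null
        );
        """
    )
    return conn


def name_id(conn, name):
    """Return the id for name, storing the string only on first sight."""
    conn.execute("insert or ignore into names (name) values (?)", (name,))
    row = conn.execute("select id from names where name = ?", (name,)).fetchone()
    return row[0]


def add_stat(conn, schedule_id, type_, name, views):
    """Insert a stats row that carries the integer name_id, not the varchar."""
    conn.execute(
        "insert into stats (schedule_id, type, name_id, views)"
        " values (?, ?, ?, ?)",
        (schedule_id, type_, name_id(conn, name), views),
    )


conn = open_db()
add_stat(conn, 1, 1, "itema", 10)
add_stat(conn, 2, 1, "itema", 15)
add_stat(conn, 1, 2, "itemb", 20)
stored_names = conn.execute("select count(*) from names").fetchone()[0]
```

Three stats rows are stored, but the names table holds only two strings, and comparisons between stats rows can then be made on integers.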

query performance

WHERE IN means the planner will compare schedule_id with ANY ('{1,2}'), with the list converted to the integer[] type, as you can see in the query plans below.
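In other words, an IN list is just a compact way of saying "equal to any of these values". A small sketch, assuming SQLite through Python's sqlite3 and a hypothetical helper `matching_ids`, shows that the IN form and the spelled-out OR form select the same rows:

```python
import sqlite3


def matching_ids(where_clause):
    """Return ids of rows matching a trusted, hard-coded WHERE clause."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table stats (id integer primary key, schedule_id integer not null)"
    )
    # Hypothetical schedule ids; the ids assigned are 1..5 in insert order.
    conn.executemany(
        "insert into stats (schedule_id) values (?)",
        [(1,), (2,), (3,), (2,), (1,)],
    )
    # The clause is interpolated only because it is a fixed demo string.
    sql = f"select id from stats where {where_clause} order by id"
    ids = [row[0] for row in conn.execute(sql)]
    conn.close()
    return ids


in_list = matching_ids("schedule_id in (1, 2)")
spelled_out = matching_ids("schedule_id = 1 or schedule_id = 2")
```

Both calls return the same ids, so the IN list on its own is not what makes a query slow.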

subqueries

You cannot avoid subqueries if you need to aggregate data. Having that in mind, please remember that not all queries consist of one SELECT statement. In reality that is rarely the case, unless you have an application with only a tiny part connected to the database, for example a simple game that only needs to store data containing users and points.

query

Your query and its plan on the given sample data:

select count(type), sum(views)
from tmp_test8 a
join (
    select name, count(1)
    from tmp_test8
    where schedule_id in (1,2)
    group by 1
    having count(1) = 2
) b on a.name = b.name
where schedule_id = 1;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=23.59..23.60 rows=1 width=8)
   ->  Nested Loop  (cost=11.77..23.59 rows=1 width=8)
         Join Filter: ((a.name)::text = (tmp_test8.name)::text)
         ->  Seq Scan on tmp_test8 a  (cost=0.00..11.75 rows=1 width=524)
               Filter: (schedule_id = 1)
         ->  HashAggregate  (cost=11.77..11.79 rows=2 width=516)
               Filter: (count(1) = 2)
               ->  Seq Scan on tmp_test8  (cost=0.00..11.75 rows=2 width=516)
                     Filter: (schedule_id = ANY ('{1,2}'::integer[]))

However, the query can be rewritten without joins so that it scans the table only once. My suggestion:

select count, sum(view_count)
from (
    select name,
           count(1) as count,
           sum(case when schedule_id = 1 then views end) as view_count
    from tmp_test8
    where schedule_id in (1,2)
    group by 1
    having count(1) = 2
) foo
group by 1

                               QUERY PLAN
------------------------------------------------------------------------
 HashAggregate  (cost=11.83..11.85 rows=2 width=16)
   ->  HashAggregate  (cost=11.78..11.80 rows=2 width=524)
         Filter: (count(1) = 2)
         ->  Seq Scan on tmp_test8  (cost=0.00..11.75 rows=2 width=524)
               Filter: (schedule_id = ANY ('{1,2}'::integer[]))

Both queries produce the same result.
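The equivalence can be checked on a small data set. The sketch below is an illustration under assumptions: SQLite via Python's sqlite3 replaces the database used for the plans above, the rows are made up, and each name appears at most once per schedule. The single-scan query is a slight variation on the suggested rewrite: its outer query counts the shared names with count(*) instead of grouping by the inner count, so that the count column also matches the join version when the number of shared names is not two.

```python
import sqlite3

# Hypothetical rows: (schedule_id, type, name, views); one row per name per schedule.
ROWS = [
    (1, 1, "itema", 10),
    (1, 1, "itemb", 20),
    (1, 1, "itemc", 30),
    (1, 1, "iteme", 5),
    (2, 1, "itema", 15),
    (2, 1, "itemb", 25),
    (2, 1, "itemc", 40),
    (2, 1, "itemd", 35),
]

# Join version: find shared names in a subquery, then join back to the table.
JOIN_SQL = """
    select count(a.type), sum(a.views)
    from stats a
    join (
        select name
        from stats
        where schedule_id in (1, 2)
        group by name
        having count(*) = 2
    ) b on a.name = b.name
    where a.schedule_id = 1
"""

# Single-scan version: conditional aggregation picks out schedule 1's views.
SINGLE_SCAN_SQL = """
    select count(*), sum(view_count)
    from (
        select name,
               sum(case when schedule_id = 1 then views end) as view_count
        from stats
        where schedule_id in (1, 2)
        group by name
        having count(*) = 2
    ) foo
"""


def run_both(rows):
    """Run both queries against the same rows and return their result rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table stats (schedule_id integer, type integer,"
        " name text, views integer)"
    )
    conn.executemany("insert into stats values (?, ?, ?, ?)", rows)
    joined = conn.execute(JOIN_SQL).fetchone()
    single = conn.execute(SINGLE_SCAN_SQL).fetchone()
    conn.close()
    return joined, single


joined, single = run_both(ROWS)
```

Here itema, itemb, and itemc are shared by both schedules, so both queries report three rows and the sum of their schedule 1 views.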

mysql sql subquery query-optimization
