mysql - How to optimize a query with subqueries, WHERE IN and varchar comparison fields?
I am working on a scraping project that crawls items and their view counts on different schedules. A schedule is a user-defined period (date) when the script is intended to run.
The table structure is as follows:
create table if not exists `stats` (
  `id` int(11) not null auto_increment,
  `schedule_id` smallint(11) not null,
  `type` smallint(11) not null,
  `name` varchar(250) collate utf8_unicode_ci not null,
  `views` int(11) not null,
  `updated_time` timestamp not null default current_timestamp on update current_timestamp,
  primary key (`id`)
) engine=myisam default charset=utf8 collate=utf8_unicode_ci;
All the data is stored in the table stats and later analyzed to see the type-wise growth in views.
The data looks like this:
sample set
The scraping is done periodically, and each schedule is expected to have about 20k entries. The schedules are made on a daily or weekly basis, hence the data will grow to around 2-3 million rows in 5-6 months.
Over this data I need to perform queries that aggregate the same names that occur across a selected range of schedules.
For example:
I need to aggregate the same items (name) that occur across multiple schedules. If schedules 1 and 2 are selected, the items that occur under both schedules will be selected. So here they are ItemA and ItemB.
The type-wise sum of views should be calculated here.
Hence, for schedule 1 (updated):
select count(t.`type`) as count, sum(t.views) as view_count
from `stats` t
inner join (
  select name, count(name) as c
  from `stats`
  where `schedule_id` in (1,2)
  group by name
  having c=2
) t2 on t2.`name` = t.`name`
where `schedule_id`=2
group by type
This gives the expected result.
But I have read that using subqueries, WHERE IN, and comparisons on varchar fields won't help in having an optimized query. How can this be optimized to improve performance?
The rules for the type-wise aggregation are as follows:
1. Under a schedule id, there can be the same names with different type values. The combination of schedule_id, name and type won't be duplicated.
2. A type-wise aggregation, which sums the values under each type, is made.
I am doing this project in Python and MySQL for the scraping, and PHP for listing the results. I would like to know how to organize the table and the query to improve performance. Please advise.
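The two rules above can be tried out on a small data set. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the rows are made up for illustration, and the subquery uses count(distinct schedule_id) instead of count(name), which is an assumption made here so that a name with several types in one schedule is not counted twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("""
    create table stats (
        schedule_id int, type int, name text, views int,
        unique (schedule_id, name, type)  -- rule 1: no duplicated combination
    )""")

# Hypothetical sample rows: (schedule_id, type, name, views)
rows = [
    (1, 1, "ItemA", 10), (1, 2, "ItemA", 5), (1, 1, "ItemB", 20),
    (2, 1, "ItemA", 15), (2, 2, "ItemA", 7), (2, 1, "ItemB", 25),
    (2, 1, "ItemC", 40),
]
conn.executemany("insert into stats values (?, ?, ?, ?)", rows)

# Rule 2: type-wise sums for schedule 2, restricted to names
# that occur in both selected schedules (1 and 2).
result = conn.execute("""
    select t.type, count(*) as cnt, sum(t.views) as view_count
    from stats t
    join (select name from stats
          where schedule_id in (1, 2)
          group by name
          having count(distinct schedule_id) = 2) t2 on t2.name = t.name
    where t.schedule_id = 2
    group by t.type
    order by t.type
""").fetchall()
print(result)  # [(1, 2, 40), (2, 1, 7)]
```

ItemC is left out because it only occurs in schedule 2, and ItemA contributes to both types.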
Varchar column
As said in the comment, it is good practice to store varchars in a dictionary table. Why? They require more space than, for example, int4, and as the table grows larger and larger they take up more and more space, while each name could be stored just once in a dictionary table.
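A dictionary table along these lines could look like the sketch below. It is not the asker's schema: the names table, the name_id column and the get_name_id helper are hypothetical, and SQLite (via Python's sqlite3) is used only so that the example runs on its own. In MySQL the `insert or ignore` statement would be written `insert ignore`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
    create table names (              -- hypothetical dictionary table
        id   integer primary key,
        name text not null unique
    );
    create table stats (
        id          integer primary key,
        schedule_id integer not null,
        type        integer not null,
        name_id     integer not null references names(id),
        views       integer not null
    );
""")

def get_name_id(conn, name):
    """Return the id of name, storing it in names the first time it is seen."""
    conn.execute("insert or ignore into names (name) values (?)", (name,))
    row = conn.execute("select id from names where name = ?", (name,)).fetchone()
    return row[0]

# Hypothetical scraped rows: (schedule_id, type, name, views)
scraped = [(1, 1, "ItemA", 10), (2, 1, "ItemA", 15), (1, 2, "ItemB", 20)]
for schedule_id, type_, name, views in scraped:
    conn.execute(
        "insert into stats (schedule_id, type, name_id, views) values (?, ?, ?, ?)",
        (schedule_id, type_, get_name_id(conn, name), views),
    )

name_count = conn.execute("select count(*) from names").fetchone()[0]
stats_count = conn.execute("select count(*) from stats").fetchone()[0]
print(name_count, stats_count)  # 2 3
```

Each name is stored once, and the large stats table only carries the integer name_id, so grouping and joining compare integers instead of varchars.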
Query performance
WHERE IN
This means that the planner will compare schedule_id with ANY('{1,2}') converted to the integer[] type, as you can notice in the query plans below.
Subqueries
You cannot avoid subqueries if you need to aggregate data. Having that in mind, please remember that not all queries consist of one SELECT statement. In reality, most of them don't (unless you have an application in which the database is only a tiny part, for example a simple game that needs to store data containing users and points).
Query
Your query and its plan on the given sample data:
select count(type), sum(views)
from tmp_test8 a
join (select name, count(1)
      from tmp_test8
      where schedule_id in (1,2)
      group by 1
      having count(1) = 2) b on a.name = b.name
where schedule_id = 1;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=23.59..23.60 rows=1 width=8)
   ->  Nested Loop  (cost=11.77..23.59 rows=1 width=8)
         Join Filter: ((a.name)::text = (tmp_test8.name)::text)
         ->  Seq Scan on tmp_test8 a  (cost=0.00..11.75 rows=1 width=524)
               Filter: (schedule_id = 1)
         ->  HashAggregate  (cost=11.77..11.79 rows=2 width=516)
               Filter: (count(1) = 2)
               ->  Seq Scan on tmp_test8  (cost=0.00..11.75 rows=2 width=516)
                     Filter: (schedule_id = ANY ('{1,2}'::integer[]))
However, the query can be rewritten without joins so that it scans the table only once. My suggestion:
select count, sum(view_count)
from (
  select name,
         count(1) as count,
         sum(case when schedule_id = 1 then views end) as view_count
  from tmp_test8
  where schedule_id in (1,2)
  group by 1
  having count(1) = 2
) foo
group by 1

                               QUERY PLAN
------------------------------------------------------------------------
 HashAggregate  (cost=11.83..11.85 rows=2 width=16)
   ->  HashAggregate  (cost=11.78..11.80 rows=2 width=524)
         Filter: (count(1) = 2)
         ->  Seq Scan on tmp_test8  (cost=0.00..11.75 rows=2 width=524)
               Filter: (schedule_id = ANY ('{1,2}'::integer[]))
Both queries produce the same result.
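One way to check this kind of rewrite is to run both forms against the same small data set and compare the totals. The sketch below does that with Python's sqlite3 module as a stand-in database. The rows are made up, each name occurs at most once per schedule (an assumption of this example), and only the view sums are compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the real database
conn.execute("create table stats (schedule_id int, type int, name text, views int)")

# Hypothetical sample rows: (schedule_id, type, name, views)
rows = [
    (1, 1, "ItemA", 10), (1, 2, "ItemB", 20), (1, 1, "ItemC", 30),
    (2, 1, "ItemA", 15), (2, 2, "ItemB", 25), (2, 1, "ItemD", 40),
]
conn.executemany("insert into stats values (?, ?, ?, ?)", rows)

# Join form: find the names present in both schedules, then join back.
join_sql = """
    select sum(a.views)
    from stats a
    join (select name from stats
          where schedule_id in (1, 2)
          group by name
          having count(*) = 2) b on a.name = b.name
    where a.schedule_id = 1
"""

# Single-scan form: conditional aggregation picks out the schedule 1 views.
single_scan_sql = """
    select sum(view_count)
    from (select name,
                 sum(case when schedule_id = 1 then views end) as view_count
          from stats
          where schedule_id in (1, 2)
          group by name
          having count(*) = 2) foo
"""

join_total = conn.execute(join_sql).fetchone()[0]
single_scan_total = conn.execute(single_scan_sql).fetchone()[0]
print(join_total, single_scan_total)  # 30 30
```

ItemC and ItemD occur in only one schedule each, so both forms ignore them and add up the schedule 1 views of ItemA and ItemB.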
mysql sql subquery query-optimization