Breedlove: mysql - How to optimize a query with subqueries, WHERE IN and varchar comparison fields?

Sunday, 15 March 2015

I am working on a scraping project that crawls items and their view counts on different schedules. A schedule is a user-defined period (date) when the script is intended to run.

The table structure is as follows:

create table if not exists `stats` (
  `id` int(11) not null auto_increment,
  `schedule_id` smallint(11) not null,
  `type` smallint(11) not null,
  `name` varchar(250) collate utf8_unicode_ci not null,
  `views` int(11) not null,
  `updated_time` timestamp not null default current_timestamp on update current_timestamp,
  primary key (`id`)
) engine=myisam default charset=utf8 collate=utf8_unicode_ci;

All data is stored in the table stats and is later analyzed to see the type-wise growth in views.

The data looks like this:

sample set

The scraping is done periodically, and each schedule is expected to have about 20k entries. The schedules run on a daily or weekly basis, so the data will grow to around 2-3 million rows in 5-6 months.

Over this data I need to run queries that aggregate the same names that come across a selected range of schedules.

For example:

I need to aggregate the same items (name) that come across multiple schedules. If schedules 1 and 2 are selected, the items that come under both of the schedules will be selected. So here, itemA and itemB.

The type-wise sum of views should be calculated here.

Hence, for schedule 1 (updated):

select count(t.`type`) as count, sum(t.views) as view_count
from `stats` t
inner join (
    select name, count(name) as c
    from `stats`
    where `schedule_id` in (1,2)
    group by name
    having c = 2
) t2 on t2.`name` = t.`name`
where `schedule_id` = 2
group by type

This gives the expected result.
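To make the query above easier to try out, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The rows are hypothetical (the original sample set is not reproduced here), SQLite only stands in for MySQL, the `type` column is added to the select list for readability, and the `having` clause repeats the aggregate instead of using the alias to stay portable.

```python
import sqlite3

# Hypothetical rows: (schedule_id, type, name, views). Not the author's data.
ROWS = [
    (1, 1, "itemA", 10),
    (1, 2, "itemB", 20),
    (1, 1, "itemC", 5),
    (2, 1, "itemA", 15),
    (2, 2, "itemB", 30),
    (2, 2, "itemD", 7),
]

# Same shape as the query in the question: keep only names present in both
# schedules, then sum the views of schedule 2 per type.
QUERY = """
SELECT t.type, COUNT(t.type) AS item_count, SUM(t.views) AS view_count
FROM stats t
INNER JOIN (
    SELECT name, COUNT(name) AS c
    FROM stats
    WHERE schedule_id IN (1, 2)
    GROUP BY name
    HAVING COUNT(name) = 2
) t2 ON t2.name = t.name
WHERE t.schedule_id = 2
GROUP BY t.type
ORDER BY t.type
"""


def type_wise_views(rows):
    """Load the rows into a throwaway table and run the aggregate query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stats (schedule_id INTEGER, type INTEGER, name TEXT, views INTEGER)"
    )
    conn.executemany("INSERT INTO stats VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(type_wise_views(ROWS))  # [(1, 1, 15), (2, 1, 30)]
```

Only itemA and itemB appear in both schedules, so itemC and itemD are excluded from the totals.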

But I have read that using subqueries, WHERE IN, and varchar comparison fields won't help in having an optimized query. How can this be optimized to improve performance?

The rules for the same-type aggregator are as follows:

1. Under a schedule id, the same name can appear with different type values. The combination of schedule_id, name and type won't be duplicated.

2. A type-wise aggregator, which sums the values under each type, is made.

I am doing this project in Python and MySQL for the scraping and in PHP for listing the results. I would like to know how to organize the table and the query to improve performance. Please advise.
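Rule 1 could also be enforced by the database itself. The sketch below is an assumption, not part of the original schema: it adds a composite unique constraint on (schedule_id, name, type) and uses SQLite from Python as a stand-in for MySQL, where a comparable `UNIQUE KEY` could be declared. The helper names are hypothetical.

```python
import sqlite3


def make_stats_table():
    """Create an in-memory stats table with rule 1 as a UNIQUE constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE stats (
            id INTEGER PRIMARY KEY,
            schedule_id INTEGER NOT NULL,
            type INTEGER NOT NULL,
            name TEXT NOT NULL,
            views INTEGER NOT NULL,
            UNIQUE (schedule_id, name, type)  -- assumed constraint for rule 1
        )
        """
    )
    return conn


def try_insert(conn, schedule_id, type_, name, views):
    """Return True if the row was stored, False if it would duplicate a key."""
    try:
        conn.execute(
            "INSERT INTO stats (schedule_id, type, name, views) VALUES (?, ?, ?, ?)",
            (schedule_id, type_, name, views),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_stats_table()
first = try_insert(conn, 1, 1, "itemA", 10)       # new combination
other_type = try_insert(conn, 1, 2, "itemA", 4)   # same name, different type
duplicate = try_insert(conn, 1, 1, "itemA", 99)   # repeats the first combination
```

With such a constraint, a scraper run that accidentally repeats a row fails on insert instead of silently inflating the per-type sums.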

Varchar column

As I said in a comment, it is good practice to store varchars in a dictionary table. Why? They require more space than, for example, an int4, and as the table grows larger and larger they take up more and more space, while each name can be stored just once in a dictionary table.
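The dictionary-table idea can be sketched as follows. The table name `names`, the column `name_id`, and the helper functions are hypothetical, and SQLite (through Python's sqlite3 module) stands in for MySQL; `INSERT OR IGNORE` is SQLite syntax, for which MySQL offers `INSERT IGNORE`.

```python
import sqlite3

# Hypothetical layout: each distinct name is stored once in `names`,
# and `stats` keeps only a small integer reference to it.
SCHEMA = """
CREATE TABLE names (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE stats (
    id INTEGER PRIMARY KEY,
    schedule_id INTEGER NOT NULL,
    type INTEGER NOT NULL,
    name_id INTEGER NOT NULL REFERENCES names(id),
    views INTEGER NOT NULL
);
"""


def name_id(conn, name):
    """Return the integer id for a name, inserting it the first time it is seen."""
    conn.execute("INSERT OR IGNORE INTO names (name) VALUES (?)", (name,))
    return conn.execute("SELECT id FROM names WHERE name = ?", (name,)).fetchone()[0]


def add_stat(conn, schedule_id, type_, name, views):
    """Store one scraped row, referencing the name by its dictionary id."""
    conn.execute(
        "INSERT INTO stats (schedule_id, type, name_id, views) VALUES (?, ?, ?, ?)",
        (schedule_id, type_, name_id(conn, name), views),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_stat(conn, 1, 1, "itemA", 10)
add_stat(conn, 2, 1, "itemA", 15)  # reuses the existing dictionary entry
add_stat(conn, 2, 2, "itemB", 30)
```

Grouping and joining can then work on `name_id`, an integer comparison, and the text itself is only read when results are listed.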

Query performance

WHERE IN means that the planner will compare schedule_id with ANY ('{1,2}'), converted to the integer[] type, as you can see in the query plans below.

Subqueries

You cannot avoid subqueries if you need to aggregate data. With that in mind, please remember that not all queries consist of one SELECT statement. In reality most do not, unless you have an application with only a tiny part connected to the database (for example, a simple game that needs to store data containing users and points).

Query

Your query plan on the given sample data:

select count(type), sum(views)
from tmp_test8 a
join (
    select name, count(1)
    from tmp_test8
    where schedule_id in (1,2)
    group by 1
    having count(1) = 2
) b on a.name = b.name
where schedule_id = 1;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=23.59..23.60 rows=1 width=8)
   ->  Nested Loop  (cost=11.77..23.59 rows=1 width=8)
         Join Filter: ((a.name)::text = (tmp_test8.name)::text)
         ->  Seq Scan on tmp_test8 a  (cost=0.00..11.75 rows=1 width=524)
               Filter: (schedule_id = 1)
         ->  HashAggregate  (cost=11.77..11.79 rows=2 width=516)
               Filter: (count(1) = 2)
               ->  Seq Scan on tmp_test8  (cost=0.00..11.75 rows=2 width=516)
                     Filter: (schedule_id = ANY ('{1,2}'::integer[]))

However, the query can be rewritten without joins so that it scans the table only once. My suggestion:

select count, sum(view_count)
from (
    select name, count(1) as count, sum(case when schedule_id = 1 then views end) as view_count
    from tmp_test8
    where schedule_id in (1,2)
    group by 1
    having count(1) = 2
) foo
group by 1

                               QUERY PLAN
------------------------------------------------------------------------
 HashAggregate  (cost=11.83..11.85 rows=2 width=16)
   ->  HashAggregate  (cost=11.78..11.80 rows=2 width=524)
         Filter: (count(1) = 2)
         ->  Seq Scan on tmp_test8  (cost=0.00..11.75 rows=2 width=524)
               Filter: (schedule_id = ANY ('{1,2}'::integer[]))

Both queries produce the same result.
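As a rough check of the two approaches, the sketch below runs a join-based query and a single-scan query with conditional aggregation over the same hypothetical rows and compares the view totals for schedule 1. The table is named `stats` here, the data is invented for illustration, only the summed views are compared, and SQLite (via Python's sqlite3) stands in for the database used in the plans above.

```python
import sqlite3

# Hypothetical rows: (schedule_id, type, name, views). Not the original sample.
ROWS = [
    (1, 1, "itemA", 10),
    (1, 2, "itemB", 20),
    (1, 1, "itemC", 5),
    (2, 1, "itemA", 15),
    (2, 2, "itemB", 30),
    (2, 2, "itemD", 7),
]

# Join-based form: the table is read once for the subquery and once for the join.
JOIN_QUERY = """
SELECT SUM(a.views)
FROM stats a
JOIN (
    SELECT name
    FROM stats
    WHERE schedule_id IN (1, 2)
    GROUP BY name
    HAVING COUNT(1) = 2
) b ON a.name = b.name
WHERE a.schedule_id = 1
"""

# Single-scan form: CASE picks out the schedule 1 views while grouping by name.
SINGLE_SCAN_QUERY = """
SELECT SUM(view_count)
FROM (
    SELECT name, SUM(CASE WHEN schedule_id = 1 THEN views END) AS view_count
    FROM stats
    WHERE schedule_id IN (1, 2)
    GROUP BY name
    HAVING COUNT(1) = 2
) foo
"""


def view_totals(rows):
    """Return (join_total, single_scan_total) for the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stats (schedule_id INTEGER, type INTEGER, name TEXT, views INTEGER)"
    )
    conn.executemany("INSERT INTO stats VALUES (?, ?, ?, ?)", rows)
    join_total = conn.execute(JOIN_QUERY).fetchone()[0]
    single_total = conn.execute(SINGLE_SCAN_QUERY).fetchone()[0]
    conn.close()
    return join_total, single_total


print(view_totals(ROWS))  # (30, 30)
```

The totals agree here because each name occurs at most once per schedule in the sample rows; if a name can appear under several types in one schedule, `count(1) = 2` no longer means "present in both schedules" and the condition would need revisiting.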

mysql sql subquery query-optimization
