Tuesday 15 April 2014

subquery - MySQL query to find near-duplicate URLs




I'm trying to eliminate duplicate domain names (URLs) in a MySQL database table.

I've used this query to find the "same" URLs:

SELECT url, COUNT(*) AS c FROM links GROUP BY url HAVING c > 1;

But this query fails to find the same domain under different URLs, which is what I need:

example.com
www.example.com
www.example.com/
www.example.com/somepage.htm

I'd be grateful for any help.

You can handle the last three cases pretty easily:

SELECT MIN(url), COUNT(*) AS c FROM links GROUP BY SUBSTRING_INDEX(url, '/', 1) HAVING c > 1;
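To see why grouping on SUBSTRING_INDEX(url, '/', 1) merges those three cases, it may help to evaluate the expression on the example values directly. This is only a small sketch using literals taken from the question; it does not touch the links table.

```sql
-- SUBSTRING_INDEX(str, delim, 1) returns everything before the first delim,
-- or the whole string when delim does not occur.
SELECT SUBSTRING_INDEX('www.example.com', '/', 1);              -- 'www.example.com'
SELECT SUBSTRING_INDEX('www.example.com/', '/', 1);             -- 'www.example.com'
SELECT SUBSTRING_INDEX('www.example.com/somepage.htm', '/', 1); -- 'www.example.com'

-- The bare domain keeps its own, different key, so it is not yet merged.
SELECT SUBSTRING_INDEX('example.com', '/', 1);                  -- 'example.com'
```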

To get the first case as well, I'd recommend removing the www. at the beginning of the string. The following should work (although it may fail if .www occurs later in the URL before the first /):

SELECT MIN(url), COUNT(*) AS c FROM links GROUP BY (CASE WHEN url LIKE 'www.%' THEN SUBSTRING(SUBSTRING_INDEX(url, '/', 1), 5) ELSE SUBSTRING_INDEX(url, '/', 1) END) HAVING c > 1;
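A quick way to try the query above is to run it against a throwaway table. The sketch below assumes a hypothetical temporary table named links_demo with a single url column, filled with the four URLs from the question plus one unrelated row (other.org/page, made up for illustration). It also assumes URLs are stored without a scheme, as in the question; a value such as http://example.com would be cut at the first / and give the key 'http:'.

```sql
-- Hypothetical scratch table; the real table in the question is called links.
CREATE TEMPORARY TABLE links_demo (url VARCHAR(255) NOT NULL);

INSERT INTO links_demo (url) VALUES
  ('example.com'),
  ('www.example.com'),
  ('www.example.com/'),
  ('www.example.com/somepage.htm'),
  ('other.org/page');  -- unrelated row, should not be reported

-- Same grouping key as the query above, exposed as a column named host.
SELECT
  (CASE
     WHEN url LIKE 'www.%'
       THEN SUBSTRING(SUBSTRING_INDEX(url, '/', 1), 5)  -- drop the 4-char 'www.' prefix
     ELSE SUBSTRING_INDEX(url, '/', 1)
   END) AS host,
  COUNT(*) AS c
FROM links_demo
GROUP BY host
HAVING c > 1;
-- Returns one row: host = 'example.com', c = 4
```

Selecting the key itself rather than MIN(url) makes it easier to see which domain each group represents.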

mysql subquery
