MySQL query to find near-duplicate URLs
I'm trying to eliminate duplicate domain names (URLs) in a MySQL db table.
I've used this query to find the "same" URLs:
select url, count(*) as c from links group by url having c > 1;
But the query fails to find the same domain with different URLs, which is what I need:
example.com
www.example.com
www.example.com/
www.example.com/somepage.htm
I'd be grateful for any help.
You can handle the last three cases pretty easily:
select min(url), count(*) as c from links group by substring_index(url, '/', 1) having c > 1;
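To see why grouping on substring_index(url, '/', 1) merges those three cases, here is a small sketch that applies the function to the four example URLs from the question as string literals. It assumes, as in the question, that URLs are stored without a scheme; with a leading http:// the first / would fall inside the scheme and the key would come out as http: instead.

```sql
-- SUBSTRING_INDEX(str, delim, 1) returns everything before the first
-- occurrence of delim, or the whole string if delim does not occur.
SELECT SUBSTRING_INDEX('example.com', '/', 1)                  AS k1,  -- example.com
       SUBSTRING_INDEX('www.example.com', '/', 1)              AS k2,  -- www.example.com
       SUBSTRING_INDEX('www.example.com/', '/', 1)             AS k3,  -- www.example.com
       SUBSTRING_INDEX('www.example.com/somepage.htm', '/', 1) AS k4;  -- www.example.com
```

The last three keys are identical, so those rows land in one group; the bare example.com still gets a key of its own.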
To handle the first case, I would recommend removing the www. at the beginning of the string. The following should work (although it will fail if .www occurs later in the URL before the first /):
select min(url), count(*) as c from links group by (case when url like 'www.%' then substring(substring_index(url, '/', 1), 5) else substring_index(url, '/', 1) end) having c > 1;
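As a rough way to try this out, the sketch below loads the four URLs from the question into a temporary table and runs the same grouping expression. The table and column names follow the question; the other.org row and the domain_key alias are made up for illustration, and the query selects the normalized key rather than min(url) so the grouping is visible.

```sql
-- Hypothetical sample data: the first four rows are the question's examples,
-- other.org is an invented non-duplicate.
CREATE TEMPORARY TABLE links (url VARCHAR(255) NOT NULL);
INSERT INTO links (url) VALUES
  ('example.com'),
  ('www.example.com'),
  ('www.example.com/'),
  ('www.example.com/somepage.htm'),
  ('other.org');

-- Strip the path, then strip a leading 'www.' (4 characters, so keep from position 5).
SELECT (CASE WHEN url LIKE 'www.%'
             THEN SUBSTRING(SUBSTRING_INDEX(url, '/', 1), 5)
             ELSE SUBSTRING_INDEX(url, '/', 1)
        END) AS domain_key,
       COUNT(*) AS c
FROM links
GROUP BY domain_key
HAVING c > 1;
-- Returns one row: domain_key = 'example.com', c = 4
```

Grouping by the alias relies on MySQL allowing select aliases in GROUP BY and HAVING; on other databases the CASE expression would likely need to be repeated.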