subquery - MySQL query to find near-duplicate URLs
I'm trying to eliminate duplicate domain names (URLs) in a MySQL table.
I've used this query to find identical URLs:

select url, count(*) as c from links group by url having c > 1;

But the query fails to find URLs that point to the same domain in different forms, which is what I need:

example.com
www.example.com
www.example.com/
www.example.com/somepage.htm

Any help would be appreciated.
You can handle the last three cases pretty easily:

select min(url), count(*) as c from links group by substring_index(url, '/', 1) having c > 1;

To handle the first case as well, I would recommend removing the www. at the beginning of the string. The following should work (although it will fail if .www occurs later in the URL before the first /):

select min(url), count(*) as c
from links
group by (case when url like 'www.%'
               then substring(substring_index(url, '/', 1), 5)
               else substring_index(url, '/', 1)
          end)
having c > 1;
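To try the grouping on the four URLs from the question, here is a minimal sketch. Only the table name links and the column url come from the question. The column type, the sample rows, the host_key alias, and the extra group_concat column are assumptions added for illustration. The derived table only gives the grouping expression a name so it can be selected and grouped on without repeating it.

```sql
-- Assumed schema: the question names only the table (links) and the column (url).
create table links (url varchar(255) not null);

-- The four example URLs from the question.
insert into links (url) values
  ('example.com'),
  ('www.example.com'),
  ('www.example.com/'),
  ('www.example.com/somepage.htm');

-- host_key (hypothetical alias): the text before the first '/',
-- with a leading 'www.' removed when present.
select host_key,
       count(*) as c,
       group_concat(url order by url separator ', ') as urls
from (
  select url,
         case when url like 'www.%'
              then substring(substring_index(url, '/', 1), 5)
              else substring_index(url, '/', 1)
         end as host_key
  from links
) as t
group by host_key
having c > 1;
```

With these rows, all four URLs reduce to the key example.com, so the query should return a single group with c = 4. Note that URLs stored with a scheme such as http:// are not covered by this expression, because the first / then falls inside the scheme and every such URL would reduce to the same key.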