Looking to find duplicates in a large dataset using indexes in SQL
I have a database that contains users and emails. It's a big dataset, so I'm looking for a faster method than a simple SELECT statement. I want to find users who have multiple email addresses listed. I believe I have to start with this:
create index ix_mydatabase_emails on mydatabase (email asc)
But to be honest, I'm new to indexing and my SQL is rusty, so I'm not quite sure what to do after that.
If you just want to count the email addresses, aggregation is the fastest way. If you want to start listing the users who have more than one email address, then in many databases the following is faster:
select uet.user
from user_email_table uet
where exists (select 1
              from user_email_table uet2
              where uet2.user = uet.user and uet2.email <> uet.email
             );
For performance, you want an index on user_email_table(user, email).
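A minimal sketch of that index, assuming SQLite through Python's built-in sqlite3 module, a hypothetical table user_email_table(user, email) with made-up rows, and a hypothetical index name ix_user_email. It creates the composite index and prints SQLite's EXPLAIN QUERY PLAN for the EXISTS query, so you can check that the correlated lookup on user goes through the index. Other databases show their plans differently (EXPLAIN, EXPLAIN ANALYZE, execution plans in the GUI).

```python
import sqlite3

# Hypothetical schema and sample data; names follow the answer, rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("create table user_email_table (user text, email text)")
conn.executemany(
    "insert into user_email_table (user, email) values (?, ?)",
    [
        ("alice", "a1@example.com"),
        ("alice", "a2@example.com"),
        ("bob", "b@example.com"),
    ],
)

# Composite index: leading column user, then email, as the answer suggests.
conn.execute("create index ix_user_email on user_email_table (user, email)")

exists_query = """
select uet.user
from user_email_table uet
where exists (select 1
              from user_email_table uet2
              where uet2.user = uet.user and uet2.email <> uet.email)
"""

# Each EXPLAIN QUERY PLAN row has four columns; the last one is a
# human-readable description that names any index the planner uses.
plan = conn.execute("explain query plan " + exists_query).fetchall()
for row in plan:
    print(row[3])
```

Because the index starts with user, the inner lookup `uet2.user = uet.user` can seek directly to one user's rows instead of scanning the whole table for every outer row.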
The EXISTS query returns duplicates: a user appears once for each of their email rows. Adding select distinct removes them, but adds to the processing time.
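To see those duplicates concretely, here is a minimal sketch, again assuming SQLite via Python's sqlite3 and a hypothetical user_email_table(user, email) with invented rows. It runs the EXISTS query with and without distinct. Row order is not guaranteed without an order by, so compare the results as sets or sorted lists.

```python
import sqlite3

# Hypothetical table and sample rows: alice has two addresses, bob has one.
conn = sqlite3.connect(":memory:")
conn.execute("create table user_email_table (user text, email text)")
conn.executemany(
    "insert into user_email_table (user, email) values (?, ?)",
    [
        ("alice", "a1@example.com"),
        ("alice", "a2@example.com"),
        ("bob", "b@example.com"),
    ],
)

# Same EXISTS query as the answer; {distinct} is either empty or "distinct".
query_template = """
select {distinct} uet.user
from user_email_table uet
where exists (select 1
              from user_email_table uet2
              where uet2.user = uet.user and uet2.email <> uet.email)
"""

# Without distinct: one output row per qualifying email row.
with_dups = conn.execute(query_template.format(distinct="")).fetchall()
# With distinct: each qualifying user once, at the cost of a deduplication step.
deduped = conn.execute(query_template.format(distinct="distinct")).fetchall()

print(sorted(with_dups))
print(deduped)
```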
By "multiple" email addresses, I'm assuming you mean different email addresses. That is the difference between these two queries:
select user, count(*) from user_email_table group by user having count(*) > 1;
and:
select user, count(distinct email) from user_email_table group by user having count(distinct email) > 1;
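The first query counts rows, so a user who has the same address listed twice also qualifies; the second counts distinct addresses, so only users with two or more different emails qualify. A minimal sketch of the difference, assuming SQLite via Python's sqlite3 and a hypothetical user_email_table(user, email) with invented rows; the order by clauses are an addition here only to make the output order predictable.

```python
import sqlite3

# Hypothetical data: alice has two different addresses,
# bob has one, and carol has the same address listed twice.
conn = sqlite3.connect(":memory:")
conn.execute("create table user_email_table (user text, email text)")
conn.executemany(
    "insert into user_email_table (user, email) values (?, ?)",
    [
        ("alice", "a1@example.com"),
        ("alice", "a2@example.com"),
        ("bob", "b@example.com"),
        ("carol", "c@example.com"),
        ("carol", "c@example.com"),
    ],
)

# count(*) counts rows, so carol's repeated address makes her qualify.
rows_query = """
select user, count(*) from user_email_table
group by user having count(*) > 1 order by user
"""

# count(distinct email) counts different addresses, so carol drops out.
distinct_query = """
select user, count(distinct email) from user_email_table
group by user having count(distinct email) > 1 order by user
"""

by_rows = conn.execute(rows_query).fetchall()
by_distinct = conn.execute(distinct_query).fetchall()

print(by_rows)      # alice and carol
print(by_distinct)  # alice only
```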