SQL Server JOINs executing very slowly with large tables
Below I have a query that takes the e-mail from one table and joins 3 other tables on a matching e-mail. It then filters on 2 columns (utm_campaign, utm_source) to make sure they are not empty.
Two of the tables have close to a million rows, and the other 2 have around 100,000 rows each.
Currently, outputting 100 rows takes approximately 60 seconds. I'm expecting the select statement to output between 500,000 and 1,000,000 rows, which at that rate might take 4-5 days to complete.
I don't understand why the server's processors are only using 27% of their resources, or what I could do differently with the joins to make the process faster. I have refined the joins as much as I could and increased the number of processors on the server, to no avail. I'm not familiar with indexing, and don't know whether it could be applied to any of this data.
Has anyone had experience doing joins on such large tables who could identify flaws in the logic of my query, or come up with a more efficient way of matching rows in the other tables? Please see the complete query below for reference:
```sql
SELECT pu.recip_id, pu.email, pu.date_joined,
       vp.utm_source vp_source, vp.utm_med vp_medium, vp.utm_camp vp_campaign, vp.created vp_created,
       sch.utm_source sch_source, sch.utm_med sch_medium, sch.utm_camp sch_campaign, sch.created sch_created,
       gf.utm_source gf_source, gf.utm_medium gf_medium, gf.utm_campaign gf_campaign, gf.created gf_created
FROM [digital].[dbo].[postup_recipients] pu
LEFT JOIN [digital].[dbo].[vp_charges] vp
       ON pu.email = '"' + vp.email + '"'
LEFT JOIN [digital].[dbo].[stripe_customers] scu
       ON pu.email = '"' + scu.email + '"'
LEFT JOIN [digital].[dbo].[stripe_charges] sch
       ON scu.cust_id = sch.cust_id
LEFT JOIN [digital].[dbo].[gform_entries] gf
       ON pu.email = '"' + gf.email + '"'
WHERE (gf.utm_source IS NOT NULL AND gf.utm_source != '' AND gf.utm_campaign IS NOT NULL AND gf.utm_campaign != '')
   OR (vp.utm_source IS NOT NULL AND vp.utm_source != '' AND vp.utm_camp IS NOT NULL AND vp.utm_camp != '')
   OR (sch.utm_source IS NOT NULL AND sch.utm_source != '' AND sch.utm_camp IS NOT NULL AND sch.utm_camp != '')
```
Create indexes on vp.email, scu.email, sch.cust_id, and gf.email.
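A minimal sketch of those four indexes. The index names are hypothetical, and the INCLUDE lists (which let each join return its selected columns without a lookup back into the base table) go beyond the original suggestion. It assumes none of the key columns is varchar(max) or nvarchar(max), since MAX types cannot be index key columns. These indexes only pay off fully once the join predicates leave the indexed column bare, which is what the next suggestion addresses.

```sql
-- Hypothetical index names; assumes email and cust_id are non-MAX types
-- (e.g. nvarchar(320)) so they can be used as index keys.
-- INCLUDE columns are an optional addition so each join is covered.
CREATE NONCLUSTERED INDEX IX_vp_charges_email
    ON [digital].[dbo].[vp_charges] (email)
    INCLUDE (utm_source, utm_med, utm_camp, created);

CREATE NONCLUSTERED INDEX IX_stripe_customers_email
    ON [digital].[dbo].[stripe_customers] (email)
    INCLUDE (cust_id);

CREATE NONCLUSTERED INDEX IX_stripe_charges_cust_id
    ON [digital].[dbo].[stripe_charges] (cust_id)
    INCLUDE (utm_source, utm_med, utm_camp, created);

CREATE NONCLUSTERED INDEX IX_gform_entries_email
    ON [digital].[dbo].[gform_entries] (email)
    INCLUDE (utm_source, utm_medium, utm_campaign, created);
```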
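Reverse the join logic on the 3 joins where you're calculating a value from the joined table's column, e.g. pu.email = '"' + vp.email + '"' => vp.email = SUBSTRING(pu.email, 2, LEN(pu.email) - 2). While vp.email is wrapped in an expression, SQL Server cannot seek on an index on that column; moving the calculation to the pu side leaves vp.email bare.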
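A sketch of the full query with the three join predicates reversed, assuming every pu.email value is wrapped in double quotes as the original joins imply, and that the email columns share the same data type and collation (an implicit conversion on the indexed side could still prevent a seek). Computing the stripped value once through CROSS APPLY, under the hypothetical alias e.bare_email, is an illustrative choice rather than part of the original suggestion; inlining the SUBSTRING in each ON clause works the same way. The CASE guard is there because SUBSTRING raises an error when LEN(pu.email) - 2 is negative.

```sql
-- Join predicates reversed: the expression is computed on pu's side,
-- leaving vp.email, scu.email and gf.email bare so their indexes can be sought.
-- Assumes pu.email values look like "someone@example.com" (quotes included).
SELECT pu.recip_id, pu.email, pu.date_joined,
       vp.utm_source vp_source, vp.utm_med vp_medium, vp.utm_camp vp_campaign, vp.created vp_created,
       sch.utm_source sch_source, sch.utm_med sch_medium, sch.utm_camp sch_campaign, sch.created sch_created,
       gf.utm_source gf_source, gf.utm_medium gf_medium, gf.utm_campaign gf_campaign, gf.created gf_created
FROM [digital].[dbo].[postup_recipients] pu
-- Strip the surrounding quotes once per recipient (hypothetical alias e.bare_email).
-- Values shorter than 2 characters yield NULL instead of a SUBSTRING error.
CROSS APPLY (SELECT CASE WHEN LEN(pu.email) >= 2
                         THEN SUBSTRING(pu.email, 2, LEN(pu.email) - 2)
                    END AS bare_email) e
LEFT JOIN [digital].[dbo].[vp_charges] vp
       ON vp.email = e.bare_email
LEFT JOIN [digital].[dbo].[stripe_customers] scu
       ON scu.email = e.bare_email
LEFT JOIN [digital].[dbo].[stripe_charges] sch
       ON sch.cust_id = scu.cust_id
LEFT JOIN [digital].[dbo].[gform_entries] gf
       ON gf.email = e.bare_email
WHERE (gf.utm_source IS NOT NULL AND gf.utm_source != '' AND gf.utm_campaign IS NOT NULL AND gf.utm_campaign != '')
   OR (vp.utm_source IS NOT NULL AND vp.utm_source != '' AND vp.utm_camp IS NOT NULL AND vp.utm_camp != '')
   OR (sch.utm_source IS NOT NULL AND sch.utm_source != '' AND sch.utm_camp IS NOT NULL AND sch.utm_camp != '');
```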
Your filters could also be tweaked, though it gets a little tricky. I think vp.utm_source IS NOT NULL AND vp.utm_source != '' => vp.utm_source > '', and then you can create an index on vp.utm_source, which will be used if only a few rows have it populated. Add vp.email as a secondary column in that index. I think this part is the lesser of the problems, though; the joins above are the biggest issues.
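A sketch of the suggested index, with utm_source leading and email as the second column (the index name is hypothetical), followed by the rewritten WHERE clause as a comment and a small self-contained check on sample values. SQL Server ignores trailing spaces when comparing strings, so a blank value such as '   ' fails both predicates, just as '' and NULL do; for typical text values the two forms should keep the same rows, but it is worth confirming on the real data.

```sql
-- Suggested index: utm_source first, email as the secondary column (hypothetical name).
CREATE NONCLUSTERED INDEX IX_vp_charges_utm_source_email
    ON [digital].[dbo].[vp_charges] (utm_source, email);

-- Rewritten WHERE clause using the  x > ''  form:
--   WHERE (gf.utm_source > '' AND gf.utm_campaign > '')
--      OR (vp.utm_source > '' AND vp.utm_camp > '')
--      OR (sch.utm_source > '' AND sch.utm_camp > '')

-- Sanity check on sample values: the two flag columns should agree on every row.
SELECT v.x,
       CASE WHEN v.x IS NOT NULL AND v.x != '' THEN 1 ELSE 0 END AS original_filter,
       CASE WHEN v.x > '' THEN 1 ELSE 0 END AS rewritten_filter
FROM (VALUES (CAST(NULL AS varchar(20))),
             (CAST('' AS varchar(20))),
             (CAST('   ' AS varchar(20))),
             (CAST('google' AS varchar(20)))) AS v(x);
```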