sql - How to count columns where values differ? -
i have large table , need check similar rows. don't need column values same, similar. rows must not "distant" (determined query on other table), no value may different (i have done queries these conditions) , other values must same. must expect ambiguity, 1 or 2 different values shouldn't break "similarity" (well, better performance accepting "completely equal" rows, simplification cause errors; option).
the way going solve through pl/pgsql: make loop iterating through results of previous queries. each column, have if testing whether differs; if yes, increment difference counter , go on. @ end of each loop, compare value threshold , see if should keep row "similar" or not.
such pl/pgsql-heavy approach seems slow in comparison pure sql query, or sql query pl/pgsql functions involved. easy test rows x equivalent rows if knew rows should different, difference can occur @ of 40 rows. is there way how solve single query? if not, there faster way examine rows?
edit: mentioned table, in fact group of 6 tables linked 1:1 relationship. don't feel explaining what, that's a different question. extrapolating doing on 1 table situation easy me. simplified (but not oversimplified - should demonstrate difficulties have there) , made example demonstrating need. null , else should count "different". no need make script testing - need find out whether possible in way more efficient thought about.
the point don't need count rows (as usual), but columns.
edit2: previous fiddle - wasn't short, let here archiving reasons.
edit3: simplified example here - not null integers, preprocessing omitted. current state of data:
select * foo; id | bar1 | bar2 | bar3 | bar4 | bar5 ----+------+------+------+------+------ 1 | 4 | 2 | 3 | 4 | 11 2 | 4 | 2 | 4 | 3 | 11 3 | 6 | 3 | 3 | 5 | 13
when run select similar_records( 1 );
, should row 2 (2 columns different values; within limit), not 3 (4 different values - outside limit of 2 differences @ most).
to find rows differ on given maximum number of columns:
with cte ( select id ,unnest(array['bar1', 'bar2', 'bar3', 'bar4', 'bar5']) col -- more ,unnest(array[bar1::text, bar2::text, bar3::text , bar4::text, bar5::text]) val -- more foo ) select b.id, count(a.val <> b.val or null) cols_different (select * cte id = 1) join (select * cte id <> 1) b using (col) group b.id having count(a.val <> b.val or null) < 3 -- max. diffs allowed order 2;
i ignored other distracting details in question.
demonstrating 5 columns. add more required.
if columns can null
may want use is distinct from
instead of <>
.
this using unorthodox, handy parallel unnest()
. both arrays must have same number of elements work. details:
sql fiddle (building on yours).
Comments
Post a Comment