number

Find-Duplicates-Automatically



Sharing buttons:

You sometimes need to use Excel to check in a data set where there are duplicate values.

So in this example, what we have got here is a transaction number and you’ll see the

first one is 100123 and as you come down you’ll see that same number is repeated a couple

of times. So what we want to do is to tell Excel to please find where these situations

happen. The function to use is the COUNT IF function. The key logic is how we are going

to do the counting. Just working back we want Excel to look at the first row and to say,

that transaction number - have I seen it before? – just looking upwards. When it goes to

the next cell, it must look between these two and say, have I seen it before, yes or

no – how many times? Eventually we will go all the way down here and by the time we

get to this cell (B18), what it will do is count how many times its seen this transaction

before and in this case it will say, oh, there is another one here. So you are always looking

above. So lets do the “COUNT IF” function. The range is where we are going to make a

major difference. So I’m going to say please look at the range from here and in this case

by pushing semi-colon (;), you’ve got the same range (B11;B11) – and the criteria,

I’m going to make it this cell (B11). What is key is, as I copy it down, I want the first

cell to stay the same (of the range) but the second reference can move. So I’m going

to freeze SBS11 and press ok. If you see Excel’s result – it has found one time of this transaction.

If I copy it down to G31. As we go down, you’ll see the area it looks at is ever increasing

until it gets to row G18 where its now looking and saying in this range, I have found this

transaction twice – and going down you’ll see 3, 4, 5 etc. And now what we want to do

is in a separate column, we want to tell Excel where is the first instance of any particular

transaction number shown. So, for example, this 2, 3, 4 – all of them are irrelevant

to us – we don’t actually care – we know that that means that they are duplicates

– all we care about is the 1’s. So a simple way to do it is use the “IF” function

and just say, if this is a 1, that’s fine – we’ll just put a 1 for example, if its

not a 1 put a 0 or another word. So I’m just going to use the function wizard, go

to the “IF” function- the logical test here is that if this cell is = to 1, then

I know that that is a unique situation, if its true in my case I’m going to put a 1,

you may put a word saying “unique” or whatever you want it to be. If its not a 1,

then it must be a duplicate, so you can either be a word but I am going to put a 0 in, say

ok and now I copy it down. So what you’ll see here is that all unique transaction numbers,

get 1s. Wherever there are duplicates, you now get 0s. If you wanted to now, you could

set up the data filter. You could say just show me the 1s and you will see only the unique

numbers. What we can now do is take it a step further. What we want to do is uniquely identify

the unique items. So what we want to be able to say is- give an absolutely unique number

to the lines that are unique – so for example, that will be a 1, that will be a 2. When it

gets down here where there are duplicates, we don’t actually want numbers for these,

but we want unique numbers here. This will allow you to do things like “VLOOKUP”

on the set of data. The way you can do that is by using an “IF” function. So let’s

activate the function wizard, get the “IF” function. What we want to do now, is every

time we see a 1, in this first instance column, I want to have a unique number so I’ll take

a number and add 1, if there’s not a 1, and there’s a 0, then I want Excel to ignore

it and put a 0. So the logical test is, is this cell = 1. If its true, and again we are

going to use a growing boundary – I want it to sum. So we go from here to itself (I

am just going to freeze that). If its false I want to put a 0. Now this is the key – the

easiest way to show what this is doing is to copy it down – say ok, copy it down.

What you’ll see the first time it finds a 1 so it adds the 1 in if you go down again

– it sees a 1 so it adds these 2 together. If we go down again, it sees a 1 so adds these

3 together. When it gets to the first 0, it sees 0 and we have told it- please make it

a 0. When it finally finds another 1, it then goes and adds all of them together. What you

can now see, is that you have got an absolutely unique number for every unique row of information

and you’ll see it goes up to 13. At this point now it is possible to extract all this

information using something like a VLOOKUP command. What you could do is using a VLOOKUP

command you could set up a separate spreadsheet, have a grouping of numbers – 1,2 3 etc and

all the VLOOKUP will do is it will look on this table, find the unique item called 1

and maybe pull through this set of information. Another problem with duplicate values is that

sometimes you are not looking for a duplicate value at this level, you maybe want it to

be the same transaction number, the date, etc. For this methodology what you can then

do is you need to create a single cell that represents absolute uniqueness. The way you

will see I have done it here is I have used CONCATENATE and I’ve joined all the cells

together in this case - so what I’m going to be looking for is the exact duplicate where

columns A, B, C, D etc are the same. The only way to do this is to create a new column,

create this unique key and then follow through and do the exact same logic here (count how

many times you’ve seen this before).