
01-20-2005, 07:40 PM
|
|
Junior Guru Wannabe
|
|
Join Date: Sep 2004
Posts: 73
|
|
Data sorting / duplication removal
Hello,
A client has a list of their customers, in a spreadsheet. As this list came from 3 different databases, and is to be imported into 1 new system, there is a fair bit of duplication.
Is there somebody that offers a service where they take the spreadsheet, and sort through it removing duplications?
|

01-20-2005, 07:50 PM
|
|
Predatory Poster
|
|
Join Date: Jul 2003
Location: Goleta, CA
Posts: 5,550
|
|
Yeah this is a common need in business. Try googling someone that will do it. Or put in a request over at employment offer sin this forum.
__________________
Patron: I'd like my free lunch please.
Cafe Manager: Free lunch? Did you read the fine print stating it was an April Fool's joke.
Patron: I read the same way I listen, I ignore the parts I don't agree with. I'm suing you for false advertising.
Cafe Owner: Is our lawyer still working pro bono?
|

01-20-2005, 07:55 PM
|
|
Web Hosting Master
|
|
Join Date: Dec 2003
Location: Miami, FL
Posts: 3,262
|
|
why not import the three spreadsheets into 1, then sort them by the command term (ie. tel #) and then delete the dups?
|

01-20-2005, 08:08 PM
|
|
Junior Guru Wannabe
|
|
Join Date: Sep 2004
Posts: 73
|
|
Quote:
Originally posted by Rob83
why not import the three spreadsheets into 1, then sort them by the command term (ie. tel #) and then delete the dups?
|
There are 30,000 records.
|

01-20-2005, 08:11 PM
|
|
Web Hosting Master
|
|
Join Date: Dec 2003
Location: Miami, FL
Posts: 3,262
|
|
Quote:
Originally posted by fang666
There are 30,000 records.
|
You can easily run a macro that will search through the 30,000 records for duplicates by analyzing specific rows and columns.
Quote:
Sub Find_Duplicates()
Dim CompareRange As Variant
Dim Selection As Variant
Dim x As Variant
Dim y As Variant
Set Selection = Range("B2:B6")
Set CompareRange = Range("C2:C6")
For Each x In Selection
For Each y In CompareRange
If x = y Then x.Offset(0, 2) = x
Next y
Next x
End Sub
|
|

01-20-2005, 08:22 PM
|
|
Junior Guru Wannabe
|
|
Join Date: Sep 2004
Posts: 73
|
|
Rob - thanks for the informative reply.
The issue is, that there could be a few customers with the same name. So in the event of a duplicated name, we need to check the address to see if its the same.
Then of course, if the address is the same but entered slightly differently......
|

01-20-2005, 08:36 PM
|
|
Web Hosting Master
|
|
Join Date: Aug 2000
Location: NYC
Posts: 6,627
|
|
Yep, eliminating duplicates in addresses isn't that straightforward. The way it's generally done is to first "standardize" the address by formatting them to USPS specifications. There is software available to do that. Ideally it will check that the address range exists, and identify any addresses that are not possible. You can then find exact duplicates with that output, or take a further step of tokenizing the addresses and matching that version; or more sophisticated dupe software will scan through the addresses and give each a score that measures how closely it resembles other addresses. Some will be clearly atches, some clearly not, and some in a gray area that may or may not be. You can either treat all of those one way or the other, or manually evaluate them.
Near matches on names can be handled generally the same way... in fact the level of match between the names associated with each address can be part of the criteria for weighing whether two records are duplicates or not.
If you do a search and find a commercial service to do it, they should take an approach along those lines.
__________________
Specializing in SEO and PPC management.
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
| Postbit Selector |
|
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|
|
| Login: |
|
|
| Advertisement: |
|
|
| Web Hosting News: |
|
|
|