Big deduplications

July 8th, 2010 DataTrim

We often get questions about setting up and using the filter function in DataTrim Dupe Alerts, and we admit that the flexibility we provide by allowing full SOQL support sometimes makes the learning curve a bit steep.
In reality it is quite simple as long as you understand a few basic principles.
In the articles below we have collected insights and examples from real use cases.
Hopefully this will be useful for you to get more out of your solution.

How to eat an Elephant

Looking for an overview of how to get started with a big deduplication?

This article takes a top-down approach and introduces you to best practices applied by our clients.

Using Filters – Part 1

Read about how to use filters with DataTrim Dupe Alerts to break your deduplication into smaller, manageable datasets.

Using Filters – Part 2

Coming soon

Thank you for using DataTrim Dupe Alerts

How to eat an Elephant – Big deduplications!

June 23rd, 2010 DataTrim

One bite at a time

When you encounter a large challenge, you cannot always do as you have always done. You cannot cook an elephant like a turkey for Thanksgiving; the oven isn’t big enough. As a barbecue? Well, you will need a very big grill or fireplace, and while some parts of the meat might be tender and tasty, other parts will be burned or still raw.

The same goes for big deduplications.

The analogy between eating an elephant and performing data operations on a large database is obvious.

If you try to run a deduplication on a very large dataset in one go, you risk getting the work done poorly and having to wait a very long time before it is “well done”.

Chunking it up

So how do we chunk it up? As with cutting up an animal for cooking, there are certain traditions and preferences, which may of course vary across cultures and animals, so if you think there is only one way, think again!

Your database and the data it contains may be very specific to your business, so the way to chunk up your data might have to be unique to you, but here are a few examples of what others have done.

By Data Quality

Deduping a large database isn’t just about cleaning up the database, but also about the amount of time you put into cleaning it up. One approach is therefore to categorize the data into different levels of data quality, focus your deduplication on the high-quality data first, and then work your way down.
There are many ways of identifying the quality of a record: how well the record is populated across certain key fields, its classification by source or interest, or how old it is (for example, no activity or reply to campaigns over a long period).

Once you have a way of categorizing by quality, the deduplication can begin. You can concentrate on making the good part better and eventually throw away a serious amount of bad data.
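To make the idea concrete, here is a minimal sketch of one possible quality measure: the share of key fields that are populated on a record. The field list, the thresholds and the function names are illustrative assumptions, not part of DataTrim Dupe Alerts; in practice you would choose the fields that matter for your own matching.

```python
# Hypothetical key fields; pick the ones that matter for your own matching.
KEY_FIELDS = ["FirstName", "LastName", "Email", "Phone", "Company"]

def completeness(record, key_fields=KEY_FIELDS):
    """Return the fraction of key fields that hold a non-blank value."""
    filled = sum(1 for f in key_fields if str(record.get(f) or "").strip())
    return filled / len(key_fields)

def quality_tier(record, high=0.8, low=0.5):
    """Map a record to 'high', 'medium' or 'low' (thresholds are illustrative)."""
    score = completeness(record)
    if score >= high:
        return "high"
    if score >= low:
        return "medium"
    return "low"

# Example records as plain dictionaries (made-up data).
records = [
    {"FirstName": "Ann", "LastName": "Lee", "Email": "ann@example.com",
     "Phone": "555-0100", "Company": "Acme"},
    {"FirstName": "", "LastName": "Smith", "Email": None,
     "Phone": "555-0101", "Company": "Initech"},
    {"LastName": "Jones"},
]
tiers = [quality_tier(r) for r in records]
```

You could store the resulting tier in a custom field on the record and then dedupe the "high" tier first, working down from there.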

By Type, Ownership or Territory

As with data quality, you can categorize your records by type, ownership or territory and use this to perform deduplications on smaller subsets of data.

Such categorization is not only a simple way to divide your database. Using the record type, ownership or territory may in many cases be a mandatory first approach, because merging records from different categories (that is, processing the result of the deduplication) can be quite complex and can affect business processes, account and territory management models, and links into back-office systems.

By Likeliness

Although you may break your database into categories as mentioned above, one day you might want to match records from different categories against each other, and you will then still end up with a very large set of data. In this context you need to create chunks within which duplicates are more likely to be found, so you might need to understand a little more about how your deduplication solution works.

First of all, you will of course need to consider how you define dupes across different categories, making sure that the relevant data exists on the records you include in the deduplication. For example, if you use the email address to identify duplicates, you might as well exclude records without emails.
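As a rough illustration of the email example, the sketch below builds a SOQL condition that keeps only records where the given fields are populated. The helper name is hypothetical, and whether your filter field expects a bare condition like this or a full WHERE clause is an assumption you should check against your own setup.

```python
def require_fields(*field_names):
    """Build a SOQL condition that keeps only records where every field is populated."""
    return " AND ".join(f"{name} != null" for name in field_names)

# Keep only contacts that have an email address.
email_filter = require_fields("Email")

# Keep only contacts that have both an email address and a last name.
email_and_name_filter = require_fields("Email", "LastName")
```

Here `email_filter` holds the text `Email != null`, which could then be combined with other conditions using AND.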

By Sound-A-Like

If you break the database into smaller chunks using the name of the contacts, you can of course use, for example, the first character of the last name to create filters, including a, b and c in one chunk, d, e and f in the next, and so on.
A better approach is to group the names into logical groups where the first characters may sound similar, such as c and s. If your database mostly contains English names, you might consider grouping by Soundex.
We have successfully used the following grouping: A+H, B+F+P+V, C+S+Z, D+K+Q, G+I+J, L+X, M+N, U+W+Y+E+R
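The grouping above can be turned into one filter per chunk. The following sketch generates a SOQL condition for each group using LIKE on the first character of the last name; the field name `LastName` and the helper name are assumptions for illustration. It also lists any letters that the groups do not cover, since names starting with those letters would need a chunk of their own.

```python
import string

# The sound-alike groups listed above, one string of initials per chunk.
GROUPS = ["AH", "BFPV", "CSZ", "DKQ", "GIJ", "LX", "MN", "UWYER"]

def group_filter(letters, field="LastName"):
    """Build a SOQL condition matching names that start with any of the letters."""
    return "(" + " OR ".join(f"{field} LIKE '{c}%'" for c in letters) + ")"

# One filter per chunk.
filters = [group_filter(g) for g in GROUPS]

# Letters of the alphabet not present in any group.
covered = set("".join(GROUPS))
uncovered = sorted(set(string.ascii_uppercase) - covered)
```

With the groups as listed, `uncovered` contains O and T, so in this sketch you would add a further catch-all chunk for those initials (and for names that start with a digit or another character).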

With DataTrim Dupe Alerts


In the next article in this series (July 2010) you will read more about setting up the alerts using filters in DataTrim Dupe Alerts.

Merging large volumes of duplicates?

May 25th, 2010 DataTrim

New Batch Merge functionality

In the May ’10 release of our solution we have added new merging functionality to DataTrim Dupe Alerts.

The new Mass Merge functionality will allow you to merge up to 200 dupes in a single batch operation without having to wait for the online merging to complete.

To merge using Batch, simply select the Batch Option on your Mass Merge screen.
With the additional settings you can be notified when the Batch is processed. This notification will also include information regarding the result of the merge.

Batch Merge Settings

Note: Batch Merge is applied automatically if you select more records than your current Mass Merge Batch Size setting (90).

To install the new release, simply click on this link: Install into Production
To install into a sandbox use this link: Install into sandbox

If you upgrade from a previous version, you will have to switch the Mass Merge button to the new one and add the Enabled Batch Merge check box to your DataTrim Dupe Alert Parameters page layout (see below).
Go to Setup->App Setup->Create->Objects and select the DataTrim Matched Record Object. Under the section Search Layout, edit the DataTrim Matched Records List View, and replace the ‘Mass Merge’ Button with the new ‘Mass Merge (ß)’ button.

Switching the Mass Merge Buttons

Finally you should enable the functionality by editing your DataTrim Parameters/Changing your Defaults.
Add the Check Box to the Page Layout: see more.

Once the functionality is enabled, you should see the Batch Merge options next time you perform a Mass Merge.

Note:

The functionality is currently available as an optional preview. We still encourage our clients to install the latest version and evaluate it, even though we know it may not be relevant to everyone.

Based on user feedback we expect to make the functionality generally available in June ’10.
