“We should all be proud of adding Data Stewardship to our LinkedIn profiles or, even better, of having a friend endorse our data stewardship. Wouldn’t that be cool?”

Although the word steward originates from the housekeepers and servants of old Scottish castles, the 21st-century image of data stewards has nothing to do with housekeeping and serving others. Stewardship is about responsibility and accountability, values which are often overlooked in our busy world but which could use a bit of a renaissance, especially when we are talking about data and the increasing volumes of data we see on a day-to-day basis in our businesses.


When talking about data, we should first of all talk about relevancy. Data is relevant if it adds value to your business and to your processes, but data which is important for one process may not be relevant for another. Let’s take an example: a user who enters a new prospect into a lead database enters the basic information captured, e.g. on a business card. This information may be sufficient for the user to reach out and work the new lead. Unfortunately, as we all know, not all leads are converted into clients, so many of these leads are left behind, lost!

Once the data is captured in the database it lives on, and other functions in the business may later use it for e.g. future campaigns, lost-lead analysis, lead-conversion cost analysis etc. The problem is that if the information used for marketing targeting and segmenting, the data source, the origin etc. is not captured, the usage and the value of the data will be limited. It cannot be used in targeted campaigns, and it will not be included in the analysis.

So although the data entered fulfills its main and initial purpose, it must be entered with completeness and accuracy in mind. At data capture and at each later point, regardless of whether it is user input, a data import, or a registration on a website, you should ensure that the data fields have clear and unambiguous definitions, that their origin and source are documented, and that you have traceability. Only this way will you be able to prioritize your data maintenance efforts: which records to remove, which to enrich, which to use as targets etc.


Who is responsible for this? Somehow everyone is responsible, every user, but in many cases it falls back on a single person who manages the tools which perform the manipulations on the data. Each end user sees only the tip of the iceberg, only their own part and not the full picture, and when there is an issue it must be somewhere in the part which is under water, out of sight. At least that is the preferred impression, and a safe way to push the responsibility on to someone else. Yes, we are all human and we do this all the time…

Why not be open about this, share, engage and take responsibility? We all use the data every day, and the data becomes more and more important for us to do our work successfully and meet our objectives, targets, and numbers. The parachutist actually does pack his own parachute before the jump, so why don’t we care in the same way about the data, which is the knowledge we depend upon to do our work?

Data stewardship is not a person, a function in your organization, or someone you can point fingers at when your report doesn’t show the expected values. It is a skill which all users should have and develop, something we should all be proud of adding as a skill on our LinkedIn profiles. Show that you care!

Have a relationship with your data!

Collaborate, take initiative and take ownership. These are sweet words in a busy world, but they are not unreachable. The problem is that most data management activities are focused on tools which are not collaborative, which require special training, and which use technical terminology. So how can everyone be on the team if there is only room for the experts, the pros?

If everyone needs to participate, and to do so motivated more by the carrot than by the whip, the data management solutions must be collaborative. They must include functionality where everyone can participate, do their small part, contribute, and feel that they own the data: that the data is being taken care of and is there to help them fulfill their objectives.

Example: Duplicates

Duplicates are a problem in any database, and in lead and customer databases in particular they cause an enormous amount of trouble: bad communication to prospects and clients, wasted efforts in marketing and sales, user frustration and lack of confidence in the CRM system, and direct financial impact from double counting in finance etc.

Duplicates can be detected and treated in many different ways, all depending on how they appear. If you have worked with this topic in real life you will know that every source of entry will introduce duplicates no matter how tight you make the shutters, so unless you disable record creation you must deal with duplicates. Face it!

When a user enters a new lead, for example, that user should be accountable for the accuracy and completeness of the data being entered. A lead without an address can never be included in a mail-shot campaign; a lead without a title/role can never be segmented as e.g. senior-level management for a red-carpet mail shot etc.
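The completeness check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Lead` fields, the purpose names (`mail_shot`, `senior_segment`) and the `REQUIRED_FIELDS` mapping are hypothetical and would differ in any real CRM.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Lead:
    # Hypothetical lead record; a real CRM will have its own field names.
    first_name: str
    last_name: str
    email: Optional[str] = None
    address: Optional[str] = None
    title: Optional[str] = None


# Assumed mapping from a downstream use to the fields that use depends on.
REQUIRED_FIELDS = {
    "mail_shot": ["address"],
    "senior_segment": ["title"],
}


def missing_fields(lead: Lead, purpose: str) -> List[str]:
    """Return the fields the lead still lacks for the given purpose."""
    return [
        name
        for name in REQUIRED_FIELDS[purpose]
        if not (getattr(lead, name) or "").strip()
    ]


def usable_for(lead: Lead, purpose: str) -> bool:
    """A lead is usable for a purpose when none of its required fields is empty."""
    return not missing_fields(lead, purpose)
```

A data entry form could call `missing_fields` when the user saves a lead and show which later uses the record would be excluded from, rather than simply rejecting the record.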

But completeness is not only important for future usage of the lead. When checking whether the new record already exists, i.e. is a duplicate, completeness and accuracy are important elements in matching records to find and present potential duplicates. Most basic deduplication solutions base their search on e.g. first name, last name and email, so if one of these elements is missing, incomplete (an initial only), or spelled differently (Mike vs. Michael), the search will not return any matching records, the duplicate is introduced, and the damage is done. When completeness is respected and more sophisticated matching solutions are implemented, matching records have a higher probability of being found, as phone, address, and other elements can assist the matching process in finding and presenting potential duplicates, not just the email.
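To make the difference concrete, here is a minimal Python sketch that contrasts a basic match on first name, last name and email with a match that also looks at phone and address. It is not how any particular deduplication product works: the field names, the simple one-point-per-field score and the threshold of 2 are all assumptions chosen for illustration.

```python
def normalize(key, value):
    """Lowercase and trim a value; keep digits only for phone numbers."""
    text = (value or "").strip().lower()
    if key == "phone":
        return "".join(ch for ch in text if ch.isdigit())
    return " ".join(text.split())  # collapse repeated whitespace


def same(a, b, key):
    """True when both records carry the same non-empty value for key."""
    left, right = normalize(key, a.get(key)), normalize(key, b.get(key))
    return bool(left) and left == right


def basic_match(a, b):
    """Basic approach: first name, last name and email must all agree."""
    return all(same(a, b, key) for key in ("first_name", "last_name", "email"))


def match_score(a, b):
    """Assumed scoring: one point per agreeing field, including phone and address."""
    return sum(same(a, b, key) for key in ("last_name", "email", "phone", "address"))


def is_potential_duplicate(a, b, threshold=2):
    """Flag the pair for review when enough fields agree (threshold is an assumption)."""
    return basic_match(a, b) or match_score(a, b) >= threshold


existing = {"first_name": "Michael", "last_name": "Smith",
            "email": "michael.smith@example.com",
            "phone": "+1 555 0100", "address": "1 Main St"}
new_lead = {"first_name": "Mike", "last_name": "Smith", "email": None,
            "phone": "+1-555-0100", "address": "1 main st"}
```

With these two sample records, `basic_match` fails because of "Mike" vs. "Michael" and the missing email, while `is_potential_duplicate` still flags the pair because last name, phone and address agree. If the new lead had been entered without phone and address, neither approach would find it, which is why completeness at entry matters.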

By adding this check at the user entry level, users experience that the system helps them avoid creating duplicates; it engages them in keeping the data clean and shows them that their contribution is valuable and responsible.

Similarly, when detecting duplicates in an existing database (existing data, imports etc.), the list of potential duplicates may be long. Some are obvious duplicates which can be merged directly, but others are more questionable, and who is better placed to determine which record is the best than the account manager who works these accounts on a day-to-day basis? Not to forget the field-level merging when the duplicate records have different emails, phone numbers etc. Will your dedupe-expert IT guy know which one to keep?

Combine the users’ ability to contribute, without extra effort, to reducing the number of duplicates with the possibility of being presented with potential duplicates, which, if you look at it from the positive side, may add new data to your existing record (a missing phone number, a job title, an alternative email etc.). So when you are merging records you are not just fixing a problem; you are actually enriching your records and consolidating the knowledge you need to do your work.
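A field-level merge of this kind could look like the following Python sketch. The function name `merge_records` and the rule it applies (fill empty fields from the duplicate, keep the existing value on disagreement and report the conflict) are assumptions for illustration; the point is that conflicting values are handed to the person who knows the account rather than decided silently.

```python
def merge_records(master, duplicate):
    """Merge a duplicate into the master record, field by field.

    Empty master fields are filled from the duplicate (enrichment).
    Where both records hold different values, the master value is kept
    and the pair is returned as a conflict for a human to resolve.
    """
    merged = dict(master)
    conflicts = {}
    for key, value in duplicate.items():
        if value in (None, ""):
            continue  # the duplicate has nothing to offer for this field
        current = merged.get(key)
        if current in (None, ""):
            merged[key] = value  # enrich: fill the gap in the master
        elif current != value:
            conflicts[key] = (current, value)  # let the account manager decide
    return merged, conflicts
```

For example, merging a duplicate that carries a phone number and a job title into a master record that lacks them fills both fields, while two different email addresses would come back in `conflicts` for the account manager to review.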


With the example of deduplication we have illustrated that data stewardship can be a skill we can all develop and grow, and that it is not just for the select few.

“Robert S. Seiner: Being a Data Steward describes a relationship between a person and some data”


Using data and maintaining data are not two separate functions. With the availability of truly collaborative solutions, each user can take ownership, take responsibility, and engage in building a valuable relationship between the users and the data.

As with all other relationships, the relationship between user and data needs to be cared for, nurtured, and given the attention it requires.

“Turn the page! Stop thinking: tools, and who does it, start thinking: we share and I want to contribute”


Take responsibility for your data, make it relevant and care about it. Then go add “Data Stewardship” to your LinkedIn profile (http://www.linkedin.com/profile) today. You will feel better, trust me!


[Robert S. Seiner: Seiner’s Rules for Becoming a Data Steward: http://www.tdan.com/view-articles/16867]

[Wikipedia on Data Steward: http://en.wikipedia.org/wiki/Data_steward]

December, 2013

Author: Peter Larsen, CEO, DataTrim ApS (www.datatrim.com)
Data Quality Management Expert and founder of DataTrim – a Professional Data Laundry
