January 12, 2023

Transform Data Chaos and "bad data" into Ordered Insight

Tom Ng

Data Disarray. Is it the New Normal?

Data has been growing, and continues to grow, at a rapid pace. As the amount of data stored by organizations expands year after year, new issues are emerging around data retention, security, and storage governance. These challenges are unavoidable for virtually all businesses that collect or generate data.

Image Source: Data Sentinel

More often than they would like, companies struggle with questions such as:

  • Which files belong to whom?
  • Are there any old files on your systems that hold sensitive data?
  • How many copies of a file are there, and where are they?
  • Which files are no longer in use?

To ensure that business, regulatory, and data security requirements are properly met, these challenges must be addressed promptly. However, the sheer volume of data that businesses accumulate can make this difficult and time-consuming, especially when a significant portion of the data is "bad data".

So, What Is Bad Data?

There are four common types of bad data.

Incorrect Data

According to the Global Data Management Benchmark Report published by Experian, around 23% of customer data is believed to be incorrect. In fact, a survey conducted the previous year found that 92% of organizations experienced issues due to inaccurate data within the past 12 months.

When businesses collect and analyze data to gain insight into their industry or customer base, the accuracy of this data is crucial. If the data contains incorrect or inaccurate information, it defeats the purpose of its collection and analysis.

Incomplete Data

A study by Dun & Bradstreet found that 1 in 5 businesses lose revenue and customers due to incomplete data.

Incomplete or empty fields in data are like missing puzzle pieces, preventing a complete picture from being formed.

For example, if a marketing campaign targets prospects in executive positions but not every record on the list includes a job title, the result could be missed opportunities or inefficient segmentation of potential consumers.
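Before launching a campaign like the one described above, it can help to measure how complete a key field actually is. The following is a minimal sketch, assuming the prospect list is available as a list of dictionaries; the `field_completeness` helper, the field names, and the sample records are all hypothetical.

```python
from typing import Iterable, Mapping, Optional


def field_completeness(records: Iterable[Mapping[str, Optional[str]]], field: str) -> float:
    """Return the share of records (0.0 to 1.0) with a non-empty value for `field`."""
    records = list(records)
    if not records:
        return 0.0
    # Treat missing keys, None, and whitespace-only strings as "empty".
    filled = sum(1 for r in records if (r.get(field) or "").strip())
    return filled / len(records)


# Hypothetical prospect list; names and titles are made up for illustration.
prospects = [
    {"name": "A. Rivera", "job_title": "Chief Financial Officer"},
    {"name": "B. Chen", "job_title": ""},
    {"name": "C. Okafor", "job_title": None},
    {"name": "D. Singh", "job_title": "VP, Operations"},
]

print(f"job_title completeness: {field_completeness(prospects, 'job_title'):.0%}")
```

In this made-up list only half of the records carry a job title, so any segmentation by seniority would silently skip the other half.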

Image Source: Columbia Pictures

Duplicate Data

In contrast to incomplete data, too much of a good thing causes problems of its own. Holding several similar copies of a record raises the question: which one is the right one?

A study published by the American Health Information Management Association found that duplicate records in hospitals often make up 5-10% of all stored records. Hospitals that have multiple locations or that have merged with other systems may have duplicate rates as high as 20%.

When duplicate records, or worse, near-duplicates, are present in the ER, conflicting data can lead to poor patient care and incorrect treatment. If a medication or treatment could produce an adverse reaction in a patient, having the right copy of the medical record could be the difference between life and death.
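Near-duplicates are harder to catch than exact copies because a simple equality check misses them. As a rough illustration, the sketch below uses Python's standard `difflib.SequenceMatcher` to flag record pairs whose text is very similar. The field names (`name`, `dob`), the 0.85 threshold, and the sample patients are assumptions made for this example; production record matching typically relies on more robust techniques than plain string similarity.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record: dict) -> str:
    """Collapse a record into one lowercase string with whitespace squeezed."""
    parts = (str(record.get(key, "")) for key in ("name", "dob"))
    return " ".join(" ".join(parts).lower().split())


def near_duplicates(records: list, threshold: float = 0.85) -> list:
    """Return (index_a, index_b, similarity) for pairs at or above the threshold."""
    keys = [normalize(r) for r in records]
    pairs = []
    # Compare every pair once; fine for small lists, too slow for millions of rows.
    for i, j in combinations(range(len(records)), 2):
        score = SequenceMatcher(None, keys[i], keys[j]).ratio()
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs


# Hypothetical patient records, invented for illustration only.
patients = [
    {"name": "Jonathan Smith", "dob": "1980-04-12"},
    {"name": "Jonathon  Smith", "dob": "1980-04-12"},
    {"name": "Maria Gonzalez", "dob": "1975-11-02"},
]

for a, b, score in near_duplicates(patients):
    print(f"records {a} and {b} look alike (similarity {score})")
```

A flagged pair is only a candidate: someone still has to decide which copy is the right one, or whether the two records describe different people.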

Irrelevant & ROT Data

The last, and probably the most obvious, type of bad data is ROT. ROT stands for Redundant, Obsolete, or Trivial data. In other words, it's data that's either no longer relevant (if it ever was) or has little to no value to the company that's keeping it.

According to a study by ManageEngine, at least 30% of an organization's unstructured data may be ROT, with some estimates pushing it much higher following the pandemic.

Think about the last time you searched for a document across your email, network drives, SharePoint, and SAP environments. How much ROT did you have to scan and skip through before you found the document you really needed?

When an organization holds an excess of ROT data, productivity drops and desired outcomes become harder to reach.

What Makes Bad Data So Bad?

Productivity Loss

When employees squander time searching for accurate data amid the clutter, productivity suffers both directly and indirectly, and a cascade of further productivity loss may follow.

According to a 2012 McKinsey & Co report, employees spend on average 1.8 hours every day (9.3 hours per week) searching for and gathering information that already exists. To put it another way, a business hires 5 employees but only 4 of them show up to work. The 5th is off searching for answers and not contributing any value.
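The "5 employees, 4 show up" comparison can be checked with quick arithmetic. The sketch below assumes a standard eight-hour workday, which is not stated in the report figure quoted above.

```python
HOURS_SEARCHING_PER_DAY = 1.8  # figure cited above
WORKDAY_HOURS = 8.0            # assumption: a standard eight-hour day

# Share of each working day spent looking for existing information.
share_lost = HOURS_SEARCHING_PER_DAY / WORKDAY_HOURS

team_size = 5
effective_headcount = team_size * (1 - share_lost)

print(f"Share of the day spent searching: {share_lost:.1%}")
print(f"Effective headcount for a team of {team_size}: {effective_headcount:.1f}")
```

Under that assumption, searching takes up a little more than one-fifth of the day, which is where the comparison to a missing fifth employee comes from.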

Image Source: Unsplash.com(@olloweb)

Risk Assessment & Discovery Exercise

During risk assessment and discovery exercises (e.g. Data Subject Access Request, Freedom of Information, Access to Information and Privacy), rapid information retrieval is critical. The identification of relevant and regulated data can be significantly bogged down by ROT data, or worse, misled by incomplete data.

In a Europe-based survey conducted by BearingPoint, 500 companies in Germany, Austria, and Switzerland were asked about their ability to provide complete correspondence (including all emails) for a given transaction within 2 weeks. Only 23% of the respondents said they could meet this requirement. This means that more than 75% of these companies either could not, or were not confident that they could, provide the requested correspondence within the given time frame. As more data and records are continuously created, this becomes increasingly difficult for companies that are ill-equipped for the task.

Data Retention Risks

The SEC discovered that JP Morgan employees, including managers and senior compliance personnel, used personal devices for communication about company business, including texts, WhatsApp messages, and emails, from at least January 2018 to November 2020. These personal devices were not properly protected, archived, or subject to retention or Data Loss Prevention policies, meaning that sensitive client information was at risk. In late 2021, JP Morgan admitted to recordkeeping failures and the use of WhatsApp to evade regulatory oversight, and agreed to pay $200 million in fines to settle the charges.

Data Protection & Security Risks

Since the GDPR took effect in 2018, data protection laws have multiplied around the world. Long-standing laws such as PIPEDA in Canada have been joined by CCPA/CPRA in California, with new laws set to take effect in Virginia, Utah, Connecticut, and Colorado in 2023, all set to further restrict data collection practices and protect consumer data. These privacy regulations, introduced to protect consumers, present challenges for businesses that collect data, and non-compliance can result in steep fines.

Recently, the SEC imposed a $35 million fine on Morgan Stanley for failing to protect customer data. Over a period of 5 years, Morgan Stanley relied on an unqualified company to dispose of old devices containing 15 million customer records. Because the devices were handled negligently, some of them ended up on online auction sites.

Misinformation, Poor Decision Making, & Missed Opportunities

In addition to the potential for fines, bad data can lead to mistakes, inconveniences, bad decisions, and missed opportunities.

An example of this is the Mars Climate Orbiter, a NASA space probe launched in 1998 to study the Martian climate. The probe was expected to generate significant scientific breakthroughs, but it ultimately disintegrated in the Martian atmosphere. An investigation attributed the failure to a measurement mismatch between two software systems: NASA used metric units, while spacecraft builder Lockheed Martin used US customary units.

This bad data error resulted in the loss of the probe, a financial loss of $193 million, and immeasurable missed opportunity for scientific advancement. When data is inaccurate, it leads to ineffective plans and the potential to miss out on prospects and opportunities.
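One common guard against this class of error is to never pass a bare number between systems, and instead carry the unit along with the value. The sketch below illustrates the idea; it is not a reconstruction of the actual flight or ground software, and the `Impulse` class, the unit labels, and the sample reading are assumptions made for this example.

```python
from dataclasses import dataclass

LBF_S_TO_N_S = 4.4482216152605  # newton-seconds per pound-force second


@dataclass(frozen=True)
class Impulse:
    """An impulse value that always carries its unit."""
    value: float
    unit: str  # "N*s" (metric) or "lbf*s" (US customary)

    def to_newton_seconds(self) -> float:
        """Convert to metric, refusing to guess when the unit is unknown."""
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        raise ValueError(f"unknown unit: {self.unit!r}")


# Hypothetical reading produced by a system that works in US customary units.
reported = Impulse(10.0, "lbf*s")
print(f"{reported.to_newton_seconds():.2f} N*s")
```

Had the receiving side read the bare `10.0` as metric, the value would have been off by a factor of roughly 4.45, with nothing in the data itself to reveal the mistake.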

So, How Do We Break Out of Bad Data?

The right approach to managing bad data depends on your organization's unique requirements and the types of bad data it holds. Nonetheless, here are a few tips that could benefit your data management strategy:

Data Audit & Discovery

As G.I. Joe preached for years, “Knowing is half the battle.”

Image Source: Ideatovalue.com

Conducting a data audit and discovery process can help you understand the types and locations of data you have in your ecosystem, including both structured and unstructured data across various systems such as cloud, on-premises, SaaS, partner systems, and legacy environments.

The purpose of this exercise is to uncover what kind of data you have and where it is stored. Much like going through all your possessions before moving, you may uncover previously unknown or forgotten data, along with unexpected findings that have been overlooked for a long time.

Hopefully, you will not be too surprised by what you find.

Inventory & Classify Content

Technically speaking, once you know where all your 1s and 0s are, you can classify them in a way that is easy to understand and organize them in a data inventory.

The act of inventorying creates a cohesive data dictionary. This gives you a one-stop shop for the who, what, where, when, and why of all your data elements, and allows you to evaluate your data against your data governance and compliance policies.
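As a very small illustration of what an inventory entry might capture, the sketch below walks a folder and records basic "what, where, when, and who" fields for each file. The `build_inventory` function and the field names are hypothetical, and a real inventory would also cover databases, SaaS systems, and content-based classification, which this example does not attempt.

```python
import os
from datetime import datetime, timezone


def build_inventory(root: str) -> list:
    """Walk `root` and return one dict per file with basic metadata."""
    inventory = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            info = os.stat(path)
            inventory.append({
                "path": path,                                      # where
                "extension": os.path.splitext(name)[1].lower(),    # what (rough type)
                "size_bytes": info.st_size,
                "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),  # when
                "owner_uid": info.st_uid,  # who (numeric id; always 0 on Windows)
            })
    return inventory


if __name__ == "__main__":
    # Inventory the current folder and show the first few entries.
    for entry in build_inventory(".")[:5]:
        print(entry["path"], entry["size_bytes"], entry["modified_utc"].date())
```

Each entry can then be checked against governance rules, for example by flagging file types that are not expected in a given location.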

ROT Analysis

As discussed at length above, ROT data can slow you down, cost you more in storage, and most importantly, expose you to the kind of compliance risks you don’t want to be involved in.

A ROT analysis aimed at streamlining your data footprint and improving your risk profile and system efficiency is an obvious way to combat bad data.

Deleting data can be scary. But with proper policies and rules in place, and an automated system that allows you to double- and triple-check, you can feel confident that any purged data meets your requirements for disposal.
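A first pass at ROT analysis can be approximated with simple rules that only report candidates and never delete anything. In the sketch below, "redundant" means files with identical content, "obsolete" means files not modified within a cutoff, and "trivial" means empty files. These definitions, the seven-year default cutoff, and the `rot_report` function are assumptions for illustration; actual rules should come from your own retention policy.

```python
import hashlib
import os
import time
from collections import defaultdict


def rot_report(root: str, max_age_days: int = 365 * 7) -> dict:
    """Flag candidate ROT files under `root`. Nothing is deleted."""
    by_hash = defaultdict(list)
    obsolete, trivial = [], []
    cutoff = time.time() - max_age_days * 86400

    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            info = os.stat(path)
            if info.st_size == 0:
                trivial.append(path)   # Trivial: empty files
                continue
            if info.st_mtime < cutoff:
                obsolete.append(path)  # Obsolete: not modified since the cutoff
            # Reads the whole file into memory; fine for a sketch, not for huge files.
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            by_hash[digest].append(path)

    # Redundant: groups of two or more files with identical content.
    redundant = [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
    return {"redundant": redundant, "obsolete": obsolete, "trivial": trivial}
```

Calling `rot_report("/path/to/share")` returns three lists that a reviewer can inspect before any disposal rule is applied, which keeps the double- and triple-check step in human hands.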

Compliance & Risk Mitigation

We have already talked about how bad data (or badly managed data) can lead to unnecessary fines and unwanted publicity. For any business that collects and uses private data from consumers, conducting a Privacy Impact Assessment (PIA) is almost always necessary to remain compliant.

Even if it is not mandated by regulatory authorities, completing a PIA has a number of organizational benefits:

  • A PIA can serve as an early warning system for detecting privacy issues. As a result, organizations can put protections in place before a breach or cybersecurity attack, rather than afterwards.
  • A PIA can resolve privacy issues sooner rather than later, helping businesses avoid costly or embarrassing privacy blunders.
  • A PIA can also show that an organization made an effort to avoid privacy risks, protecting it from unfavourable court judgments, unwanted publicity, and damage to its reputation.

Data Metrics & Reporting

Hopefully, by this point, your data is nice and clean. Now it’s time to put it to work and draw out the insights you need to make decisions. Isn’t that the whole point of doing all this clean-up work?

Make sure you also invest in tools that can connect all your data sources and present them on a single dashboard with real-time visibility into your data. Businesses that can sort through their data and discover otherwise unused, yet valuable, information will have an advantage over their competitors.

Active Data Governance

As new attack methods are discovered and new protection laws roll out, we need to keep our processes and tools up to date to stay safe and compliant.

Actively monitoring your systems is crucial for continued confidence in your records, your data governance and compliance policies, and your security protocols. Data management is an ongoing priority, and doing it manually would be incredibly expensive.

So look for ways to automate it, so that all you need to do is input directions and rules and let the system handle the rest, 24/7.

Don’t forget, hackers don’t do much sleeping, if at all.

Bad Data Is Like Bad Weather

Let’s face it, it’s bound to happen. We just need to manage it with the strategies and tools that we have, just like our waterproof rain gear.

So, with that, we can say: “There is no such thing as bad data, only badly managed data.”

Is proper data management important to you and your company? Yes? Glad to hear it. Contact us to learn why we're the best data nerds in the business.

