- Data governance is about proactively managing your data to help your organization achieve its strategy and objectives. You do this by improving the quality of your data.
- Three things that your organisation needs to be successful:
- A policy to mandate how your organisation is going to manage data.
- Roles and Responsibilities concerning data.
- Processes detailing what needs to be done to manage data.
- Governance is the manner in which an entity chooses to oversee the control and direction of an area of interest.
- It typically takes the form of how decisions are made, regulated, and enforced.
- When entities grow and increase in complexity, formal governance becomes important.
- Data governance is all about managing data well, but it is not restricted to data management alone.
- Successful data governance also means that data risks can be minimized, and data compliance and regulatory requirements can be met with ease. This can bring important comfort to business leaders who, in some jurisdictions, can now be personally liable for issues arising from poor data management.
- Data governance is focused on roles and responsibilities, policies, definitions, metrics, and the lifecycle of data.
- Data management is the technical implementation of data governance. For example, databases, data warehouses and lakes, application programming interfaces (APIs), analytics software, encryption, data crunching, and architectural design and implementation are all data management features and functions.
- Data governance generally focuses on data, independent of its meaning. For example, you may want to govern the security of patient data and staff data from a policy and process perspective, despite their differences.
- Information governance is entirely concerned with the meaning of the data and its relationship in terms of outcomes and value to the organization, customers, and other stakeholders.
Fundamentally, data governance is driven by a desire to increase the value of data and reduce the risks associated with it. It forces a leap from an ad hoc approach to data to one that is strategic in nature.
Some of the main advantages achieved by good data governance include:
- Improved data quality
- Expanded data value
- Increased data compliance
- Improved data-driven decision-making
- Enhanced business performance
- Greater sharing and use of data across the enterprise and externally
- Increased data availability and accessibility
- Improved data search
- Reduced risks from data-related issues
- Reduced data management costs
- Established rules for handling data
The basic steps for creating a data governance program consist of the following. These steps also form the basic outline of this course:
- Defining the vision, goals, and benefits
- Analyzing the current state of data governance and management
- Developing a proposal based on the first two steps, including a draft plan
- Achieving leadership approval
- Designing and developing the program
- Implementing the program
- Monitoring and measuring performance
- Maintaining the program
Data governance is about managing data well and helping to deliver its optimum value to your organization. It includes ensuring your data is available, usable, and secure. It's the actions that team members take, the policies and processes they must follow, and the use of technologies that support them throughout the data lifecycle in their organization. It's safe to say that for a growing number of organizations, data governance is becoming a very big deal.
- A data governance program must be aligned with the strategy of the organization.
- Data plays a role in many aspects of organizational strategy, including risk management, innovation, and operational efficiencies, so you must ensure there's clear alignment between these aspects and the goals of data governance.
- Your data governance program will only be possible with the right people doing the right things at the right time.
- Every data governance framework includes the identification and assignment of specific roles and responsibilities, which range from the information technology (IT) team to data stewards.
- At the heart of a data governance program are policies, processes, and standards that guide responsibilities and support uniformity across the organization. Each of these must be designed, developed, and deployed.
- How much effort this takes depends on the size and complexity of the organization.
- The data governance program must have a mechanism to measure whether it is delivering the expected results.
- Capturing metrics and delivering them to a variety of stakeholders is important for maintaining support, which includes funding.
- You'll want to know if your efforts are delivering on the promise of the program.
- Based on the metrics, you and your team can make continuous improvements to ensure that the program is producing value.
- Fortunately, a large market now exists for tools in support of data governance and management.
- These include tools for master data management, data catalogs, search, security, integration, analytics, and compliance.
- In recent years, many data science-related tools have made leaps in terms of incorporating ease-of-use and automation.
- What used to be complex has been democratized, empowering more team members to better manage and derive value from data.
- With the introduction of data governance and the ongoing, sometimes evolving, requirements, high-quality communications are key.
- This takes many forms, including in-person meetings, emails, newsletters, and workshops.
- Change management, in particular, requires careful attention to ensure that impacted team members understand how the changes brought about by the data governance program affect them and their obligations.
- Culture will defeat the greatest of strategies almost every time. Imagine for a moment designing and deploying a data governance program for an organization that has little or no data culture. Intuitively this sounds like a disaster in the making. To be fair, every organization has some form of a data culture; it just might not be in an ideal state.
- On a basic level, data culture is how your organization values data and how it manages and uses it. There’s a wide chasm between companies that simply manage data as a consequence of doing business versus those that consider data central to how their organization operates and makes decisions (the latter being the qualities exhibited by a mature data culture).
If you decide that you need to better prepare the organization for data governance by maturing the data culture, consider these items to start. Good news! Many of these items are covered in detail in this course.
- Help leaders communicate the value of data and model the type of behavior that demonstrates that data is a priority. This must include communicating the positive results of using data.
- Provide basic tools and education for data use that include manipulating data, analytics, data cleansing, basic query commands, and visualization. Don’t overlook the remarkable capabilities of common applications such as spreadsheets.
- Do something, even if it’s small, to show progress. A successful data culture doesn’t begin with the deployment of complex, far-reaching solutions. Rather, it can be eased into the organization via basic data-management skills offered in a classroom setting or online.
- Recognize that resistance and frustration are part of the journey. Rather than fighting it, find ways to bring comfort and rewards to team members. At a minimum, provide a channel for feedback and positive discussion.
This course takes you through all the steps for designing and creating a data governance program, but you also have to consider the readiness of an organization prior to beginning the journey.
The following basic checklist of items will help you determine the data governance readiness of your organization:
- The basis of data culture exists.
- The program is 100 percent aligned with the business strategy.
- Senior leadership is 100 percent committed to the program and its goals.
- Senior leadership understands this is a strategic, enterprise program and not the sole responsibility of the IT department.
- One or more sponsors have been identified at an executive level.
- The program has the commitment to fund its creation and to maintain it in the long term.
- The organization understands this is an ongoing program and not a one-off project.
- You have documented the return-on-investment (ROI).
- Legal and compliance teams understand and support the goals of the program.
- Fundamental data skills exist for the data governance journey.
- The IT organization is capable and resourced to support the program.
A solid definition of data and its role today gets us on the same page and sets the stage for delivering on the promise of data governance.
- Data refers to collections of digitally stored units, in other words, stuff that is kept on a computing device.
- These units represent something meaningful when processed for a human or a computer.
- A single unit of data is traditionally referred to as a datum and multiple units as data.
Data is also defined based on its captured format. Specifically, at a high level, it falls into one of the following categories:
- Structured: Data that has been formatted to a set structure; each data unit fits nicely into a table in a database. It’s ready for analysis. Examples include first name, last name, and phone number.
- Unstructured: Data that is stored in its native format and must be processed before it can be used. Further work is required to enable analysis. Examples include email content and social media posts.
- Semi-structured: Data that contains additional information to enable the native format to be searched and analyzed.
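To make the three categories concrete, the sketch below holds the same hypothetical contact in each format and pulls a phone number out of it. The names, sample values, and the phone pattern are invented for illustration only; the point is how much work each format needs before the value can be used.

```python
import json
import re
from typing import Optional

# Structured: fixed fields, as in a database table row. Ready for analysis.
structured_row = {"first_name": "Ada", "last_name": "Lovelace", "phone": "555-0100"}

# Semi-structured: the native format (JSON here) carries keys and nesting
# that make it searchable, without forcing every record into one table shape.
semi_structured = '{"name": "Ada Lovelace", "contact": {"phone": "555-0100"}, "tags": ["vip"]}'

# Unstructured: free text in its native form. It must be processed first.
unstructured = "Hi, this is Ada Lovelace. You can reach me on 555-0100 after lunch."


def phone_from_structured(row: dict) -> str:
    # Direct field lookup.
    return row["phone"]


def phone_from_semi_structured(document: str) -> str:
    # Parse the format, then navigate the embedded structure.
    return json.loads(document)["contact"]["phone"]


def phone_from_unstructured(text: str) -> Optional[str]:
    # Pattern matching is needed; the pattern is an assumption about the data.
    match = re.search(r"\b\d{3}-\d{4}\b", text)
    return match.group(0) if match else None


print(phone_from_structured(structured_row))
print(phone_from_semi_structured(semi_structured))
print(phone_from_unstructured(unstructured))
```

All three calls recover the same value, but only the structured case does so without parsing or guessing at patterns.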
As we entered the 21st century, the volume of data being created and stored grew rapidly. A hyperconnected world accelerating in its adoption and use of digital tools has required dusting off a seldom-used metric to capture the enormity of the data output we are producing.
Today, we live in the zettabyte era. A zettabyte is a big number. A really big number. It’s 10^21, or a 1 with 21 zeros after it. It looks like this: 1,000,000,000,000,000,000,000 bytes.
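The scale is easier to grasp with a little arithmetic. The sketch below assumes decimal (SI) units, where a zettabyte is 10^21 bytes and a terabyte is 10^12 bytes; the helper name is made up for this example.

```python
# Decimal (SI) byte units, assumed for this illustration.
ZETTABYTE = 10 ** 21
TERABYTE = 10 ** 12


def zettabytes_to(unit_bytes: int, zettabytes: int = 1) -> int:
    """Convert a whole number of zettabytes into another byte unit."""
    return zettabytes * ZETTABYTE // unit_bytes


print(f"{ZETTABYTE:,} bytes")                    # 1,000,000,000,000,000,000,000 bytes
print(f"{zettabytes_to(TERABYTE):,} terabytes")  # 1,000,000,000 terabytes
```

In other words, one zettabyte is a billion terabytes.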
- Data that is never used is about as useful as producing reports that nobody reads. The assumption is that you have data for a reason. You have your data and it’s incredibly important to your organization, but it must be converted to information to have meaning.
The differences between data and information:
- Data is the raw facts we encounter. Information is that data placed in context so that it has meaning.
- Knowledge is what we get when we understand and can use that information. It's about applying information in a practical way.
- Wisdom builds on knowledge by adding judgment and experience. It's knowing not just what to do, but also when and why to do it.
- Insight is the deepest level of understanding. It combines knowledge and wisdom to see things in a new light and make better decisions.
To summarize, consider the following:
- Harry Styles is data.
- The fact that Harry Styles is a singer and was in the group, One Direction, is information.
- The fact that Harry Styles has aspirations to become a solo artist and is looking for a record deal is knowledge.
- The fact that One Direction was a very successful band with talented and popular individuals and knowing that Harry Styles is a creative artist who now wants a solo record deal is wisdom.
- Ensuring that Columbia Records makes the decision to sign Harry Styles before anyone else does is insight.
Data's importance has skyrocketed since the mid-20th century. While always valuable, the explosion of computer systems dramatically increased the amount, quality, and accessibility of data. The internet's arrival in the 90s echoed the dream of information at everyone's fingertips, making data a true game-changer.
Data empowers better decision-making. From simple things like choosing a restaurant based on online reviews to complex business decisions like entering a new market, data analysis can be the key to success. Having the right data and the tools to understand it can mean the difference between a well-informed choice and a costly mistake.
Data is seen as the new oil, echoing the crucial role oil played in economies. The comparison rests on a key similarity: both need refinement. Oil is processed into usable products, and data requires analysis to reveal patterns, inform decisions, and solve problems.
Data is the fuel of the 21st century's digital economies, similar to how oil powered the industrial era. Tech giants like Facebook and Google exemplify this. This data-driven transformation is impacting every industry, with data management becoming a strategic asset (profit center) for many. However, a concern arises: just like oil dependence, control of massive valuable data by a few players concentrates power, potentially mirroring the challenges of the oil era. We should learn from the past to manage potential risks associated with data control.
- Data ownership describes the rights a person, team, or organization has over one or more data sets. These rights may span from lightweight oversight and control to rigorous rules that are legally enforceable. For example, data associated with intellectual property — items such as copyrights and trade secrets — will likely have high degrees of protection, from accessibility rights to who can use the data and for what purpose.
Every organization relies on technology to operate, making them all tech businesses in a way. Enterprise Architecture (EA) helps design the right tech, policies, and projects to support an organization's goals. Data Architecture, a part of EA, focuses on how data is designed and managed to align with this overall strategy. In simpler terms, it's the agreed plan for how data fuels an organization's functions and technologies.
At a minimum, data architecture considers and typically supports the following:
- Ensuring data is available to those who need it and are approved to use it.
- Reducing the complexity of accessing and utilizing data.
- Creating and enforcing data protections to support organizational policies and obligations.
- Adopting and agreeing to data standards.
- Optimizing the flow and efficient use of data to eliminate bottlenecks and duplication.
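As a rough illustration of the first point, availability limited to approved users, the sketch below checks a role against a hypothetical approvals table. The dataset names, role names, and the deny-by-default rule are all assumptions made for this example, not a description of any particular product.

```python
# Hypothetical approvals: dataset name -> roles approved to read it.
APPROVED_ROLES = {
    "payroll": {"hr_manager", "finance_analyst"},
    "web_traffic": {"marketing_analyst", "finance_analyst"},
}


def can_access(role: str, dataset: str) -> bool:
    """Allow access only when the role is explicitly approved for the dataset.

    Unknown datasets are denied by default (an assumption of this sketch).
    """
    return role in APPROVED_ROLES.get(dataset, set())


print(can_access("finance_analyst", "payroll"))    # True
print(can_access("marketing_analyst", "payroll"))  # False
```

Keeping approvals in one shared table rather than inside each application is one way to reduce the access complexity mentioned above.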
Data architecture isn't just a technical concern; it reflects an organization's data governance practices. A well-designed data architecture shows the organization values its data, manages it strategically, and has controls to ensure it aligns with business goals. Similar to enterprise architecture, data architecture is a shared responsibility across the organization, not just for IT specialists. In larger organizations, data needs to move smoothly between departments and serve diverse users in various formats.
- Creation: This is the stage at which data comes into being. It may be manual or automated and get created internally or externally. Data is created all the time by a vast number of activities that include system inputs and outputs.
- Storage: Once data is created, and assuming you want it available for later use, it must be stored. It most likely will be contained and managed in a database. The database needs a home, too, such as a local hard drive, server, or cloud service.
- Usage: Hopefully you’re capturing and storing data because you want to use it. Maybe not immediately, but at some point, perhaps for analysis. Data may need to be processed to be useful. That could include cleansing it of errors, transforming it to another format, and securing access rights.
- Archival: In this stage, you identify data that is not currently being used and move it to a long-term storage system out of your production environment. If it's needed at some point in the future, it can be retrieved and utilized.
- Destruction: Despite a desire by some to keep everything forever, there is a logical point where destruction makes sense or is required by regulation or policy. Data destruction involves making data inaccessible and unreadable. It can include the physical destruction of a device such as a hard drive.
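The five stages above can be modeled as a small state machine. The stage names come from the list; the allowed transitions are an assumption of this sketch (for example, that archived data can return to use and that destruction is final), since the text only names the stages.

```python
from enum import Enum


class Stage(Enum):
    CREATION = "creation"
    STORAGE = "storage"
    USAGE = "usage"
    ARCHIVAL = "archival"
    DESTRUCTION = "destruction"


# Assumed transition rules, for illustration only.
ALLOWED = {
    Stage.CREATION: {Stage.STORAGE},
    Stage.STORAGE: {Stage.USAGE, Stage.ARCHIVAL, Stage.DESTRUCTION},
    Stage.USAGE: {Stage.STORAGE, Stage.ARCHIVAL, Stage.DESTRUCTION},
    Stage.ARCHIVAL: {Stage.USAGE, Stage.DESTRUCTION},
    Stage.DESTRUCTION: set(),  # terminal: destroyed data cannot come back
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move a data set to the target stage, rejecting transitions not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


stage = Stage.CREATION
for next_stage in (Stage.STORAGE, Stage.USAGE, Stage.ARCHIVAL, Stage.DESTRUCTION):
    stage = advance(stage, next_stage)
print(stage.value)  # destruction
```

Encoding the rules in one table makes it easy to adjust them when retention policy or regulation changes.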
Data has been around for a long time, even before computers. People have been recording information for millennia, like the Romans using ledgers. The 20th century saw a boom in data collection and processing due to advancements like microprocessors and the Cold War era's space race.
Data processing has a long history, dating back to the 1800s. The need to efficiently analyze census data led to the invention of punch card tabulation machines by the Tabulating Machine Company (later IBM). These machines used punched cards to represent data and could perform basic calculations.
Over time, data processing technology advanced, impacting various sectors like offices, factories, and academia. This progress laid the groundwork for the information age, where data became the fuel for innovation. Increased connectivity allowed for the free flow of information, leading to an explosion in data creation, storage, and use.
By the 21st century, the sheer volume and speed of data created a "data swamp," making traditional software struggle to manage it. This phenomenon led to the coining of the term "big data" to describe the challenges and opportunities associated with massive datasets.
Big data is structured and unstructured data that is so massive and complex in scale that it’s difficult and often impossible to process via traditional data management techniques.
One way to define and characterize big data is through these five Vs:
- Volume: The sheer scale of data being produced is unprecedented and requires new tools, skills, and processes.
- Variety: There are already a lot of legacy file formats, such as CSV and MP3, and with new innovations, new formats are emerging all the time. This requires different methods of handling, from analysis to security.
- Velocity: With so many collection points, digital interfaces, and ubiquitous connectivity, data is being created and moved at increasing speed. Consider that in 2021, Instagram users created, uploaded, and shared 65,000 pictures a minute.
- Variability: The fact that the creation and flow of data are unpredictable.
- Veracity: The quality, including accuracy and truthfulness, of large volumes of disparate data sets can differ considerably, causing challenges to data management.
- At a technology conference in 2010, the then-CEO of Google, Eric Schmidt, said that every two days the world was creating as much data as had been created from the dawn of civilization up to 2003.
- Big data was a thing even before Android and Apple smartphones and apps started generating data. This was before we had connected billions of devices, called the Internet of Things (IoT), which would eventually begin collecting all manner of data. Big data even predates videos of cats published every day on social media platforms.
- By the third decade of the 21st century, with so many devices connected and the world in a state of digital transformation, the volume of data being created had experienced a Cambrian explosion — a term the data science community has adopted from an early period in history notable for the rapid introduction of life into the natural environment.
- In 2021, global technology use generated 79 zettabytes of data, and it is anticipated to hit 180 zettabytes in 2025. A learner seeing this course in 2040 might read the previous sentence and not be impressed at such small numbers, the same way a 32GB smartphone was considered a large amount of space in 2015.
- While these big data statistics are impressive, they don't really paint the full picture.
- It might be easy, for example, to assume that all the data is good quality.
- You might believe it is easy to analyze. You may even think it is easily accessible.
- Most of these assumptions and many related ones will likely be incorrect. For starters, up to 80 percent of data is unstructured. That’s a challenge right there. The vast majority of organizations struggle with unstructured data. In addition, a lot of this data is duplicative. Some of it will be bad data, which means it can’t be trusted, has errors, or includes some other substantive challenge.
- Big data is often more meaningful when broken into smaller, more manageable chunks, an increasingly popular definition of small data.
- Smaller, logically arranged data can be the way to make sense of big data.
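The 2021 and 2025 volume figures cited above (79 and 180 zettabytes) imply an annual growth rate, which the sketch below works out. It assumes smooth compound growth between the two points, which the source does not claim; the function name is chosen for this example.

```python
def compound_annual_growth_rate(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns start into end."""
    return (end / start) ** (1 / years) - 1


# Figures from the text: 79 ZB in 2021, 180 ZB anticipated in 2025.
rate = compound_annual_growth_rate(79, 180, 2025 - 2021)
print(f"Implied growth: {rate:.1%} per year")
```

Under that assumption, the figures correspond to growth of roughly 23 percent a year.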
Smart data is essentially big data that's been cleaned and processed to be more useful. It's big data that's been optimized for a specific purpose. This is done by applying various tools and processes, like AI, to large datasets to identify patterns and extract the most relevant information. This allows businesses to target customers more precisely, improve analytics, and make better use of their data overall. One advantage of smart data is that it can be processed at the time it's collected, rather than needing extra processing later.
Data is valuable to all parts of an organization, not just to data specialists or leaders. It is used in many ways every day, so it's important to make sure it's high quality, secure, and accessible to the right people.
Business operations manage the essential tasks that keep a company running. These tasks vary depending on the company's needs, but some common functions include payroll, order management, and marketing. Other operational needs, like IT or warehousing, may not be required by every business.
Businesses use data extensively in their operations. This data helps them track performance (e.g., HR monitors hiring times), run essential systems (e.g., automatic inventory reorders), and make informed decisions (e.g., sales reports for executives). A lot of this data is generated by the operations themselves, such as application forms in HR or system logs. In addition to its use within operations, this data is also shared with parties outside the operational teams, such as company leadership.
Every organization, from the whole company to individual departments, needs a strategy. This strategy should analyze the challenges faced and propose solutions to achieve the organization's goals. To be successful, this strategy needs to be implemented effectively through operational excellence. In other words, a good strategy paired with smooth operations is key to achieving goals. Developing a strategy typically involves analyzing the situation, drawing conclusions, and creating a plan based on guiding principles.
While data is crucial, it shouldn't be the only factor when creating a plan. Experience and other perspectives are also important. The best plans consider a healthy mix of data and non-data sources.
- It’s generally accepted in business that the highest form of value derived from data is the ability to make better-informed decisions.
- The volume and quality of data available today have no precedent in history.
- Easy access to information has changed decision-making. Consumers can research health issues before seeing a doctor, which can be helpful but also lead to problems with inaccurate information. In businesses, this access to data allows for faster, better-informed decisions based on real-time information, giving them a significant advantage.
Businesses are constantly collecting data, both intentionally and unintentionally. This data helps them understand how the business is running (often following the idea that "what gets measured gets managed"). Ideally, leaders would have all the data they need to make informed decisions.
Data measurements can be quantitative or qualitative. Quantitative data is most often described in numerical terms, whereas qualitative data is descriptive and expressed in terms of language.
Monitoring is the continuous process of checking on something's performance. This could be a project, system, or anything you're interested in. It involves collecting data and comparing it to a target or expectation (like how many widgets a machine should produce). Monitoring helps ensure things are running smoothly, stably, and reliably.
Data monitoring is the process of continuously collecting data and using it to track performance (e.g., machine output, employee activity) and generate insights. This data is then fed into reports, dashboards, and real-time systems to help make informed decisions.
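A minimal sketch of this idea follows, using the widget example: hourly readings are compared against a target, and shortfalls are flagged. The 10 percent tolerance, the sample readings, and the simple completeness figure (the share of readings that are actually present) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reading:
    hour: int
    widgets: Optional[int]  # None means the reading was not captured


def hours_below_target(readings: List[Reading], target: int, tolerance: float = 0.10) -> List[int]:
    """Return the hours whose output fell more than `tolerance` below target."""
    floor = target * (1 - tolerance)
    return [r.hour for r in readings if r.widgets is not None and r.widgets < floor]


def completeness(readings: List[Reading]) -> float:
    """Share of readings that are present (a basic data quality measure)."""
    if not readings:
        return 0.0
    return sum(1 for r in readings if r.widgets is not None) / len(readings)


readings = [Reading(9, 102), Reading(10, 85), Reading(11, None), Reading(12, 98)]
print(hours_below_target(readings, target=100))  # [10]
print(completeness(readings))                    # 0.75
```

In practice, results like these would feed the reports, dashboards, or real-time alerts mentioned above.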
Two key points about data monitoring:
- Monitoring connects data and decision-making: It acts like a bridge, turning raw data into actionable insights across different departments. For instance, a team might measure data on a process, another team monitors that data, and a separate department might take action based on the insights.
- Data monitoring ensures data quality: It's not just about collecting data, it's also about making sure the data is accurate and complete. This is done by setting specific data quality metrics (like completeness and accuracy) and continuously monitoring them.
- Data is the foundation of many business functions, especially decision-making. In fact, data is the source of most valuable business insights, which can be thought of as information that has a significant impact.
- It’s not enough to simply collect lots of data and expect that insight will suddenly emerge. There must be an attendant management process. Thus, insight management means ensuring that data and information are capable of delivering insight.
Insight management involves the following:
- Data gathering and analysis is key: It all starts with collecting and analyzing data from various sources. Those managing insights need to understand the organization's information needs and what data is valuable. They also need to know how information flows within the company and who needs it.
- Data is transformed into insights: Once the data is gathered, analysts interpret it to uncover its meaning and implications. This is where raw data becomes actionable insights.
- Communication is tailored to the audience: Insights need to be communicated effectively. Different audiences may require different formats, from concise summaries for executives to detailed reports for those needing specifics.
- Success is measured by actionable decisions: Effective communication of insights is judged by whether recipients use them to make decisions aligned with the organization's goals.
Perhaps the most obvious manifestation of data and information management in any organization is the use of reports. Creating, delivering, receiving, and acting on reports are fundamental functions of any organization. Some say they are the backbone of every business. That sounds overly glamorous, but it does speak to the importance of reporting and reports.
The content of a report, which can be summarized or detailed, contains data and information in a structured manner. For example, an expenditure report would provide a basic overview of the purpose of the report and then support it with relevant information. That could include a list of all expenditures for a department over a certain period or it could just be a total amount. It will depend on the audience and purpose. The inclusion of visuals is popular.
For example, a chart, considered a visual form of storytelling, is a way to present data so that it can be interpreted more quickly. With so much data and complexity in today’s business environment, data storytelling is growing as both a business requirement and an in-demand business skill.
The report may have a discussion of the findings and will conclude with a summary and sometimes a set of recommendations.
Other important uses of data in organizations include:
- AI: Data is like fuel for AI. The more high-quality data AI has, the better it performs at tasks like identifying patterns and making predictions. AI can also be used to improve data quality and usage within an organization.
- Problem-solving: Data is crucial throughout the problem-solving process. It helps define problems, identify solutions, evaluate options, and measure the success of chosen solutions.
- Data reuse: Data collected for one purpose can often be reused for entirely different reasons. For example, customer data collected by sales might be reused by marketing for targeted campaigns. This can reduce data collection efforts and get more value out of the data an organization has. However, data reuse needs to be done carefully to ensure it complies with data use regulations.
Now that the diverse roles of data have been identified and discussed, it’s useful to understand how data can be leveraged to acquire its maximum value. It begins with recognizing that data is an organizational asset. This simply means that it’s something that brings economic value to the organization. It’s clear to see this when it is pointed out, but many team members don’t yet look at data this way. When data is considered an asset - in fact, specifically a high-value asset - it often gets treated differently.
- An asset is something that is owned by a person, an organization, or a government with the expectation that it can bring some economic benefit. This includes the generation of income, the reduction of expenses, or an increase in net worth.
- Organizations care about both tangible and intangible assets because they typically get captured in their financial accounts. Listing the value of assets presents the true state of any organization and reflects its financial health. In addition, capturing and valuing assets is required for determining tax obligations and for acquiring loans.
- After it is processed from its raw form, data has the potential to create enormous economic value for all manner of stakeholders. Here are some examples of the economic value of data:
- Improves operations.
- Increases existing revenue.
- Produces new forms of revenue.
- Builds relationships with customers and other stakeholders.
- Improves the quality of products and services.
- Contributes to competitive advantage.
- Enables innovation.
- Reduces risk.
- Recognizing that data is an asset is the first step to good data governance.
- Bottom line: Data is an asset and for its value to be leveraged, it must be governed. This may be one of the most important motivations for good data governance.
- Raw data is largely useless. If you’ve ever briefly glanced at a large data set that has columns and rows of numbers, it quickly becomes clear that not much can be gathered from it.
- In order to make sense of data, you have to apply specific tools and techniques. The process of examining data in order to produce answers or find conclusions is called data analytics.
- A data analyst takes a formal and disciplined approach to this work, and it’s a necessary step for any individual or organization trying to make good decisions.
Data analytics has four primary types:
- Descriptive: Existing data sets of historical data are accessed, and analysis is performed to determine what the data tells stakeholders about the performance of a key performance indicator (KPI) or other business objectives. It is insight on past performance.
- Diagnostic: As the term suggests, this analysis tries to glean from the data the answer to why something happened. It takes descriptive analysis and looks at the cause.
- Predictive: In this approach, the analyst uses techniques to determine what may occur in the future. It applies tools and techniques to historical data and trends to predict the likelihood of certain outcomes.
- Prescriptive: This analysis focuses on what action should be taken. In combination with predictive analytics, prescriptive techniques provide estimates of the probabilities of a variety of future outcomes.
- Data analytics involves the use of a variety of software tools depending on the needs, complexities, and skills of the analyst.
- Beyond your favorite spreadsheet program, which can deliver a lot of capabilities, data analysts use products such as R, Python, Tableau, Power BI, QlikView, and others.
- Data management is not the same as data governance! But they work closely together to deliver results in the use of enterprise data.
- Data governance concerns itself with, for example, defining the roles, policies, controls, and processes for increasing the quality and value of organizational data.
- Data management is the implementation of data governance. Without data management, data governance is just wishful thinking. To get value from data, there must be execution.
- At some level, all organizations implement data management. If you collect and store data, technically you’re managing that data. What matters in data management is the degree of sophistication that is applied to managing the value and quality of data sets.
- Poor data management often results in data silos across an organization, security and compliance issues, errors in data sets, and overall low confidence in the quality of data.
On the other hand, good data management can result in more success in the marketplace. When data is handled and treated as a valuable enterprise asset, insights are richer and timelier, operations run smoother, and team members have what they need to make more informed decisions. Well-executed data management can translate to fewer data security breaches and fewer compliance, regulatory, and privacy issues.
Data management processes involve the collection, storage, organization, maintenance, and analytics of an organization’s data. It includes the architecture of technology systems so that data can flow across the enterprise and be accessed when needed by those approved to use it. Additionally, responsibilities will likely include such areas as data standardization, encryption, and archiving.
Technology team members have elevated roles in all these activities, but all business stakeholders have some level of data responsibilities, such as compliance with data policies and with realizing data value.
- In summary, good data management provides the opportunity for significantly enhanced organizational performance.
- Governing data means that some level of control exists to support a related policy. For example, an organization may decide that to reduce risk, there needs to be a policy that requires data to be backed up every day.
- The control would be the documentation of the process and enforcement of that policy. If, in the review of policy adherence, data wasn’t getting backed up, then you’d quickly know that governance, for whatever reason, was not working.
- To fully realize the potential of data in your organization means that data must be governed. Any time an organizational resource or asset is left unmanaged, it’s either a recipe for disaster or a missed opportunity. Even a small amount of governance beats no governance every single day.
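The daily backup example above can be sketched as an automated control. This is a minimal illustration, assuming backup timestamps are available as a simple mapping. The data set names, the `find_backup_violations` function, and the log structure are all hypothetical.

```python
from datetime import datetime, timedelta

# Policy from the example above: data must be backed up every day.
MAX_BACKUP_AGE = timedelta(days=1)


def find_backup_violations(last_backups, now):
    """Return the names of data sets whose latest backup breaks the policy.

    last_backups maps a data set name to the datetime of its last backup,
    or to None if it has never been backed up.
    """
    violations = []
    for name, last_backup in last_backups.items():
        if last_backup is None or now - last_backup > MAX_BACKUP_AGE:
            violations.append(name)
    return sorted(violations)


# Hypothetical backup log (illustrative names and timestamps).
now = datetime(2024, 1, 10, 9, 0)
backup_log = {
    "customer_orders": datetime(2024, 1, 10, 2, 0),  # 7 hours old: compliant
    "staff_records": datetime(2024, 1, 7, 2, 0),     # 3 days old: violation
    "patient_visits": None,                          # never backed up: violation
}

violations = find_backup_violations(backup_log, now)
print(violations)
```

A non-empty result is the signal described above: the policy exists, but governance is not working for those data sets.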
The success of governing data can be reduced to three essential factors:
- People: While recognizing that data is increasingly created and used exclusively by machines without human intervention, handling and benefiting from data is still a highly people-centric exercise. Even in a machine-centric context, it’s people who are most often defining, designing, and maintaining data use. In governing data, people are the subject matter experts, they are responsible for quality, and they oversee and manage all related processes and responsibilities.
- Policies: A data policy contains a set of rules adopted by an organization that apply to the handling of data in specific conditions and for particular desired outcomes. These policies apply in areas such as quality, privacy, retention, and security. The number of policies is typically a reflection of the size of the organization, the industry, and the degree to which data is considered a high-priority asset. As you can imagine, the healthcare and financial industries, for example, which manage high volumes of sensitive data, have a significant number of data policies in support of their data governance programs.
- The ability of team members to access data that they need for their work, without having to rely on specialists, is called data democratization.
- As organizations grow and more systems are employed, eventually no single person knows what data is available and where it is in the enterprise.
- Without this knowledge, the ability to properly govern your data and leverage its value is greatly hampered. Without deliberate actions, data democratization becomes elusive.
- A data silo is a data repository controlled by an entity in an organization but not frequently shared or known by other parts of the business.
- Data silos hinder business efficiencies because they reduce collaboration and increase data inconsistencies. In addition, they are a source of risk, including security and regulatory issues.
- Data governance helps eliminate unnecessary data silos and makes data discoverable and available whenever and wherever it adds value.
- Data catalogs, discussed in this chapter, are an essential way that data governance can help solve these limitations.
Knowing what data is available is essential for the following reasons:
- Better informed decision-making.
- Ensuring compliance and regulatory requirements.
- Lower costs by avoiding duplicate system and data efforts.
- Improved data analytics and reporting.
- Higher performing systems.
- More efficient operations.
- Reducing data inconsistencies across the enterprise.
- Fortunately, the vendor community is ready to help you build your internal search capabilities. It’s taken some time, but solutions have come a long way. With investment and effort, finding data and information in the enterprise is possible.
You can take a few approaches to assist your organization so that your team members can find data. One option involves the creation of an enterprise search engine. It’s certainly possible, but not easy, and will face some limitations such as the discoverability of confidential data that is deliberately siloed. In addition, a search engine won’t necessarily provide insights on whether data is available, current, accurate, or complete. Its common purpose is simply to provide you with the location of the data.
Another, increasingly popular, method of data discoverability is the creation of an enterprise data catalog. Like a store catalog that categorizes products and includes details such as availability and price, a data catalog lists the availability of data sets and includes a wide range of valuable details about that data.
The three essential benefits of data catalogs are:
- Finding data: Helps users identify and locate data that may be useful.
- Understanding data: Answers a wide variety of data questions such as its purpose and who uses it.
- Making data more useful: Creates visibility, describes value, and provides access to information.
- Done right, a data catalog delivers a comprehensive inventory that provides an enterprise view of all data.
- This view provides essential insight that helps with leveraging data value and provides a robust tool to assist with data governance.
- A data catalog is more than just a list of all data sets. Sure, for many organizations, this feature alone would add enormous value.
- What makes a data catalog particularly valuable is that it contains data about the data. It’s called metadata.
A data catalog can contain three types of metadata: technical, business, and operational.
- Technical metadata: Data about the design of a data set such as its tables, columns, file names, and other documentation related to the source system.
- Business metadata: Data such as a business description, how the data set is used, its relevancy, an assessment of data quality, and users and their interactions.
- Operational metadata: Data such as when the data was last accessed, who accessed it, and when it was last backed up.
Examples of metadata include the following:
- Associated systems.
- File names.
- File locations.
- Data owners.
- Data descriptions.
- Dates created.
- Dates last modified.
- List of database tables and views.
- Data stewards.
- Size of data sets.
- Quality score.
- Comments.
- For a large number of stakeholders ranging from data analysts to data stewards, a data catalog presents many advantages. Primarily, the ability to find data tops the list. But it provides much more than that.
With a data catalog, an organization can:
- Know what data it has (and by extension, know what data is missing).
- Reduce data duplication.
- Increase operational efficiencies and innovation.
- Understand data quality.
- Manage compliance.
- Enjoy cost savings from improved operations.
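The three metadata types and the "finding data" benefit can be sketched as a small in-memory catalog. This is an illustration only: the `CatalogEntry` fields, the `search` and `below_quality` helpers, and the sample entries are hypothetical and do not reflect any vendor's catalog product.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class CatalogEntry:
    """One data set in the catalog, grouped by metadata type."""
    name: str
    # Technical metadata: design of the data set and its source system.
    source_system: str
    tables: List[str] = field(default_factory=list)
    # Business metadata: meaning, ownership, and quality.
    description: str = ""
    owner: str = ""
    steward: str = ""
    quality_score: float = 0.0
    # Operational metadata: how the data set is used and maintained.
    last_accessed: Optional[date] = None
    last_backed_up: Optional[date] = None


def search(catalog, keyword):
    """Finding data: names of entries whose name or description matches."""
    keyword = keyword.lower()
    return [
        entry.name
        for entry in catalog
        if keyword in entry.name.lower() or keyword in entry.description.lower()
    ]


def below_quality(catalog, threshold):
    """Understanding data: names of entries scoring below a quality threshold."""
    return [entry.name for entry in catalog if entry.quality_score < threshold]


# Hypothetical catalog contents (illustrative values only).
catalog = [
    CatalogEntry(
        name="sales_orders",
        source_system="erp",
        tables=["orders", "order_lines"],
        description="Confirmed customer orders by region",
        owner="Sales Operations",
        steward="J. Doe",
        quality_score=0.92,
        last_accessed=date(2024, 1, 9),
        last_backed_up=date(2024, 1, 9),
    ),
    CatalogEntry(
        name="hr_staff",
        source_system="hris",
        description="Employee master records",
        owner="Human Resources",
        quality_score=0.71,
    ),
]

print(search(catalog, "customer"))
print(below_quality(catalog, 0.8))
```

Unlike a plain enterprise search, each hit carries the owner, steward, and quality score, which is what makes the catalog useful for governance as well as discovery.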
- If you can’t use your data to make better decisions and drive your organization forward, the data may just be worthless.
Acquiring and applying insight from data means defining the following:
- Context: Understanding the environment and objectives of the outcome
- Need: Determining how insight will help to accomplish the objective
- Vision: Having ideas about how insight will help and what that might look like in practice
- Outcome: Specifying how insights will be adopted and success will be measured
These dimensions can be used to answer questions such as:
- What data is required?
- Does the data exist?
- Is it current?
- Is it easily available?
- What format is the data in?
- What kind of data analysis is required?
- How will the data be presented?
- Converting data into insights is no easy task. It’s complicated and skilled work and relies on good quality data that is accessible. Those tasked with delivering insights often cite data quality, data volume, work effort, and integrating data from various sources as the top reasons that make it difficult and create a deterrent to adoption.
- Quality insights can provide a competitive advantage and operational excellence, but organizations have work to do to fully realize their potential.
- To realize the benefits of data and discover insights, you need analytics. Analytics unlocks the power embedded in good-quality data.
- Data analytics involves both specialized skills and software to explore data sets and extract insights that may be useful to an organization.
- Data analysis is concerned with identifying a data set, examining it, and reporting on any findings.
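The two terms are similar and share an overall purpose, but the difference matters: data analysis examines a specific data set and reports findings, while data analytics covers the broader skills and software used to extract insights.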
The source of data for analytics is one or a combination of the following:
- First-party data: Data that an organization collects.
- Second-party data: Data that is obtained from another organization.
- Third-party data: Aggregated data obtained from a provider.
Typical uses of contemporary data analytics tools and techniques include:
- Vastly improved decision-making
- Focused marketing campaigns
- Understanding the competitive landscape
- Designing more innovative products
- Better customer service
- Improved operations
- Insights on customer behavior
- Intuitively, when something has a high value, it’s likely to be treated differently from things with little value.
- Without a process to place a price on a data set (called data valuation), the value of a given data set may be highly subjective and may differ considerably between team members.
- It's very likely that some of your personal information is being traded often in an open marketplace. You probably agreed to it in the small print that none of us ever read when using a new online service. It means that data about you has a market price.
Many ways exist to determine data valuation. Here is a brief summary of a few methods.
- Cost value method: Value is calculated by determining how much it costs to produce, store, and replace lost data. It’s a simple method and can be useful as a lightweight approach, but it is subjective and doesn’t necessarily account for the economic value that the data can produce.
- Market value approach: Value is calculated by researching how comparable data is being priced in the open market. It’s a great approach if market-based comparable data exists but doesn’t work for the vast number of data sets that are not traded.
- Economic value approach: Value is calculated by measuring the impact a data set has on the business’s bottom line. It’s a difficult approach because it may be nearly impossible to identify the specific value of the data relative to other contributors of value.
- With-and-without method: Value is calculated by quantifying the impact on cash flow if a data set needs to be replaced. Scenarios with and without the data are explored and the difference between cash flow is used to determine data value. As with the other methods, pinpointing the specific impact of a data set can be challenging.
- What’s clear from data valuation methods is that none of them are perfect. Above all, data valuation is very hard. But in practice don’t be discouraged. It’s hard for every type of organization. Fortunately, a number of providers are ready to help if you want outside assistance.
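Two of the methods above reduce to simple arithmetic once the inputs are estimated. The sketch below is a deliberately simplified illustration: the function names and all figures are hypothetical, and the with-and-without calculation ignores discounting of future cash flows, which a real valuation would likely include.

```python
def cost_value(production_cost, storage_cost, replacement_cost):
    """Cost value method: what it costs to produce, store, and replace the data."""
    return production_cost + storage_cost + replacement_cost


def with_and_without_value(cash_flows_with, cash_flows_without):
    """With-and-without method: difference in total projected cash flow.

    Each argument is a list of projected cash flows per period.
    Simplification: no discounting is applied.
    """
    return sum(cash_flows_with) - sum(cash_flows_without)


# Hypothetical estimates for one data set (illustrative figures only).
cost_based = cost_value(
    production_cost=50_000,
    storage_cost=12_000,
    replacement_cost=30_000,
)
scenario_based = with_and_without_value(
    cash_flows_with=[400_000, 420_000, 450_000],
    cash_flows_without=[380_000, 390_000, 400_000],
)

print(f"Cost value: {cost_based}")
print(f"With-and-without value: {scenario_based}")
```

The two methods give different answers for the same data set, which illustrates why the choice of method, and the quality of the estimates behind it, matters more than the arithmetic.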
Six-step process for data-driven decision-making (DDDM):
- Define the objectives: This step involves understanding the objectives relative to the effort and their alignment with organizational goals. This will help you scope the work and define the metrics. In fact, it can be useful to define success and then work backward. For example, if you’re trying to increase sales in a particular region, you need to identify which metrics to capture in order to determine whether you achieved that objective.
- Identify the data: In addition to using a data catalog, enterprise search, or similar, this step requires engaging with impacted stakeholders. Getting input from a diverse group of people and teams will help you scope the data. This may generate the need to gather data that doesn’t exist. You may need to consult with data stewards, data owners, and others with data governance responsibilities.
- Prepare the data: After Step 2, you'll understand the degree of preparation you need. If the problem you’re trying to solve is narrow and the data is easily accessible and high-quality, you’ll be in pretty good shape. In most cases, your situation is unlikely to be simple. The data necessary to meet your decision-making objectives will likely come in a variety of formats and will be in need of some remediation. You may need some deep data science skills to prepare the data for use in a data analytics platform.
- Analyze the data: Once you reach this point, the most exciting part begins. The assumption is that you’re using a useful analytics tool. For complex analysis and continuous efforts, several tools will be used. This could include support for an ETL architecture (Extract, Transform, and Load). This is when data is extracted from one system and made ready and available for use in another. To analyze this data, you will also need relevant representations such as visualizations. These could include graphs and charts. Your tool selection and how the data is presented will depend largely on the audience. For executives, a dashboard may be the right approach.
- Determine the findings: Once you have data that you can display in a variety of ways, you can ask questions about it. For example, if you’re trying to understand customer demographics relative to sales in a particular region, you may want to toggle between different age ranges. While all phases of DDDM have complexity, the tough work here is knowing which questions to ask. This skill is aided by training, but experience helps.
- Take action: That’s all there is to this step. Make your decisions. If you’ve completed Steps 1-5 well but take no action, you’ve wasted a lot of time. The exception is when the analysis itself shows that no action is needed, because concluding that is, in fact, a decision. DDDM is all about the decisions that result in actions.
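The prepare and analyze steps mention ETL (extract, transform, and load). The sketch below shows the pattern end to end using only the Python standard library, with an inline CSV string standing in for a source system and an in-memory SQLite database standing in for the analytics store. The sample data, table name, and cleaning rules are assumptions for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with inconsistent region names and a missing amount.
RAW_CSV = """region,amount
North,100
north ,250
South,75
South,
"""


def extract(text):
    """Extract: read rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: standardize region names and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # remediation rule: skip incomplete records
        cleaned.append((row["region"].strip().title(), float(amount)))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Analyze: total sales per region, ready to feed a chart or dashboard.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Without the transform step, "North" and "north " would be reported as separate regions, which is the kind of error that poor data preparation introduces into findings.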
Consider these reasons that organizations don’t take a proactive approach to managing data:
- “If it ain’t broke”, why fix it?
- It’s too expensive and time-consuming to focus on data.
- It’s far too complicated.
- Data management and governance: what is that?
Every one of these is valid. If your business is not open or able to recognize the potential upside of managing data as a high-value asset, it will be an uphill battle to convince them otherwise.
- The purpose of any type of strategy is to agree on a set of guiding principles that inform decision-making in support of a desired outcome. In simple terms, it’s the roadmap on how to reach your goals.
- A business strategy describes how a business will achieve its vision and mission. It creates clarity for leaders and team members on what to do, what to avoid, and how to prioritize actions. Having a clear strategy is important to avoid confusion and wasted effort. However, creating and communicating a strategy can be difficult and some businesses neglect this important step.
- Strategic plans can be created for more than just overall company goals. Every department and function within a company can benefit from having a strategic plan that supports those goals. This includes plans for managing important assets like data.
So, you’ve made the decision to create a data strategy. That’s great. Before jumping in, consider the following characteristics as a guide to your approach:
- Data maturity: This can be defined simply as the degree to which the organization already uses and optimizes data and has experience and skills, as well as the quality of the existing data. All organizations use data, but there’s a big difference between those that have prioritized it for a long period and those just deciding to treat it as a strategic asset. For example, without some basic data standards, security policies, and a process to cleanse data, layering analytics on top of it will likely cause frustration and in the worst case, errors in the results.
- Industry and size: You can think of data prioritization through two frameworks: defense and offense. Defense deals with fundamental areas such as data security and quality. Offense is using data for insight management and market-facing initiatives. Every organization does both, but most emphasize one over the other depending on the industry and its size. For example, a healthcare company may prioritize a defense framework for data given the highly regulated nature of the industry.
A data strategy should typically account for these five areas of data requirements:
- Identify: To find and make data usable, it must be clearly defined and described. This includes a file name, a file format, and metadata.
- Store: Design and develop the capabilities for supporting the place and process for hosting data and how it will be shared, accessed, and processed.
- Process: Raw data must be transformed to become valuable. This includes processes for data cleansing, standardization, and integration with other data sets.
- Govern: Institute processes to manage and communicate data policies for data use within the organization.
Data requirements should consider these four data strategy components:
- Alignment with the business: A data strategy is a subset of the overall business strategy. This means the data strategy must support and advance the larger goals of the organization. When determining the goals of the data strategy, where possible, map them as clearly as possible to illustrate how they are in support of the business strategy. For example, the business may want to reduce customer acquisition costs. A data strategy will be a valuable way to identify potential customers with a higher likelihood of conversion to buyers. Keep in mind that the strategy of a business evolves constantly, sometimes slowly and other times quickly. Your data strategy has to evolve in sync, as appropriate.
- Identifying roles and responsibilities: A strategy requires people to take specific actions. Without action, a strategy is a worthless document. In the data strategy, you’ll want to document the different roles that team members will play. Most will be data consumers. These are the employees who access and use data. They will certainly have responsibilities. For example, there will be an expectation about how different classifications of data should be handled. If something is public, then that’s entirely different than something that’s confidential. However, the bulk of the responsibility for ensuring that a data strategy can be delivered and maintained will rest with team members such as the information technology staff, data scientists and analysts, data stewards and owners, and management. It will be quickly apparent that data strategy and data governance have overlapping and dependent goals.
- Data architecture: This area relates to the processes, systems, and applications that support working with data. Basic areas include defining data storage needs and analysis tools. It also includes items such as a data catalog, a data warehouse where data can be stored and made ready for analysis, and the methods and tools for data pipelines, moving data from a data source to a destination, and related ETL (extract, transform, and load) functions. A data strategy should support the scalability of your data architecture as well as have some flexibility as needs change. Note that data architecture is often the driver of choosing and designing data management processes and systems.
- Data management: This area is the broad umbrella of activities that manage the full lifecycle of data in an organization. It recognizes that data is a strategic asset and must have the attendant processes, procedures, policies, skills, and tools to ensure it is treated in such a manner. This includes areas such as the management of data security and privacy, quality, metadata, integration, master data management, and analytics.
- Data governance establishes the rules for data use, and data management ensures that, in the act of realizing data value in the organization, these rules are followed. For example, a data governance policy may state that data with a certain confidentiality classification may only be accessible by a specific role level in the organization. Data management will be the processes, tools, and staff that ensure that this governance rule is followed.
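The split between a governance rule and its enforcement by data management can be sketched in a few lines. In this illustration the policy table plays the governance role and the `can_access` check plays the data management role. The classification levels, role names, and function name are all hypothetical.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Confidentiality levels, ordered from least to most restricted."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


# Governance: the policy states the highest classification each role may access.
MAX_CLASSIFICATION_BY_ROLE = {
    "contractor": Classification.PUBLIC,
    "employee": Classification.INTERNAL,
    "manager": Classification.CONFIDENTIAL,
}


def can_access(role, classification):
    """Data management: enforce the policy. Unknown roles are denied."""
    allowed = MAX_CLASSIFICATION_BY_ROLE.get(role)
    return allowed is not None and classification <= allowed


print(can_access("employee", Classification.INTERNAL))
print(can_access("employee", Classification.CONFIDENTIAL))
print(can_access("visitor", Classification.PUBLIC))
```

Keeping the policy as data, separate from the check that enforces it, means the rule can change through the governance process without rewriting the enforcement code.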
- Creating a data strategy is difficult, but it can put a company ahead of the curve in leveraging data for better performance. The key to success lies in having buy-in, a roadmap for execution, and strong data governance and management practices.