Analytics is the heart of modern business. The role of a data analyst is to transform raw data into actionable insights that guide decision-making processes within an organization.
- Sourcing data from various channels, including databases, spreadsheets, and external sources.
- Cleaning and organizing data to ensure it is accurate, consistent, and ready for analysis.
- Using statistical methods, machine learning techniques, or other analytic tools to interpret data.
- Identifying trends, patterns, and correlations that might not be immediately obvious.
- Creating visual representations of data, such as charts, graphs, and dashboards, to make complex information easily understandable.
- Articulating findings in a compelling narrative to communicate the significance of the data to stakeholders.
- Making recommendations based on data-driven insights to help guide business decisions.
- Providing context around the data, including potential implications and future trends.
- Working closely with other departments, such as marketing, finance, and operations, to understand their needs and provide insights.
- Effectively communicating complex data findings in a clear and concise manner to non-technical stakeholders.
- Keeping up-to-date with the latest industry trends, tools, and technologies in data analysis.
- Adapting to new types of data and analytical methods as the organization's needs evolve.
Three major pillars have come together at this moment to allow analytics programs to thrive:
- Data
- Storage
- Computing Power
The massive volume of data generated by our businesses on a daily basis, the availability of inexpensive storage to retain that data, and the cloud's promise of virtually infinite computing power come together to create fertile ground for data analytics.
Highest-demand occupations:
- Data analysts and scientists
- AI and machine learning (ML) specialists
- Big Data specialists
- Digital marketing and strategy specialists
- Process automation specialists
- Business development professionals
- Digital transformation specialists
- Information security analysts
- Software and applications developers
- Internet of Things (IoT) specialists
The process that analysts move through:
- Data Acquisition
- Cleaning and Manipulation
- Analysis
- Visualization
- Reporting and Communication
The analytics process is inherently iterative. It is not a linear, one-time thing you do to data. It's more like a loop or cycle where you constantly revisit previous steps and refine your approach based on what you learn.
Analysts use a variety of techniques to draw conclusions from the data at their disposal.
Major categories of analytics techniques:
- Descriptive Analytics: is a crucial first step in the data analysis journey. It provides a foundational understanding of your data before diving into more complex techniques like predictive or prescriptive analytics. It summarises data to gain insights into patterns, trends, and relationships, focusing on what happened in the past.
- Predictive Analytics: aims to forecast future outcomes based on historical data and statistical algorithms. It involves the use of techniques such as regression analysis, machine learning, and data mining to identify patterns and trends that can help predict future events or behaviors. It enables organizations to anticipate potential scenarios and make proactive decisions to optimize outcomes.
- Prescriptive Analytics: is a powerful tool that can transform data insights into concrete actions for achieving optimal outcomes.
The work of analytics is intellectually and computationally demanding.
- Artificial intelligence (AI): includes any type of technique where you are attempting to get a computer system to imitate human behavior.
- Machine Learning: is a subset of AI techniques. ML techniques attempt to apply statistics to data problems in an effort to discover new knowledge. ML techniques are AI techniques designed to learn.
- Deep Learning: is a further subdivision of machine learning that uses quite complex techniques, known as neural networks, to discover knowledge in a particular way. It is a highly specialized subfield of machine learning that is most commonly used for image, video, and sound analysis.
Data governance programs ensure that an organization has high-quality data and is able to effectively control that data.
Software helps analysts work through each of the phases of the analytics process. These tools automate much of the heavy lifting of data analysis, improving the analyst's ability to acquire, clean, manipulate, visualize, and analyze data. They also provide invaluable assistance in reporting and communicating results.
- Microsoft Excel
- Google Sheets
- A data element is an attribute about a person, place, or thing containing data within a range of values. Data elements also describe characteristics of activities, including orders, transaction
- Individual data types support structured, unstructured, and semi-structured data.
- Tabular data is data organized into a table, made up of columns and rows.
- A table represents information about a single topic.
- Spreadsheets, including Microsoft Excel, Google Sheets, and Apple Numbers, are practical tools for representing tabular data. A relational database management system (RDBMS), commonly called a database, extends the tabular model.
- Instead of having all data in a single table, a database organizes related data across multiple tables.
- Oracle, Microsoft SQL Server, MySQL, and PostgreSQL are examples of database software.
- Tabular data is the concept that underpins both spreadsheets and relational databases.
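To illustrate how a database spreads related data across multiple tables, here is a small sketch using Python's built-in `sqlite3` module. SQLite is used only because it ships with Python; the `customers` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# Each table covers a single topic; a join recombines them when needed.
rows = conn.execute("""
    SELECT c.name, o.order_id, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

for row in rows:
    print(row)
```

In a single spreadsheet, the customer name would be repeated on every order row; here it is stored once and looked up through `customer_id`.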
- Structured data is tabular in nature and organized into rows and columns.
- Structured data is what typically comes to mind when looking at a spreadsheet.
- With clearly defined column headings, spreadsheets are easy to work with and understand.
- In a spreadsheet, cells are where columns and rows intersect.
- The character data type limits data entry to only valid characters.
- Characters can include the letters you might see on your keyboard, as well as numbers.
- Alphanumeric is the most widely used data type for storing character-based data.
- Alphanumeric is appropriate when a data element consists of both numbers and letters.
- The alphanumeric data type is ideal for storing product stock-keeping units (SKUs).
- It is common in the retail clothing space to have a unique SKU for each item available for sale.
- Databases use character sets to map, or encode, data and store it digitally.
- The ASCII encoding standard is based on the U.S. English alphabet.
- ASCII accommodates uppercase and lowercase English letters, as well as numbers, mathematical operators, and symbols.
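The mapping from characters to stored numbers can be seen directly in Python. This is a short sketch; the sample strings are made up for illustration.

```python
# A hypothetical alphanumeric SKU.
text = "SKU-42"

# ord() returns the numeric code assigned to each character.
codes = [ord(ch) for ch in text]

# Encoding turns the string into the bytes that would be stored.
encoded = text.encode("ascii")

# Characters outside the ASCII set cannot be encoded with it.
try:
    "café".encode("ascii")
    accepted = True
except UnicodeEncodeError:
    accepted = False

print(codes, encoded, accepted)
```

The failure on the accented character shows why ASCII suits U.S. English text but not every language.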
- Data types define the values that can be placed in a column.
- Strong typing is when technology rigidly enforces data types.
- A database column defined as numeric only accepts numerical values.
- Weak typing loosely enforces data types.
- Spreadsheets use weak typing to help make it easier for people to accomplish their work.
- Spreadsheets default to an "automatic" data type and accommodate practically any value.
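The contrast between rigid and loose enforcement can be sketched with Python's built-in `sqlite3` module. One caveat: SQLite is itself flexible about column types by default, so this example assumes a `CHECK` constraint to emulate the rigid enforcement that many other database systems apply natively. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Loosely typed column: no declared type, so any value is stored as given,
# much like a spreadsheet cell.
conn.execute("CREATE TABLE loose (value)")

# Rigidly typed column: the CHECK constraint rejects non-integer values.
conn.execute(
    "CREATE TABLE strict_qty (qty INTEGER CHECK (typeof(qty) = 'integer'))"
)

conn.executemany("INSERT INTO loose VALUES (?)", [(5,), ("abc",), (2.5,)])
loose_types = [
    row[0]
    for row in conn.execute("SELECT typeof(value) FROM loose ORDER BY rowid")
]

conn.execute("INSERT INTO strict_qty VALUES (?)", (5,))
try:
    conn.execute("INSERT INTO strict_qty VALUES (?)", ("abc",))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(loose_types, rejected)
```

The loose column ends up holding three different kinds of value, while the strict column refuses the text entry outright.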
- Unstructured data is any type of data that does not fit neatly into the tabular model.
- Examples of unstructured data include digital images, audio recordings, video recordings, and open-ended survey responses.
- Binary data types store information in raw bytes, but they can represent structured or unstructured data.
- It supports any type of digital file you may have, from Microsoft Excel spreadsheets to digital photographs.
- File size limits are set by the storage system, not the data type itself.
- When choosing a binary data type, consider the expected file size, performance needs, and storage efficiency.
- Audio can come from a variety of sources.
- Audio is everywhere, from customer service calls to avalanche detection systems.
- This data is captured with microphones, digitized, and stored.
- To save space, it can be compressed.
- No matter the format, storing audio requires a data type designed for raw binary information.
- Image data can have a variety of sources.
- People take more than 1 trillion photographs every calendar year, fuelled by the ubiquity of camera-enabled smartphones and relatively low storage costs.
- Each digital picture is a piece of unstructured data.
- Storing images in a database requires a data type designed to handle raw binary, such as varbinary or BLOB.
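As a rough sketch of storing raw binary data, the following writes a stand-in for an image into a BLOB column using Python's built-in `sqlite3` module. The bytes are not a real photograph (only the 8-byte PNG signature followed by filler), and the table and file names are hypothetical.

```python
import sqlite3

# Stand-in for an image file's contents: the PNG signature plus filler bytes.
image_bytes = b"\x89PNG\r\n\x1a\n" + bytes(16)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (name TEXT, content BLOB)")

# The bytes are passed as a parameter and stored unchanged.
conn.execute("INSERT INTO images VALUES (?, ?)", ("photo.png", image_bytes))

stored = conn.execute(
    "SELECT content FROM images WHERE name = ?", ("photo.png",)
).fetchone()[0]

print(len(stored), stored == image_bytes)
```

The database does not interpret the contents; the same column could hold audio, video, or a spreadsheet file.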
- Video data is growing at a similar pace to image data.
- In the consumer space, people upload videos to YouTube, Instagram, and TikTok every day.
- Regardless of structure, data is either quantitative or qualitative.
- Quantitative data consists of numeric values. Data elements whose values come from counting or measuring are quantitative.
- Qualitative data typically consists of text values. Data elements whose values describe characteristics, traits, and attitudes are all qualitative.
- Numeric data comes in two forms: discrete and continuous.
- Discrete data represents measurements that can't be subdivided.
- It is useful when you have things you want to count.
- Instead of counting, when you measure things like height and weight, you are collecting continuous data.
- While whole numbers represent discrete data, continuous data typically needs a decimal point.
- Qualitative data is discrete, but quantitative data can be either discrete or continuous.
- Text data with a known, finite number of categories is categorical.
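The distinctions above can be turned into a rough classification heuristic. This is an illustrative sketch only: the function names, the labels, and the `max_categories` threshold are assumptions, and real data usually needs human judgment as well.

```python
def classify(values):
    """Return a rough label for a list of values (illustrative heuristic)."""
    def is_int(v):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, int) and not isinstance(v, bool)

    def is_number(v):
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if all(is_int(v) for v in values):
        return "quantitative, discrete"      # counted, whole numbers
    if all(is_number(v) for v in values):
        return "quantitative, continuous"    # measured, may need decimals
    return "qualitative"                     # descriptive text


def is_categorical(values, max_categories=5):
    """Text data with a small, finite set of distinct values."""
    all_text = all(isinstance(v, str) for v in values)
    return all_text and len(set(values)) <= max_categories


print(classify([3, 7, 12]))               # e.g. items sold
print(classify([172.5, 180.2, 165.0]))    # e.g. height in cm
print(classify(["happy", "neutral"]))     # e.g. survey attitude
print(is_categorical(["S", "M", "L", "M"]))
```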
- Dimensional modeling is an approach to arranging data to facilitate analysis.
- Dimensional modeling organizes data into fact tables and dimension tables.
- Fact tables store measurement data that is of interest to a business.
- A table holding appointment data would be called a fact table.
- Dimensions are tables that contain data about the facts.
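A small sketch of a dimensional model, again using Python's built-in `sqlite3` module. The appointment fact table follows the example in the notes; the provider dimension, the column names, and all values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive data about the facts.
    CREATE TABLE dim_provider (
        provider_id INTEGER PRIMARY KEY,
        name TEXT,
        specialty TEXT
    );
    -- Fact table: one row per appointment, holding the measurement.
    CREATE TABLE fact_appointment (
        appointment_id INTEGER PRIMARY KEY,
        provider_id INTEGER REFERENCES dim_provider(provider_id),
        duration_minutes INTEGER
    );
    INSERT INTO dim_provider VALUES
        (1, 'Dr. Lee', 'Cardiology'),
        (2, 'Dr. Kim', 'Dermatology');
    INSERT INTO fact_appointment VALUES
        (100, 1, 30), (101, 1, 45), (102, 2, 20);
""")

# Analysis: total the fact measurements, grouped by a dimension attribute.
minutes_by_specialty = conn.execute("""
    SELECT d.specialty, SUM(f.duration_minutes)
    FROM fact_appointment AS f
    JOIN dim_provider AS d ON d.provider_id = f.provider_id
    GROUP BY d.specialty
    ORDER BY d.specialty
""").fetchall()

print(minutes_by_specialty)
```

The fact table stays narrow and numeric, while the dimension supplies the labels used to slice the totals.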
- Tabular data is structured data, with values stored in a consistent, defined manner, organized into columns and rows.
- Data is consistent when all entries in a column contain the same type of value.
- Structured data also makes summarization easy.
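A brief sketch of why consistency matters for summarization. The `is_consistent` helper and the sample columns are hypothetical; the check simply tests whether every entry in a column shares one Python type.

```python
def is_consistent(column):
    """True when every entry in the column has the same Python type."""
    return len({type(value) for value in column}) <= 1


prices = [19.99, 5.50, 12.00]   # consistent: all floats
mixed = [19.99, "n/a", 12.00]   # inconsistent: a text value crept in

# A consistent column can be summarised directly.
total = sum(prices) if is_consistent(prices) else None

print(is_consistent(prices), is_consistent(mixed), total)
```

Calling `sum(mixed)` would raise a `TypeError`, which is the kind of problem that cleaning aims to catch before analysis.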
- Unstructured data is qualitative, describing the characteristics of an event or an object.
- Images, phrases, audio or video recordings, and descriptive text are all examples of unstructured data.
- Machine data is a common source of unstructured data.
- Machine data has various sources, including the Internet of Things, smartphones, tablets, personal computers, and servers.
- Semi-structured data is data that has structure but is not tabular.
- Email is a well-known example of semi-structured data.
- Every email message has structural components, including recipient, sender, subject, date, and time.
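The mix of structure and free text in an email can be seen by parsing one with Python's standard `email` package. The message below is invented for illustration.

```python
from email import message_from_string

# A made-up message: structured headers, a blank line, then a free-text body.
raw_message = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Quarterly report\n"
    "Date: Mon, 01 Jan 2024 09:00:00 +0000\n"
    "\n"
    "Hi Bob, the numbers look strong this quarter.\n"
)

message = message_from_string(raw_message)

# The structural components can be read by name...
headers = {name: message[name] for name in ("From", "To", "Subject", "Date")}

# ...while the body is free-form text with no fixed layout.
body = message.get_payload()

print(headers)
print(body)
```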
- Text files are one of the most commonly used data file formats.
- They consist of plain text and are limited in scope to alphanumeric data.
- One reason text files are so widely adopted is that they can be opened on any platform or operating system without needing a proprietary piece of software.
- You can easily open a text file.
- Text files are also referred to as flat files.
- A unique character known as a delimiter facilitates transmitting structured data via a text file.
- The delimiter is the character that separates individual fields; it can be any character.
- When a file is comma-delimited, it is known as a comma-separated values (CSV) file.
- When a file is tab-delimited, it is called a tab-separated values (TSV) file.
- Fixed-width files are more laborious to create since they require a few extra steps.
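The three flat-file layouts can be compared with a short sketch that uses Python's standard `csv` module. The product data and the field widths (4, 12, and 5 characters) are assumptions chosen for illustration.

```python
import csv
import io

# The same records as comma-delimited and tab-delimited text.
csv_text = "sku,name,price\nA100,Blue shirt,19.99\nB200,Red hat,9.50\n"
tsv_text = csv_text.replace(",", "\t")

csv_rows = list(csv.DictReader(io.StringIO(csv_text)))
tsv_rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# Fixed-width: no delimiter. Each field is padded to an agreed width,
# which is the extra work involved in creating such a file.
fixed_line = f"{'A100':<4}{'Blue shirt':<12}{'19.99':>5}"

# Reading it back means slicing at the agreed column positions.
fixed_row = {
    "sku": fixed_line[0:4].strip(),
    "name": fixed_line[4:16].strip(),
    "price": fixed_line[16:21].strip(),
}

print(csv_rows[0])
print(repr(fixed_line))
print(fixed_row)
```

Only the delimiter changes between the CSV and TSV versions, so both parse to identical rows.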
- JSON is an open standard file format, designed to add structure to a text file without incurring significant overhead.
- JSON is easily readable by people and easily parsed by modern programming languages.
- Languages such as Python, R, and Go have libraries containing functions that facilitate reading and writing JSON files.
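In Python, for instance, the standard `json` module handles both directions. The record below is a made-up example.

```python
import json

# A hypothetical product record with nested structure.
record = {"sku": "A100", "name": "Blue shirt", "price": 19.99, "sizes": ["S", "M"]}

# Writing: convert the Python object to JSON text.
json_text = json.dumps(record, indent=2)

# Reading: parse the JSON text back into Python objects.
restored = json.loads(json_text)

print(json_text)
print(restored == record)
```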
- XML is a markup language that facilitates structuring data in a text file.
- While conceptually similar to JSON, XML incurs more overhead because it makes extensive use of tags.
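A rough way to see the tag overhead is to express one small record in both formats and compare lengths. This sketch uses Python's standard `xml.etree.ElementTree` and `json` modules; the record is hypothetical, and the size difference will vary with the data.

```python
import json
import xml.etree.ElementTree as ET

# A made-up product record as XML: every value is wrapped in a pair of tags.
xml_text = (
    "<product><sku>A100</sku><name>Blue shirt</name>"
    "<price>19.99</price></product>"
)

# Parse the XML and collect each child element's tag and text.
root = ET.fromstring(xml_text)
product = {child.tag: child.text for child in root}

# The same record as compact JSON.
json_text = json.dumps(product, separators=(",", ":"))

print(product)
print(len(xml_text), len(json_text))
```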
- HTML is a markup language for documents designed to be displayed in a web browser.
- HTML pages serve as the foundation for how people interact with the World Wide Web.
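Because HTML is markup, the text it wraps can be pulled out programmatically when a web page is the data source. The following is a minimal sketch using Python's standard `html.parser` module; the `TextCollector` class and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the visible text found between HTML tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for each run of text between tags.
        text = data.strip()
        if text:
            self.chunks.append(text)


page = "<html><body><h1>Sales</h1><p>Revenue grew this quarter.</p></body></html>"

collector = TextCollector()
collector.feed(page)

print(collector.chunks)
```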