Exploring Your Data Using Infoworks AI
Introduction
Data exploration is an essential first step in any analysis because it uncovers underlying patterns, surfaces data quality issues, and helps ensure the accuracy of your results. This critical step lays the foundation for making informed decisions and deriving meaningful insights from your data.
Infoworks AI is a powerful tool that simplifies the process of data exploration and analysis, enabling data analysts to gain insights quickly and efficiently with the help of Business-Aware AI™. In this blog post, we will explore how to use Infoworks AI for three critical data exploration tasks: profiling data, detecting duplicates, and identifying anomalies.
Profiling Data
Data profiling is the process of examining the data available from a data source and collecting statistics and informative summaries about that data. Infoworks AI makes this task straightforward with automated profiling and a natural language interface for custom profiling inquiries.
Infoworks AI has a powerful feature built in to profile your data automatically. When adding a new warehouse or editing an existing one, there is a checkbox to enable “Automatically Profile the data”. When you select this option and click “Fetch Schema”, Infoworks AI runs profiling queries against the schema tables.
This process gives you always-available profiling information that you can reference within a project’s Workspace. Click the three dots next to a table’s name, then click Show Profile.
This pulls up the profiling results, which include histograms and metrics such as unique values, min and max, standard deviation, and more.
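To make those metrics concrete, here is a minimal Python sketch of the kind of per-column summary a profile contains. It does not show how Infoworks AI computes its profiles; the function name `profile_column` and the sample values are hypothetical and used only for illustration.

```python
import statistics

def profile_column(values):
    """Summarize one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "unique": len(set(present)),
        "min": min(present),
        "max": max(present),
        "mean": statistics.fmean(present),
        "stdev": statistics.stdev(present),  # sample standard deviation
    }

# Hypothetical order totals, not real data.
total_amount = [20.0, 35.5, 20.0, None, 120.0, 42.25]
profile = profile_column(total_amount)

for metric, value in profile.items():
    print(f"{metric}: {value}")
```

A summary like this is often enough to spot a column with unexpected nulls or a suspiciously wide range before any deeper analysis.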
If your use case requires additional profiling statistics such as mode or skewness, the natural language interface allows you to prompt Infoworks AI for custom profiling data. For example, I can ask the AI to provide the skewness, kurtosis (a measure of the “tailedness” of the data distribution), and mode of the total amount and shipping cost in my orders table.
Profiling your data helps in understanding its structure and quality, making it easier to identify potential issues early in the analysis process.
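If you want to sanity-check custom statistics such as skewness, kurtosis, and mode yourself, they can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the query Infoworks AI generates: it uses the population (not sample-corrected) definitions, reports excess kurtosis (so a normal distribution scores 0), and the `shipping_cost` values are hypothetical.

```python
import statistics

def skewness(values):
    """Population skewness: the mean of the cubed standardized deviations.
    Assumes the column has at least two distinct values."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def excess_kurtosis(values):
    """Population excess kurtosis: fourth standardized moment minus 3."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 4 for v in values) / len(values) - 3.0

# Hypothetical shipping costs with one expensive shipment.
shipping_cost = [5.0, 5.0, 5.0, 7.5, 10.0, 40.0]

skew = skewness(shipping_cost)          # positive: long right tail
kurt = excess_kurtosis(shipping_cost)
modes = statistics.multimode(shipping_cost)  # all most-frequent values
```

A positive skewness here reflects the single large shipping cost pulling the right tail out, which is the kind of signal that prompts a closer look at outliers.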
Detecting Duplicates
Duplicate records can skew analysis results and lead to incorrect conclusions. Infoworks AI offers several methods to detect and manage duplicates in your data.
To identify duplicates, you can use natural language prompts. For example, you might use a prompt as simple as “How many duplicates do I have in my orders table?”, or a more detailed prompt with additional guidance for improved accuracy, such as: “Show me all order data while making sure there are no duplicate combinations of order id and shipping date.”
Infoworks AI can count duplicates based on various criteria, such as the primary key or all columns in a row, providing a clear picture of the extent of duplication in your dataset.
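The difference between those two criteria is easy to see in a small Python sketch. The function `count_duplicates`, the column names, and the sample rows are hypothetical, and this is only an illustration of the idea, not what Infoworks AI runs against your warehouse.

```python
from collections import Counter

def count_duplicates(rows, key_columns=None):
    """Count rows that repeat an earlier row on key_columns.
    With key_columns=None, every column must match."""
    def key_of(row):
        cols = key_columns if key_columns is not None else sorted(row)
        return tuple(row[c] for c in cols)

    counts = Counter(key_of(r) for r in rows)
    # Each group of n identical keys contributes n - 1 duplicates.
    return sum(n - 1 for n in counts.values() if n > 1)

# Hypothetical orders table.
orders = [
    {"order_id": 1, "shipping_date": "2024-01-05", "total_amount": 20.0},
    {"order_id": 1, "shipping_date": "2024-01-05", "total_amount": 20.0},  # exact copy
    {"order_id": 2, "shipping_date": "2024-01-06", "total_amount": 35.5},
    {"order_id": 2, "shipping_date": "2024-01-06", "total_amount": 36.0},  # same key, new amount
    {"order_id": 3, "shipping_date": "2024-01-07", "total_amount": 12.0},
]

by_key = count_duplicates(orders, ["order_id", "shipping_date"])
by_row = count_duplicates(orders)
```

The two counts differ because the second order 2 row shares a key with the first but has a different amount, which is why stating your duplicate criteria in the prompt matters.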
Once duplicates are identified, you can decide whether to flag them or create a de-duplicated view for analysis. For instance, you can say, “Using the previous criteria, show order data with duplicates marked with an indicator,” or “Create a view to show de-duplicated order data based on unique rows of data.” These capabilities ensure that your data analysis is based on accurate and unique records.
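Flagging and de-duplicating can be sketched the same way. In this hedged Python example, the first occurrence of each key is treated as the original and later occurrences are marked; that "keep first" rule, the `is_duplicate` column name, and the sample rows are assumptions made for illustration.

```python
def flag_duplicates(rows, key_columns):
    """Return copies of rows with an is_duplicate indicator.
    The first row seen for each key is marked False."""
    seen = set()
    flagged = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        flagged.append({**row, "is_duplicate": key in seen})
        seen.add(key)
    return flagged

def deduplicate(rows, key_columns):
    """Keep only the first row for each key, preserving order."""
    return [
        {k: v for k, v in row.items() if k != "is_duplicate"}
        for row in flag_duplicates(rows, key_columns)
        if not row["is_duplicate"]
    ]

# Hypothetical orders table.
orders = [
    {"order_id": 1, "shipping_date": "2024-01-05"},
    {"order_id": 1, "shipping_date": "2024-01-05"},
    {"order_id": 2, "shipping_date": "2024-01-06"},
]

key = ["order_id", "shipping_date"]
flagged = flag_duplicates(orders, key)
deduped = deduplicate(orders, key)
```

Flagging keeps every row available for review, while the de-duplicated list is the safer input for aggregates such as totals and averages.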
Identifying Anomalies
Anomaly detection is crucial for identifying unusual patterns that might indicate errors, fraud, or other significant issues. Infoworks AI supports several methods for detecting anomalies, including standard deviations, percentiles, interquartile range (IQR), and median absolute deviation (MAD).
For example, to identify outliers using the z-score method, which measures how many standard deviations a value lies from the mean, you can prompt, “Can you identify outliers in my order amounts using a z-score?” Similarly, using percentiles, you might ask, “Can you identify outliers in my order amounts that fall outside of the 1st and 99th percentiles?”
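Both checks are simple enough to sketch in Python. The following is an illustration, not the logic Infoworks AI generates: the z-score threshold of 3, the use of the population standard deviation, the "inclusive" quantile method, and the `order_amounts` values are all assumptions.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Values more than `threshold` standard deviations from the mean.
    Assumes the values are not all identical."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]

def percentile_outliers(values, lower_pct=1, upper_pct=99):
    """Values below the lower or above the upper percentile cutoff."""
    # n=100 yields 99 cut points; cuts[i - 1] is the i-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [v for v in values if v < low or v > high]

# Hypothetical order amounts: 19 typical orders and one extreme value.
order_amounts = [48, 52, 50, 49, 51, 50, 47, 53, 50, 50,
                 49, 51, 50, 52, 48, 50, 51, 49, 50, 500]

z_flagged = zscore_outliers(order_amounts)
pct_flagged = percentile_outliers(order_amounts)
```

Keep in mind that percentile cutoffs flag a fixed share of each tail by construction, so they tend to flag the smallest and largest values even in clean data, whereas a z-score flags nothing when no value sits far from the mean.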
These methods allow you to spot data points that deviate significantly from the norm, enabling more focused investigations and data quality improvements.
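The IQR and MAD methods mentioned above are more robust than a z-score because a single extreme value barely moves the quartiles or the median. Here is a minimal Python sketch, assuming the conventional 1.5 × IQR fences and the commonly used modified z-score (0.6745 × deviation / MAD, with a cutoff of 3.5); the function names and sample data are hypothetical, and Infoworks AI may apply different thresholds.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def mad_outliers(values, threshold=3.5):
    """Values whose modified z-score exceeds `threshold`."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # At least half the values equal the median, so the modified
        # z-score is undefined; this sketch reports nothing in that case.
        return []
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]

# Hypothetical order amounts: 19 typical orders and one extreme value.
order_amounts = [48, 52, 50, 49, 51, 50, 47, 53, 50, 50,
                 49, 51, 50, 52, 48, 50, 51, 49, 50, 500]

iqr_flagged = iqr_outliers(order_amounts)
mad_flagged = mad_outliers(order_amounts)
```

On this sample both methods flag only the extreme order, which makes them a reasonable choice when you suspect the outliers themselves are distorting the mean and standard deviation.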
Conclusion
Infoworks AI streamlines the process of data exploration with built-in features and through its Business-Aware AI™, making it accessible even for users with limited technical expertise. By leveraging Infoworks AI’s data exploration abilities for profiling data, detecting duplicates, and identifying anomalies, data analysts can ensure high data quality and derive meaningful insights more efficiently.
Whether you are starting a new data project or maintaining an existing dataset, Infoworks AI provides the tools necessary to explore, cleanse, and analyze your data effectively. Stay tuned for more detailed guides on using Infoworks AI to address specific data challenges.