What is Data Mining?
Data mining is the process of exploring and analyzing data sets to discover meaningful patterns. The most widely used model for data mining, the cross-industry standard process for data mining (CRISP-DM), breaks data mining down into six major phases: business understanding, data understanding, data preparation, modeling, evaluation and deployment (referred to here as data presentation). This methodology represents an idealized sequence of events, and in practice the steps often serve as guidelines for an iterative cycle rather than a rigidly linear process.
1. Business Understanding
First, users figure out what the current situation is and what they want to accomplish through data mining from a business perspective. They define the problem, identify goals and set up a plan to proceed.
2. Data Understanding
Users should determine what data is necessary, gather it from all available sources, examine and explore it, and then validate its quality for accuracy and completeness.
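As a rough illustration of validating data for completeness, the sketch below measures how many values are present in each field and counts exact duplicate rows. The records, field names and helper functions are hypothetical and exist only for this example; a real project would run similar checks against its own sources.

```python
# Hypothetical records gathered from several sources; None marks a missing value.
records = [
    {"customer_id": 1, "age": 34, "region": "West"},
    {"customer_id": 2, "age": None, "region": "East"},
    {"customer_id": 3, "age": 51, "region": None},
    {"customer_id": 3, "age": 51, "region": None},  # exact duplicate of the row above
]


def completeness(rows):
    """Return the fraction of non-missing values for each field."""
    fields = rows[0].keys()
    return {
        field: sum(row[field] is not None for row in rows) / len(rows)
        for field in fields
    }


def count_duplicates(rows):
    """Count rows that repeat an earlier row exactly."""
    seen = set()
    duplicates = 0
    for row in rows:
        # Sort by field name only, so missing values are never compared.
        key = tuple(sorted(row.items(), key=lambda item: item[0]))
        if key in seen:
            duplicates += 1
        seen.add(key)
    return duplicates


report = completeness(records)
duplicate_count = count_duplicates(records)
print(report)
print(duplicate_count)
```

A low completeness score or a high duplicate count is a signal to revisit data collection before moving on to preparation.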
3. Data Preparation
Data preparation is a critical step in the data mining process. Users select, cleanse, construct, format and merge data to prepare it for analysis. While time-consuming, data preparation helps ensure the most accurate results possible by cleaning data, purging unusable data and turning raw data into something a BI solution can actually work with.
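The cleansing, purging, formatting and merging described above might look something like the following minimal sketch. The two input tables, their field names and the helper functions are assumptions made for illustration, not part of any particular BI tool.

```python
# Hypothetical raw inputs: a customer table and an orders table.
customers = [
    {"customer_id": 1, "region": " west "},
    {"customer_id": 2, "region": "EAST"},
    {"customer_id": 3, "region": None},  # unusable: region is missing
]
orders = [
    {"customer_id": 1, "amount": "19.99"},
    {"customer_id": 1, "amount": "5.01"},
    {"customer_id": 2, "amount": "n/a"},  # unusable: not a number
    {"customer_id": 2, "amount": "40"},
]


def clean_customers(rows):
    """Drop rows with no region and normalize region formatting."""
    return {
        row["customer_id"]: row["region"].strip().title()
        for row in rows
        if row["region"] is not None
    }


def parse_amount(text):
    """Convert an amount string to a float, or None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def prepare(customer_rows, order_rows):
    """Merge cleaned orders with cleaned customers into one total per customer."""
    regions = clean_customers(customer_rows)
    totals = {}
    for order in order_rows:
        amount = parse_amount(order["amount"])
        cid = order["customer_id"]
        if amount is None or cid not in regions:
            continue  # purge records that cannot be used
        totals[cid] = totals.get(cid, 0.0) + amount
    return [
        {"customer_id": cid, "region": regions[cid], "total": round(total, 2)}
        for cid, total in sorted(totals.items())
    ]


prepared = prepare(customers, orders)
print(prepared)
```

Each helper handles one concern (formatting, parsing, merging), which makes it easier to audit what was dropped and why.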
4. Modeling
Modeling is the core of any machine learning project. Users decide which modeling technique to use to test scenarios that address the project’s goals, then generate models through algorithms. This step consists of analyzing the data and generating tables, visualizations, plots and graphs that reveal trends and patterns.
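To make "generating models through algorithms" concrete, here is a small sketch of one possible technique, a k-nearest-neighbors classifier. The choice of technique, the training data and the "churn"/"stay" labels are all assumptions for illustration; CRISP-DM itself does not prescribe any particular algorithm.

```python
import math
from collections import Counter

# Hypothetical training data: (monthly_visits, avg_order_value) -> label.
training = [
    ((2, 10.0), "churn"),
    ((3, 12.0), "churn"),
    ((1, 8.0), "churn"),
    ((12, 55.0), "stay"),
    ((15, 60.0), "stay"),
    ((11, 48.0), "stay"),
]


def knn_predict(train, point, k=3):
    """Predict the majority label among the k closest training points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], point))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


print(knn_predict(training, (2, 11.0)))
print(knn_predict(training, (13, 50.0)))
```

This sketch uses raw feature values; in practice, features on different scales would usually be normalized during data preparation so that no single feature dominates the distance.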
5. Evaluation
Users evaluate the results of the models in light of their originally defined business goals. They make sure that the model produced is accurate and complete, and highlight which insights from the results are most valuable. Depending on what insights data mining uncovers, they may identify new objectives and additional questions to answer.
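One common way to check whether a model is accurate is to compare its predictions against known outcomes. The sketch below computes accuracy, precision and recall for a hypothetical set of labels; the data, the "churn" positive class and the function name are assumptions for illustration, and which metric matters most depends on the business goal.

```python
def evaluate(actual, predicted, positive="churn"):
    """Compute accuracy, precision and recall for one positive class."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(a == positive and p == positive for a, p in pairs)
    false_pos = sum(a != positive and p == positive for a, p in pairs)
    false_neg = sum(a == positive and p != positive for a, p in pairs)
    correct = sum(a == p for a, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0,
        "recall": true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0,
    }


# Hypothetical held-out outcomes and the model's predictions for them.
actual = ["churn", "stay", "churn", "stay", "stay", "churn", "stay", "stay"]
predicted = ["churn", "stay", "stay", "stay", "churn", "churn", "stay", "stay"]

metrics = evaluate(actual, predicted)
print(metrics)
```

If missing a churning customer is costlier than a false alarm, for example, recall may deserve more weight than overall accuracy.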
6. Data Presentation
The final step in the data mining process is turning all of this work into something useful to others, especially stakeholders. Users take the results and determine a deployment strategy that ensures their analysis is understandable. This could be as simple as creating a conclusive report, or as complex as documenting a reproducible, maintainable data mining process from start to finish. It may also include delivering a presentation to the customer or decision-maker. Data presentation, or deployment as it’s sometimes called, summarizes the findings of the project and reviews the results to see if any improvements or next steps are necessary.
CRISP-DM helps guide data scientists and data analysts through data mining with steps that follow common sense and help them gain a deeper understanding of their data and the problem they’re seeking to address.
Data mining software tools perform two main categories of tasks: descriptive and predictive data mining. Descriptive data mining, as the name suggests, describes past or current patterns and identifies meaningful information about available data. Predictive data mining instead generates models that attempt to forecast potential results. Descriptive data mining is reactive and focuses on accurately characterizing existing data, while predictive data mining is proactive and, because it forecasts outcomes that have not yet occurred, carries more uncertainty. Descriptive data mining tasks include association, clustering and summarization, while predictive data mining tasks include classification, prediction and time-series analysis. Both kinds of tasks are important for inferring what has happened, what is currently happening and what may happen in the future.
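The difference between the two categories can be sketched on a single small series. Below, a summarization function stands in for a descriptive task and a least-squares trend line stands in for a predictive one. The sales figures and function names are hypothetical, and a straight-line trend is only one of many possible forecasting approaches.

```python
from statistics import mean

# Hypothetical monthly sales figures, oldest first.
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]


def summarize(values):
    """Descriptive task: summarize what has already happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def forecast_next(values):
    """Predictive task: fit a least-squares line and extend it one period."""
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * len(values)


summary = summarize(monthly_sales)
next_month = forecast_next(monthly_sales)
print(summary)
print(next_month)
```

The summary is exact for the data on hand, while the forecast is only as good as the assumption that the trend continues.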