has a number of potential business outcomes. As a fundamental element of predictive analysis, its application is the basis for implementing and evolving Analytics processes, which provide valuable input for decision-making.
However, the improvements Machine Learning delivers are not always as striking as the analysts deploying it expected, nor as practical and applicable as the client company wanted.
Companies sometimes fail to achieve the business improvements proposed in the project. When this happens, the causes are usually related to the difficulty of designing, executing, and measuring the actions needed to turn the results of the machine learning models into those improvements.
In this article, we will go through the process of implementing and applying Machine Learning in companies that aim to improve processes and strategies.
We will discuss the challenges and how to deal with them, the procedures involved, and what it takes to bring your business closer to truly effective results with advanced analytics.
What are the challenges to obtaining results with Machine Learning?
In problematic Analytics projects, the results of applying Machine Learning models are often confusing. They do not clearly reflect the real, manageable problems of the business, which causes considerable frustration given the effort and investment involved.
The causes of this situation fall into three broad categories:
- Expectations of the people involved
- Techniques used
- Project approach
Overview of an advanced Analytics project
A good way to summarize where we want to go with an advanced Analytics project “running” in the organization is to consider that we have:
- Data from different sources
- Machine Learning in action
- Deliverables
Making this happen requires several stages of planning and execution, specialized knowledge, and close attention to the obstacles and challenges that must be anticipated and controlled.
The technical side is a fundamental pillar of successful Machine Learning projects. Below are its main elements when implementing advanced analytics in a business:
- Building a quality dataset
The dataset is the “raw material” from which information of value to the business is extracted. It is therefore essential to build a dataset with as many “raw” columns from the source systems as possible. The dataset needs to receive and absorb data from different sources, including DWH, ODS, CRM, and the more recently adopted Data Lakes.
- DWH, or Data Warehouse: a unified repository for all the data collected by a company’s various information systems, used for analysis and reporting.
- ODS, or Operational Data Store: a unified repository that generally stores only operational data, also used for analysis and reporting.
- CRM, or Customer Relationship Management: a solution for tracking customer relationships, normally oriented to managing three basic areas: commercial management, marketing, and after-sales or customer service.
- Data Lake: a storage repository that holds a large amount of raw structured and unstructured data, regardless of its source or format. It is often built on the Hadoop ecosystem.
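A minimal sketch of consolidating raw columns from several sources into one dataset, assuming each source has already been extracted as a list of records sharing a customer identifier. The source names (`dwh`, `crm`), the key `customer_id`, the columns, and the helper `build_dataset` are hypothetical; in practice, extraction and key matching across systems are usually the hard part.

```python
from collections import defaultdict

def build_dataset(sources: dict, key: str = "customer_id") -> list:
    """Merge raw columns from several source systems into one row per entity.

    Columns are prefixed with the source name so that fields with the same name
    in different systems do not overwrite each other. Assumes at most one record
    per entity in each source (a later record would overwrite an earlier one).
    """
    rows = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            entity = record[key]
            rows[entity][key] = entity
            for column, value in record.items():
                if column != key:
                    rows[entity][f"{source_name}_{column}"] = value
    # Give every row the same columns; values missing in a source become None.
    columns = sorted({column for row in rows.values() for column in row})
    return [{column: row.get(column) for column in columns} for row in rows.values()]

# Hypothetical extracts from two source systems.
sources = {
    "dwh": [{"customer_id": 1, "total_spent": 1200.0},
            {"customer_id": 2, "total_spent": 300.0}],
    "crm": [{"customer_id": 1, "complaints": 2}],
}
dataset = build_dataset(sources)
for row in dataset:
    print(row)
```

Prefixing each column with its source keeps the raw variables traceable to their system of origin, which helps later when derived variables are built from them.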
- Choice of Machine Learning technique
This is the time to choose and apply a Machine Learning technique that:
- is genuinely suited to the problem you want to model;
- works with the available data;
- gets the most out of that data.
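As a rough illustration of matching the technique to the available data, the sketch below picks a family of techniques from the target variable alone. The function `suggest_technique` and the cut-off of 10 distinct numeric values are hypothetical choices, not a rule from this article; real projects weigh many more factors.

```python
from typing import Optional, Sequence

# Hypothetical threshold: numeric targets with more distinct values than this
# are treated as continuous (regression) rather than as classes.
MAX_CLASSES = 10

def suggest_technique(target_values: Optional[Sequence]) -> str:
    """Suggest a family of ML techniques based on the available target column."""
    if not target_values:
        # No labelled outcome in the data: only unsupervised techniques apply.
        return "unsupervised (e.g. clustering)"
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in target_values
    )
    if numeric and len(set(target_values)) > MAX_CLASSES:
        return "supervised regression"
    return "supervised classification"

print(suggest_technique(None))                           # no target available
print(suggest_technique(["churned", "active", "active"]))
print(suggest_technique([float(v) for v in range(50)]))  # e.g. monthly spend
```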
- Choice of analysis algorithms
Once the Machine Learning technique is selected, it is time to choose the algorithm or set of algorithms that gives the best predictive results. In other words, this determines how the data will actually be processed to produce accurate and genuinely useful conclusions for the business.
The importance of carefully evaluating these algorithms cannot be overstated: after all, the chosen model will become a source of insight for the organization.
The confusion matrix is a good tool for making this choice. It tabulates a classifier’s correct and incorrect predictions (true and false positives and negatives), and each cell can then be weighted by its cost or benefit in the process where the model is used, such as a business decision process. In this way, the real impact of a Machine Learning model’s performance is measured in its application context.
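To make the weighting concrete, the sketch below builds a binary confusion matrix for two hypothetical churn models and weights its cells with a cost table. The cost values (500 for a missed churner, 50 for an unnecessary retention offer) and the predictions are invented for illustration; each business would set its own.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Count (actual, predicted) pairs for a binary classifier with labels 0/1."""
    counts = Counter(zip(y_true, y_pred))
    return {"TP": counts[(1, 1)], "FN": counts[(1, 0)],
            "FP": counts[(0, 1)], "TN": counts[(0, 0)]}

def accuracy(matrix):
    """Share of correct predictions, ignoring their business impact."""
    return (matrix["TP"] + matrix["TN"]) / sum(matrix.values())

def business_cost(matrix, costs):
    """Weight each cell of the confusion matrix by its (hypothetical) unit cost."""
    return sum(count * costs.get(cell, 0.0) for cell, count in matrix.items())

# Hypothetical churn example: 1 = customer churned, 0 = customer stayed.
y_true  = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # misses one churner
model_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # two false alarms
costs = {"FN": 500.0, "FP": 50.0}          # assumed cost per error type

for name, y_pred in [("model_a", model_a), ("model_b", model_b)]:
    m = confusion_matrix(y_true, y_pred)
    print(name, m, f"accuracy={accuracy(m):.1f}", f"cost={business_cost(m, costs):.0f}")
```

Here `model_a` has the higher accuracy (0.9 versus 0.8) but a business cost five times higher (500 versus 100), because it misses a churner; weighting the matrix by context reverses the ranking.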
Enriching the Dataset
As mentioned earlier, the dataset is the raw material for Analytics. For the best performance, it cannot be limited to the raw data, that is, the “raw” variables.
Our experience supporting many companies in their advanced Analytics processes makes clear how important it is to invest in building the dataset and enriching it with derived variables and business hypotheses.
This makes the dataset a more robust “starting point” on the path to more accurate and more applicable analyses.
Thus, the enriched dataset consists of:
- RAW VARIABLES: the data as it is, coming from its different sources, only organized and loaded into the system.
- DERIVED VARIABLES: built from the raw variables, they incorporate business knowledge and increase the predictive capacity of the Machine Learning algorithm. Using them can help avoid obvious results in predictive modeling, which is considered a data scientist’s nightmare.
- BUSINESS HYPOTHESES: business situations that are common to the problems being modeled and that have already been observed at other customers and in other industries, especially in the company’s own segment. They bring breadth and applicability to predictive analysis.
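A sketch of how derived variables and a business hypothesis might be added to one customer record. The raw field names, the derived variables, the helper `enrich`, and the rule “two or more complaints and spending down by more than 20%” are hypothetical examples of the kind of business knowledge described above, not a prescribed set.

```python
from datetime import date

def enrich(row: dict, reference_date: date) -> dict:
    """Return a copy of a raw record with derived variables and a hypothesis flag."""
    enriched = dict(row)  # keep all raw variables as they are

    # Derived variables (hypothetical examples)
    purchases = row["num_purchases"]
    enriched["avg_ticket"] = row["total_spent"] / purchases if purchases else 0.0
    enriched["days_since_last_purchase"] = (reference_date - row["last_purchase"]).days
    previous = row["spend_prev_3m"]
    trend = row["spend_last_3m"] / previous if previous else None
    enriched["spend_trend"] = trend  # below 1.0 means spending is falling

    # Business hypothesis as a flag (hypothetical rule and thresholds):
    # customers who complain repeatedly and cut their spending are at risk.
    enriched["hyp_complaining_and_declining"] = (
        row["complaints"] >= 2 and trend is not None and trend < 0.8
    )
    return enriched

customer = {
    "customer_id": 1, "total_spent": 1200.0, "num_purchases": 12,
    "last_purchase": date(2024, 3, 1),
    "spend_last_3m": 150.0, "spend_prev_3m": 300.0, "complaints": 3,
}
print(enrich(customer, reference_date=date(2024, 3, 31)))
```

Flags like this one can also serve to segment customers by common business situations.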
Enriched dataset and new predictive model components
With the enriched dataset, the results of the tests and business hypotheses make it possible to add new components to the predictive models. These components are features that make it possible to:
- develop the artificial intelligence approach;
- segment customers by common business situations;
- combine techniques, addressing the same problem simultaneously with the same dataset through different Machine Learning techniques (supervised and reinforcement learning), thereby increasing the performance of the predictive model;
- in supervised techniques, identify the “forces” that:
  - keep customers or the studied entity in a stable condition;
  - move customers or entities out of balance.
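One common way to read such “forces” from a supervised model is the sign of the coefficients in a logistic regression: positive coefficients push toward the outcome (here, churn), negative ones toward stability. The sketch below fits a plain logistic regression by batch gradient descent on a tiny invented dataset; the feature names, data, learning rate, and epoch count are assumptions for illustration. With real data, features should be standardized before coefficient sizes are compared, and a well-tested library implementation is preferable.

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression with plain batch gradient descent (no regularization)."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            z = bias + sum(w * x for w, x in zip(weights, xi))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of churn
            error = p - yi                    # gradient of log-loss with respect to z
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical data: [complaints, tenure in years]; 1 = churned, 0 = stayed.
features = ["complaints", "tenure_years"]
X = [[0, 5], [0, 4], [1, 6], [0, 3], [0, 1],
     [3, 1], [2, 1], [3, 2], [1, 0.5]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = fit_logistic(X, y)
for name, coef in zip(features, weights):
    force = "moves customers out of balance" if coef > 0 else "keeps customers stable"
    print(f"{name}: {coef:+.2f} -> {force}")
```

Reading signs this way assumes the features are not strongly correlated with each other; with correlated features, individual coefficients can be misleading.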
Therefore, for advanced analytics processes to be truly effective, they need to rely on these components, which deliver proven improvements and results closely aligned with the actions companies must take.