Machine Learning

AI Ambiguity: Flipping the Switch

April 12th, 2022 by Taymour

The AI topic generates a lot of buzz, but as with any major debate, some claims must be taken with a grain of salt. The fact that AI means different things to different people almost guarantees that some claims are a stretch. Definition aside, AI’s underpinnings are complex and constantly evolving, which means the discipline requires constant interpretational flexibility. Some practitioners, in their own self-interest, will take advantage of this ambiguity by initiating AI projects that might never be realized. Therefore, upright practitioners must call out blatantly false or misleading claims. This paper’s focus is to make the reader aware of an exploitation in a branch of AI called machine learning. Without attempting to define this subdiscipline too strictly, we’ll simply say that machine learning is a way of solving problems with data, using computers.

Actionable Levers

Machine learning does not physically solve problems; rather, it proposes a means to an end. A machine learning project’s outcome is a description or prediction that leads to a recommendation for a person (or software, a robot, etc.) to perform a certain action. In other words, machine learning itself does not save lives, reduce costs, or generate more revenue. Rather, the decisions based on its recommendations are what make the difference (e.g., the recommendation to prescribe a particular medication cocktail could save lives, or the decision to hire more or fewer people could affect revenue). Machine learning’s outcome, in short, is information, which must have a corresponding lever or action that a person, software, or robot will take to realize its benefits.

AI Claims

A red flag should go up whenever claims of success are directly attributed to machine learning. For instance, a recent presenter at a Nashville analytics conference claimed to have saved millions of dollars for a hospital system using AI. While machine learning did evaluate every imaginable metric to measure hospital productivity, according to the presenter, machine learning itself saved lives and money… not the actions taken based on its recommendations. Because of the nebulous nature of AI and the complexities involved in machine learning, some participants may not have appreciated the missing link between the machine learning analysis results and the necessary actions that had to accompany them.

In the case of the presenter’s example, the hospital nurses’ patient response time ostensibly decreased by over 35% due to machine learning. When asked how this improvement occurred, the presenter claimed that machine learning produced an optimal efficiency metric which nurses incorporated into their daily routines. What the presenter left out, however, were insights into how humans fulfilled machine learning’s recommendations. This omission should have at minimum raised red flags for the listeners.

Furthermore, the presenter claimed that AI transformed a declining hospital chain into one of the world’s most efficient and profitable health care institutions by using machine learning algorithms to analyze collective workflows across multiple failing locations. Missing once again, however, was the “how.” Unfortunately, because of the disconnect between AI insights and human actions, the accuracy of the presenter’s claims about AI’s effectiveness is unclear. If the failing hospitals were in fact converted into efficient and profitable sites, the recovery may have resulted from actions unrelated to the AI analyses. As AI providers proliferate, similar claims, obscured by the ill-defined and evolving AI field, become increasingly common.

Making AI Actionable

While the mathematics and computer science skills needed to create machine learning algorithms are highly complex, applying the algorithms industrially is far less involved (especially with pre-trained machine learning procedures). The difficulty lies in solving for the right outcomes and connecting those outcomes with the right levers. An analytic plan involves not only using machine learning to crunch data, but also evaluating how an entity can incorporate the machine learning models’ recommendations into its employees’ daily workflows, interconnected processes, and culture.

To illustrate, consider the nursing example from above. Let’s assume that a machine learning model did indeed optimize the time nurses take to perform major tasks in order to help them utilize their respective schedules more efficiently. Moreover, machine learning suggests that the length of time nurses should take to respond to a patient is X, a 35% decline from the pre-optimized state. That alone is fantastic, but it doesn’t account for the nurse’s choices and the consequences of those choices. To respond to patients more quickly, nurses might decrease the time it takes to perform other tasks or even eliminate some tasks altogether. If time allocations were what was being solved for (the presenter did not define the target or outcome variable), and the new time-allocation recommendations could realistically be implemented, they would have to work in conjunction with other machine learning recommendations that humans would also need to adopt.

With over three decades of advanced analytics experience in Fortune and media companies, I have learned that it’s excruciatingly difficult for people to implement more than two or three major changes to their job specifications at one time. The troubling part of the hospital AI presentation was that the actionable complexity associated with any machine learning project was not discussed. Questions that begged more explanation of how the machine learning results were actually used remained unanswered, hidden in a safe cloud of AI ambiguity. If a machine learning project’s final resting place is in a presentation, versus application in a hospital, manufacturing assembly line, retail store, or software delivery system, it becomes totally self-serving. Fortunately, examples abound of machine learning recommendations that result in actionable outcomes. However, unless a clear explanation is forthcoming about what actions are required (or were taken) to make machine learning actionable, you should maintain a sense of caution when interacting with individuals’ claims about AI’s seemingly mystical powers.

Some researchers have argued that AI is overhyped, while others believe its potential is being underestimated. Regardless, many AI claims don’t stand up to scrutiny. Another issue is that people who work with AI often make bold claims that the general public cannot understand or verify. Machine learning application is difficult; it can’t be done on a whim. If you’re looking to use machine learning in your business, do plenty of research and find a trustworthy team to help you out. Be careful when interacting with companies that promise the world with their AI-powered products; implementing these projects takes more than just flipping a switch.


Types of Machine Learning

April 12th, 2022 by Taymour

Just when you thought you had machine learning (ML) figured out, you find a new type waiting to be discovered. If you’re a marketer or data scientist, staying up-to-date on the different types of machine learning and how they can be used to improve your campaigns and analytics is important. In this blog post, we’ll explore the three main types of machine learning:

Supervised learning
Semi-supervised learning
Unsupervised learning

We’ll also discuss some of the applications for each type of machine learning. So, what are these different types of machine learning? Let’s find out!

Supervised Learning

Supervised learning is when the training data has both the input and output. For example, the training set would include images with a goat and images without a goat, and each image would have features of a goat or something that is not a goat. In this way, you supervise the machine’s learning with both the input (e.g., explicitly telling the computer a goat’s specific attributes: arched back, size and position of head and ears, size and relative location of legs, presence and shape of backward-arching horns, size and presence of a tail, straightness and length of hair, etc.) and the output (“goat” or “not goat” labels), so the machine can learn to recognize a goat’s characteristics. In other words, you’re getting the computer to learn a classification system you’ve created. Classification algorithms can sort email into folders (including a spam folder).

Goat or No Goat
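To make the supervised setup concrete, here is a minimal sketch of a classifier trained on labeled examples. The two features (horn length and ear length), every number, and the helper names are hypothetical, and the nearest-centroid rule is just one simple stand-in for the many classification algorithms that exist.

```python
# Toy supervised classification: every training example has inputs AND a label.
# The features (horn length in cm, ear length in cm) and all values are
# hypothetical, chosen only for illustration.

def centroid(rows):
    """Average each feature column of a list of feature tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))

def train_centroids(examples):
    """Group feature vectors by label and store one centroid per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}

def predict_label(model, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(model, key=lambda label: sq_dist(model[label]))

training_set = [
    ((20.0, 12.0), "goat"),
    ((25.0, 14.0), "goat"),
    ((22.0, 11.0), "goat"),
    ((0.0, 6.0), "not goat"),
    ((1.0, 8.0), "not goat"),
    ((0.0, 30.0), "not goat"),
]

goat_model = train_centroids(training_set)
print(predict_label(goat_model, (21.0, 13.0)))  # a long-horned animal
print(predict_label(goat_model, (0.5, 7.0)))    # an animal with almost no horns
```

The point is only that the labels drive the learning: remove the "goat" / "not goat" column and this procedure has nothing to fit.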

Whereas classification algorithms output discrete labels, regression algorithms output continuous values that can vary within a range, such as temperature, length, blood pressure, body weight, profit, price, or revenue. A sales model, for example, would rely on a regression algorithm because revenue, the outcome of interest, is a continuous variable.
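As a rough illustration of a continuous outcome, the sketch below fits a straight line by ordinary least squares. The ad-spend and revenue figures are invented, and a single predictor is assumed purely to keep the arithmetic visible.

```python
# Simple one-variable least-squares regression.
# ad_spend and revenue are hypothetical numbers used only for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(ad_spend, revenue)
predicted_revenue = intercept + slope * 6.0  # continuous prediction for a new input
print(round(slope, 2), round(intercept, 2), round(predicted_revenue, 2))
```

Unlike the goat classifier, the output here is a number on a continuous scale rather than one of a fixed set of labels.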

Semi-Supervised Learning

Semi-supervised learning occurs when the training data is only partially labeled, meaning some portion of it does not have any labels. This approach can be an important check on data bias when humans label it, since the algorithms will also learn from unlabeled data. Gargantuan tasks like web page classification, genetic sequencing, and speech recognition can all benefit greatly from a semi-supervised approach to learning.
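One common way to use unlabeled data is self-training: fit on the few labeled points, assign provisional labels to the unlabeled ones, and refit on everything. The sketch below assumes one-dimensional data, two hypothetical classes "A" and "B", and a nearest-centroid rule. It is an illustration of the idea, not a description of any particular product or library.

```python
# Self-training sketch: two labeled points plus six unlabeled points.
# All values and the class names "A" and "B" are hypothetical.

def class_means(labeled):
    """Mean of the 1-D values observed for each label."""
    totals = {}
    for x, label in labeled:
        total, count = totals.get(label, (0.0, 0))
        totals[label] = (total + x, count + 1)
    return {label: total / count for label, (total, count) in totals.items()}

def closest_class(means, x):
    """Label whose mean is nearest to x."""
    return min(means, key=lambda label: abs(x - means[label]))

def self_train(labeled, unlabeled, rounds=5):
    """Alternate between pseudo-labeling the unlabeled data and refitting."""
    means = class_means(labeled)
    for _ in range(rounds):
        pseudo = [(x, closest_class(means, x)) for x in unlabeled]
        means = class_means(list(labeled) + pseudo)
    return means

labeled_points = [(1.0, "A"), (9.0, "B")]
unlabeled_points = [1.5, 2.0, 2.5, 7.0, 7.5, 8.0]

supervised_only = class_means(labeled_points)
semi_supervised = self_train(labeled_points, unlabeled_points)

# The unlabeled points pull the class means, and the boundary between them.
print(closest_class(supervised_only, 4.9), closest_class(semi_supervised, 4.9))
```

With only the two labeled points the boundary sits at 5.0; after self-training it moves to roughly 4.8, so a borderline value such as 4.9 changes class.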

Unsupervised Learning

Unsupervised learning happens when the training data input contains no output labels at all (most “big data” lacks output labels), which means you want the machine to learn how to do something without telling it specifically how to do so. This method is useful for identifying data patterns and structures that might otherwise escape human detection. Identified groupings, clusters, and data point categories can result in “ah-ha” insights. In the goat example, the algorithm would evaluate different animal groups and separate goats from other animals on its own. Unlike in supervised learning, no mention is made of a goat’s specific characteristics. This is an example of feature learning, which automatically discovers the representations needed for feature detection or classification from raw data, thereby allowing the machine to both learn the features and then use them to perform a specific task.
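Clustering is probably the easiest form of unsupervised learning to sketch. The example below runs a bare-bones k-means on six unlabeled two-dimensional points. The points and the choice of starting centers are assumptions made for illustration, and the function names are hypothetical.

```python
# Minimal k-means: no labels are supplied, only raw points.
# Points and initial centers are hypothetical.

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def kmeans(points, centers, iterations=10):
    """Alternate between assigning points to centers and moving the centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster happens to be empty.
        centers = [mean_point(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

animals = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
final_centers, groups = kmeans(animals, centers=[animals[0], animals[3]])
print(groups)
```

The algorithm returns two groups without ever being told what either group is; naming them "goat" or "not goat" is left to a person looking at the result.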

Another example of unsupervised learning is dimensionality reduction, which lessens the number of random variables under consideration in a dataset by identifying a group of principal variables. For reasons beyond the scope of this paper, sparsity is always an objective in model building for both parametric and non-parametric models.

Unsupervised machine learning algorithms have made their way into business problem-solving and have proven especially useful in digital marketing, ad tech, and exploring customer information. Software platforms and apps that make use of unsupervised learning include Lotame (real-time data management) and Salesforce (customer relationship management software).

Other Machine Learning Taxonomies

Supervised, semi-supervised, and unsupervised learning are three rather broad ways of classifying different kinds of machine learning. Other ways to provide a taxonomy of machine learning types are out there, but agreement on the best way to classify them is not widespread. For this reason, many other kinds of machine learning exist, a small sampling of which are briefly described below:

Reinforcement Learning
The algorithm receives positive and negative feedback from a dynamic environment as it performs actions. The observations provide guidance to the algorithm. This kind of learning is vital to autonomous cars or to machines playing games against human opponents. It is also what most people probably have in mind when they think of AI. “Reinforced ML uses the technique called exploration/ exploitation. The mechanics are simple—the action takes place, the consequences are observed, and the next action considers the results and errors of the first action.”
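The exploration/exploitation idea quoted above can be sketched with an epsilon-greedy strategy on a three-option "bandit" problem. The reward probabilities, the epsilon value, and the function name are hypothetical; this is a minimal illustration under those assumptions, not a full reinforcement learning system.

```python
import random

def epsilon_greedy(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Estimate each option's payoff while mostly choosing the best one so far."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    estimates = [0.0] * len(reward_probs)
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random option.
            arm = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the option with the best estimate so far.
            arm = max(range(len(reward_probs)), key=lambda a: estimates[a])
        # The environment responds with positive feedback (1) or none (0).
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

# Hypothetical success rates for three possible actions.
payoff_estimates, pull_counts = epsilon_greedy([0.2, 0.5, 0.8])
best_action = max(range(3), key=lambda a: payoff_estimates[a])
print(best_action, pull_counts)
```

Each step follows the pattern in the quotation: an action takes place, its consequence is observed, and the next choice takes that result into account.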

Active Learning
When it is impractical or too expensive for a human to label the training data, the algorithm can be set up to query the “teacher” when it needs a data label. This format makes active learning a variation of supervised learning.

Meta Learning
Meta machine learning is where the algorithm gets smarter with each experience. With every new piece of data, the algorithm creates its own set of rules and predictions based on what it has learned before. The algorithm can keep improving by trying different ways to learn, including testing itself.

Transduction Learning
Transduction is the process of making predictions for specific new inputs, based on both the training inputs/outputs and the newly introduced inputs themselves. Unlike inductive models, which go from observed training cases to general rules that are then applied to new cases, transduction goes directly from specific observed training cases to specific test cases. This usually incorporates Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), Deep Boltzmann Machines (DBMs), or Stacked Autoencoders. Transduction is powerful as it can be used for both supervised and unsupervised learning tasks.
There are many transductive machine learning algorithms available, each with its own strengths and weaknesses. Some of the more popular ones include support vector machines (SVMs), decision trees, and k-nearest neighbors (k-NN). SVMs have been particularly successful in applications such as text classification, image classification, and protein function prediction. Decision trees tend to work well when the data is highly structured (such as financial data). K-NN is another algorithm that is commonly used in transductive settings, as it is relatively simple to implement and can be used for both regression and classification tasks.

Deep Learning
Deep learning is a subset of machine learning that deals with algorithms inspired by the structure and function of the brain. Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. It uses networks such as deep neural networks, deep belief networks, and recurrent neural networks. These have been applied to a variety of fields, including computer vision, machine hearing, bioinformatics, drug design, natural language processing, speech recognition, and robotics.
Deep learning algorithms have been shown to outperform other methods in many tasks. However, deep learning is also a complex field, and requires significant expertise to design and train effective models. In addition, deep learning algorithms are often resource intensive, requiring large amounts of data and computational power. As a result, deep learning is typically used in combination with other machine learning methods, such as shallow neural networks or support vector machines.

Robot Learning
Algorithms create their own “curricula,” or learning experience sequences, to acquire new skills through self-guided exploration and/or interacting with humans.
One such algorithm, known as Q-learning, is often used in reinforcement learning. Q-learning involves an agent that interacts with its environment by taking actions and receiving rewards or punishments in return. The goal of the agent is to learn a policy that will allow it to maximize its reward.
Q-learning can be used to teach a robot how to perform a task such as opening a door. The robot would first need to be able to identify the door and then figure out how to open it. In order to do this, the robot would need to try different actions, by trial and error, until it found the one that worked best.
Once the robot has learned how to open the door, it could then be taught how to close it. This could be done by rewarding the robot for taking the correct action and punishing it for taking the wrong action.
Through Q-learning, robots can learn a wide variety of skills. This type of learning is becoming increasingly important as robots are being used in more complex tasks such as search and rescue, manufacturing, and healthcare.
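A tabular version of Q-learning fits in a few lines. The sketch below replaces the door-opening robot with a much simpler, hypothetical stand-in: an agent in a five-position corridor that is rewarded for reaching the last position and lightly penalized for every other step. The learning rate, discount factor, and exploration rate are arbitrary choices for illustration.

```python
import random

N_POSITIONS = 5        # positions 0..4; reaching position 4 ends the episode
MOVES = (-1, +1)       # action 0 = step left, action 1 = step right

def take_step(position, move):
    """Hypothetical environment: small penalty per step, reward at the goal."""
    new_position = min(max(position + move, 0), N_POSITIONS - 1)
    done = new_position == N_POSITIONS - 1
    reward = 1.0 if done else -0.01
    return new_position, reward, done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_POSITIONS)]
    for _ in range(episodes):
        position, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(2)                          # explore
            else:
                action = max((0, 1), key=lambda a: q[position][a])  # exploit
            new_position, reward, done = take_step(position, MOVES[action])
            # Q-learning update: move the estimate toward reward + discounted best future value.
            target = reward if done else reward + gamma * max(q[new_position])
            q[position][action] += alpha * (target - q[position][action])
            position = new_position
    return q

q_table = q_learning()
policy = ["right" if q_table[s][1] > q_table[s][0] else "left" for s in range(N_POSITIONS - 1)]
print(policy)
```

The learned policy is simply "take the action with the larger value in the table," which is the sense in which the agent learns a policy that maximizes its reward.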

Ensemble Learning
This approach combines multiple models to improve predictive capabilities compared to using a single model. “Ensemble methods are meta-algorithms that combine several machine learning techniques into one predictive model in order to decrease variance (bagging), decrease bias (boosting), or improve predictions (stacking).”
This approach is often used in data science competitions, where the goal is to achieve the highest accuracy possible. Ensemble machine learning has been used to win many well-known competitions, such as the Netflix Prize and the KDD Cup.
While ensemble machine learning can be very effective, it is important to remember that it comes with some trade-offs. This approach can be more complex and time-consuming than using a single model. In addition, the results of ensemble machine learning can be difficult to interpret.
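Bagging, the variance-reducing flavor of ensembling mentioned in the quotation, can be sketched as follows: train many very simple models on bootstrap resamples of the data and let them vote. The one-feature data set, the "decision stump" base model, and all names here are hypothetical choices made to keep the example short.

```python
import random

def fit_stump(sample):
    """Pick the threshold that misclassifies the fewest points (predict True if x >= threshold)."""
    best_threshold, best_errors = None, None
    for threshold, _ in sample:
        errors = sum((x >= threshold) != label for x, label in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagged_stumps(data, n_models=25, seed=0):
    """Train each stump on a bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]

def majority_vote(thresholds, x):
    """Combine the individual stump predictions by simple majority."""
    votes_for_true = sum(x >= t for t in thresholds)
    return votes_for_true * 2 > len(thresholds)

# Hypothetical data: the label is True for larger values of x.
observations = [(1.0, False), (2.0, False), (3.0, False), (4.0, False),
                (5.0, True), (6.0, True), (7.0, True), (8.0, True)]

ensemble = bagged_stumps(observations)
print(majority_vote(ensemble, 0.5), majority_vote(ensemble, 8.5))
```

The trade-off noted above is visible even here: the ensemble is 25 models instead of one, and explaining a prediction means explaining a vote rather than a single threshold.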

Today’s powerful computer processing and colossal data storage capabilities enable ML algorithms to charge through data using brute force. At a micro-level, the evaluation typically consists of answering a “yes” or “no” question. Many ML algorithms employ a gradient descent approach; in neural networks, the gradients are computed by a procedure known as backpropagation (please see the related post Neural Networks (NN Models) for an example of supervised learning). To do this efficiently, they follow a sequential iteration process.

In lieu of multivariate differential calculus, an analogy may help the reader better understand this class of algorithms. Let’s say a car traverses through a figurative mountain range with peaks and valleys (loss function). The algorithm helps the car navigate through different locations (parameters) to find the mountain range’s lowest point by finding the path with the steepest descent. To overcome the problem of the car getting stuck in a valley (local minimum) that is not the lowest valley (global minimum), the algorithm sends multiple cars, each with the same mission, to different locations. After every turn, the algorithm learns whether the cars’ pathways help or hinder the overarching goal of getting to the lowest point.
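The car analogy can be turned into a short sketch. The loss function below is a hypothetical one-parameter "mountain range" with two valleys, and the "cars" are gradient descent runs started from different locations; the step size and the number of steps are arbitrary choices for illustration.

```python
# Gradient descent with several starting points ("cars").
# terrain() is a made-up loss with a shallow valley near x = 1.13
# and a deeper one near x = -1.30.

def terrain(x):
    """Hypothetical loss function: height of the mountain range at x."""
    return x ** 4 - 3 * x ** 2 + x

def slope(x):
    """Derivative of terrain(): the direction and steepness of the ground."""
    return 4 * x ** 3 - 6 * x + 1

def descend(start, learning_rate=0.01, steps=1000):
    """One car: repeatedly take a small step downhill."""
    x = start
    for _ in range(steps):
        x -= learning_rate * slope(x)
    return x

starting_points = [-2.0, -0.5, 0.5, 2.0]
finishes = [descend(start) for start in starting_points]

# Cars starting on the right get stuck in the shallower valley;
# keeping the lowest finishing point recovers the deeper one.
best_x = min(finishes, key=terrain)
print([round(x, 3) for x in finishes], round(best_x, 3))
```

A single car started at 0.5 or 2.0 would report the shallow valley as the answer, which is the local-minimum problem the multiple starting points are meant to address.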

More technically, the gradient descent algorithm’s goal is to minimize the loss function. To improve the prediction’s accuracy, the parameters are hyper-tuned, a process in which different model families are used to find a more efficient path to the global minimum. Sticking with the mountain range analogy, an alternative route may fit the mountain topology better than others, thereby enabling a car to find the lowest point faster. Most of these ML models make relatively few assumptions about the data (e.g., continuity and differentiability are assumed for gradient descent algorithms) or virtually no assumptions (e.g., clustering algorithms such as K-means, or tree-based methods such as random forests); the essential characteristic that makes it “machine learning” is the consecutive processing of data using trial and error. Stochastic models (those that can be analyzed statistically) make assumptions about how the data is distributed: “I’ve seen that mountain range before, and I don’t need to send a whole fleet of cars to accomplish the mission.” Related blog post: Statistical Modeling vs. Machine Learning

How iteration can lead to more precise predictions than stochastic methods should make sense to you: The exact routes are calculated rather than estimated with statistics. However, because the loss (a.k.a. cost) function will usually be applied to a slightly different mountain range, the statistical models, which are more generalizable, may ultimately lead to a superior prediction (or interpolation). Moreover, while statistics and non-parametric ML are clearly different, non-parametric ML algorithms’ iterative elements may be used in a statistical model, and conversely, non-parametric ML algorithms may contain stochastic elements.

A fuller understanding of the differences between these machine learning variations and their highly-specific, continuously-evolving algorithms requires a level of technical knowledge beyond the scope of this paper. A more fruitful avenue of inquiry here is to explore the relationship of ML and statistics as they relate to AI.


Why Machine Learning Has Surpassed Statistical Prediction

April 12th, 2019 by Taymour

Business and scientific communities have learned to successfully use both machine learning and statistics for predictive analysis, yet machine learning has increasingly become the preferred method, particularly among data scientists. Before looking at why, it is important to understand how these methods differ. The prevailing view is that their purposes are different: statistics makes inferences whereas machine learning makes predictions. This difference is evident in the Latin roots of each word. In Latin, prediction derives from praedicere “to make known beforehand” and inference stems from inferentem or “to bring into; conclude, deduce.” A statistical inference describes how two or more variables are related. In other words, its purpose is descriptive in that it quantitatively explains some type of a relationship. Machine learning primarily focuses on prediction. Yet, a quantitatively defined description is often used, successfully, to make predictions.

To make a head-to-head comparison between machine learning and statistics, it is essential to keep this common purpose in mind. This article highlights some of the distinctions on how predictions are made, employed, and interpreted. It also provides examples of why machine learning is gaining favor in business and scientific applications.

New Technology Put Statistics on the Map

The rise of statistical thinking is a result of the numerous new technologies in the first decade of the 1900s. As desk calculators replaced the early tabulation machinery at the beginning of the twentieth century, more complex calculations like Ordinary Least Squares (OLS) equations could be solved. Throughout the century, statistical thinking based on the mathematics of drawing projectable inferences from a smaller sample continued and expanded rapidly. In turn, improved technology made it possible to process increasingly larger volumes of data faster.

Fast forward a century: modern-day data storage and blazingly fast CPUs/GPUs can process massive amounts of data using statistical methodologies. However, while such horsepower can process samples that approach the population (n → N), the fundamental small-to-large inductive principles that underlie statistics remain unchanged from earlier days. While the predictive power of statistics has improved with access to more data and processing power, its predictions do not incorporate data it has not previously encountered; it must rely on how well the sample fits a hypothetical, unknown population. The model’s “fit” is manifested by its “parameter estimates,” which are literally guesses of what the predictive data set is expected to look like. In other words, while the model estimates the parameter of a hypothetical and unknown population, we are assuming that the data set used in the prediction literally refers to this theoretically unknown population.

In contrast, machine learning requires far fewer assumptions about the data. Starting with a training data set, machine learning then applies the patterns it learned to a predictive data set. Unlike the statistical approach, machine learning refines its prediction by learning from the new data. The more data, the merrier!

Whether one approach results in superior prediction depends largely on the scenario at hand. Understandably, either approach can go awry. In the case of statistics, the sample data may not be representative of the population to be predicted. Similarly, a machine learning training data set may not resemble the predictive data set. In these scenarios, the respective results are inadequate predictions. In the world of big data, however, machine learning generally maintains an advantage in the overall predictive accuracy and precision as it can process more information and deal with greater complexity.

So, What Are the Differences in How Predictions Are Made?

Statistics makes predictions (really inferences used for predictive purposes) about the large from the small. Machine learning, on the other hand, makes predictions about the large from the large. It is important to note that both types of predictions can be delivered at the individual or population levels. Statistics draws inferences from a sample using probability theory. Machine learning uses mathematics as a “brute force” means to make its predictions. As some may expect, because machine learning processes more data iteratively, it tends to be far more computationally demanding than statistics. But this limitation is increasingly diminishing in tandem with the recent explosion of processing power and increased storage capacity.
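The "large from the small" idea can be illustrated with a confidence interval: a handful of observations is used, via probability theory, to make a statement about a population that was never fully observed. The ten measurements below are invented, and the normal approximation is an assumption (with so few points, a t-distribution would be the more careful choice).

```python
from statistics import NormalDist, mean, stdev

# Ten hypothetical measurements drawn from a much larger, unobserved population.
sample = [12.1, 9.8, 11.4, 10.6, 10.9, 12.3, 9.5, 11.0, 10.2, 11.7]

def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation interval for the population mean."""
    n = len(values)
    center = mean(values)
    standard_error = stdev(values) / n ** 0.5
    # z is about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * standard_error, center + z * standard_error

low, high = mean_confidence_interval(sample)
print(f"sample mean = {mean(sample):.2f}, 95% interval = ({low:.2f}, {high:.2f})")
```

The interval's width is where the uncertainty shows up: it is a probabilistic claim about the population, resting on the assumption that the sample represents it.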

On the surface, both machine learning and statistics are numerically based. This raises the question: what is the difference between mathematics and statistics? While statistical methods may employ mathematics, their conclusions employ non-mathematical concepts. Because statistics is grounded in probability, uncertainty is rooted in its conclusions, whereas mathematics is precise and axiomatic. Statistics is empirically based inductive logic; mathematics uses formal, deductive logic.

Compared to statistically based prediction methods, machine learning makes far fewer assumptions about the data. Statistics requires assumptions about the sample distribution to be satisfied, which is not always possible or easy. In addition, in statistical analysis, the sample data must be clean and pristine for its estimates to be accurate and precise. Machine learning is less fussy. It can utilize structured, unstructured, or even messy data. While there may be inaccurate or “noisy” data that slips into the machine learning process, the use of larger data sets has the potential to reveal patterns that may otherwise have been lost. A larger data pool generally improves the overall predictive power of the machine learning model.

Interpretability

Statistics is typically more interpretable (answers what and some why questions) than machine learning (answers primarily what questions). For example, a regression model in statistics can give insights into why certain variables are included, such as whether headaches are normally associated with the flu. Statistics tries to prove that headaches are a flu symptom by testing this hypothesis on other flu data sets. Machine learning can plow through large amounts of data to uncover correlations between the flu and other features that happen to be correlated with it in the training data set. In this example, machine learning may confirm headaches as a common symptom of the flu but may also uncover other correlations, such as the lack of sunlight exposure or something less obvious like the per capita mass transit usage. Here, the mass transit usage is not a symptom of the flu but could be a factor that helps explain the flu incidence in a certain region during the winter season. Or, as is often the case with machine learning, it may find a feature that is seemingly unrelated but nevertheless helps its prediction.

On the flip side, statistics can delve deeper into the why questions using marginal and conditional probability distributions, which is currently not possible with machine learning. However, machine learning’s raw predictive power may be valued more than the ability to delve deeper into a subject because correlations can also lead to actionable strategies.

Machine Learning Takes Center Stage

In practice, both statistics and machine learning are used today, and both continue to evolve. However, Google searches for these terms show that machine learning began surpassing statistical analysis in popularity in early 2011 (see Figure 1).

Figure 1: Statistical Analysis vs. Machine Learning (blue), Google Trends Index, Jan 2004-Sept 2021

Faster and cheaper technology can harness the proliferation of data for both greater profits and for social improvements. Though both statistical modeling and machine learning benefit from these advances, machine learning takes greater advantage because it can process all the data it can get its hands on. The prediction gap is expected to widen as technology continues to improve.

To gain further perspective on why machine learning has started to overtake statistically based methodologies, we asked a data science practitioner to explain how they are using machine learning to solve some of the world’s most challenging problems.

Machine learning and other artificial intelligence techniques allow for proactive rather than reactive decision-making. Related: see blog on neural networks (nn models).

APPENDIX

Differences in Vernacular*

Differences in Methods*


What is Artificial Intelligence?

March 12th, 2019 by Taymour

May 2023 Update

Since the original article was written in 2017, there have been several significant updates in the field of machine learning and statistics. One important development is the increasing use of deep learning, a subfield of machine learning that uses artificial neural networks to model complex data relationships. Deep learning has enabled breakthroughs in fields such as computer vision, natural language processing, and speech recognition, and has opened up new applications in areas like autonomous driving and personalized medicine.

Another trend has been the growing emphasis on interpretability and fairness in machine learning models. As machine learning is used in high-stakes applications like healthcare and criminal justice, there is a growing need to understand how decisions are being made and to ensure that they are not biased against certain groups. To address this, researchers have developed new techniques for visualizing and explaining the inner workings of machine learning models, as well as methods for auditing them for fairness and bias.

In addition, there has been a growing interest in combining machine learning and statistics in what is sometimes called “statistical learning.” This approach seeks to combine the strengths of both fields, using statistical models to make predictions and machine learning techniques to improve their accuracy and scalability. Some researchers have also explored ways to incorporate uncertainty into machine learning models, drawing on probabilistic modeling techniques from statistics to better handle situations where the data is noisy or incomplete.

Overall, the use of machine learning and statistics continues to grow and evolve, with new applications and techniques emerging all the time. As the amount of data being generated continues to increase, and as the need for accurate and interpretable predictions becomes more urgent, it seems likely that both fields will continue to play important roles in the future of data analysis.

Both business and scientific communities have learned to successfully use machine learning and statistics to provide predictive analysis, yet machine learning has increasingly become the preferred method of analysis. Before looking at why, it’s important to understand the difference between machine learning and statistics. To distinguish between the two, it helps to understand why businesses and scientific communities favor machine learning over statistics. The prevailing view is that their purposes are different: Statistics makes inferences, whereas machine learning makes predictions. This difference is evident in each word’s Latin roots. In Latin, prediction derives from praedicere, which means “to make known beforehand” and inference stems from inferentem, or “to bring into; conclude, deduce.” A statistical inference deals with how two or more variables are related. In other words, its purpose is descriptive, in that it quantitatively explains some type of a relationship. Machine learning, on the other hand, is primarily focused on prediction.

However, a quantitatively-defined description is often successfully used to make predictions. To make a head-to-head comparison between machine learning and statistics, it is essential to keep this common purpose in mind. We’re seeking to highlight some of the distinctions regarding how predictions are made, employed, and interpreted. This article provides several examples of why machine learning is gaining favor in business and scientific applications.

***Fun fact: the above update was written by ChatGPT***

New Technology Put Statistics on the Map

The rise of statistical thinking is a result of numerous new technologies that appeared on the scene in the first decade of the 1900s. As desk calculators replaced the early tabulation machines at the beginning of the twentieth century, they were able to solve more complex calculations like Ordinary Least Squares (OLS) equations. Throughout the century, statistical thinking based on the mathematics of drawing projectable inferences from a smaller sample continued and expanded rapidly. In turn, improved technology made it possible to process increasingly large volumes of data faster.

Fast forward a century. Modern-day data storage and blazingly fast CPUs/GPUs can process massive amounts of data using statistical methodologies. However, while such horsepower can process samples that approach the population (n → N), the fundamental small-to-large inductive principles that underlie statistics remain unchanged from earlier days. While statistics’ predictive capacity has improved with access to more data and processing power, its predictions do not incorporate data it has not previously encountered; it must rely on how well the sample fits a hypothetical, unknown population. The model’s “fit” is manifested by its “parameter estimates,” which are literally guesses of how the predictive dataset is expected to look. In other words, while the model estimates a hypothetical and unknown population’s parameter, we assume the dataset used in the prediction literally refers to this theoretically unknown population. Machine learning, however, doesn’t require such assumptions about a population. Instead, it starts with a training dataset and then applies the patterns it learned to a predictive dataset. Unlike the statistical approach, machine learning refines its prediction by learning from the new data.
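To make the idea of “parameter estimates” concrete, here is a minimal sketch of a simple OLS fit in plain Python. The sample values, the function names (fit_ols, predict), and the new inputs are all hypothetical and chosen only for illustration; the point is that the two estimated parameters are computed once from the sample and then reused unchanged on data the model has never seen.

```python
from statistics import mean

def fit_ols(xs, ys):
    """Estimate (intercept, slope) of y = intercept + slope * x by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def predict(intercept, slope, xs):
    """Apply the fitted parameter estimates to new inputs."""
    return [intercept + slope * x for x in xs]

# Hypothetical sample that happens to follow y = 1 + 2x exactly.
sample_x = [1.0, 2.0, 3.0, 4.0]
sample_y = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_ols(sample_x, sample_y)

# The "predictive dataset": inputs that were not part of the sample.
new_x = [10.0, 20.0]
predictions = predict(intercept, slope, new_x)
print(intercept, slope, predictions)
```

The predictions are only as good as the assumption that the new inputs come from the same underlying relationship the sample was drawn from.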

Whether this approach results in superior prediction depends largely on the scenario at hand. Understandably, either approach can go awry. In the case of statistics, the sample data may not represent the population to be predicted. Similarly, a machine learning training dataset may not resemble the predictive dataset. In these scenarios, the respective results are inadequate predictions. In the world of big data, however, machine learning generally maintains an advantage in overall predictive accuracy and precision, as it can process more information and deal with greater complexity.
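The failure mode described above, where the training dataset does not resemble the predictive dataset, can be illustrated with a deliberately simple learner. The sketch below uses a one-nearest-neighbor rule as a stand-in for “machine learning”; the data, the y = 2x relationship, and the function name are hypothetical and chosen only to make the effect visible.

```python
def nearest_neighbor_predict(train_x, train_y, x):
    """Predict y for x by copying the label of the closest training point."""
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]

# Hypothetical training data following y = 2x, observed only for x = 0..10.
train_x = [float(i) for i in range(11)]
train_y = [2.0 * x for x in train_x]

# A query that resembles the training data, and one far outside its range.
inside = nearest_neighbor_predict(train_x, train_y, 4.2)
outside = nearest_neighbor_predict(train_x, train_y, 100.0)

# Compare each prediction with the true value under the assumed y = 2x rule.
error_inside = abs(inside - 2.0 * 4.2)
error_outside = abs(outside - 2.0 * 100.0)
print(error_inside, error_outside)
```

Inside the range it was trained on, the learner is close to the truth; far outside that range it can only repeat the nearest thing it has seen, so the error grows large.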

What Are the Differences in How Predictions Are Made?

Statistics makes predictions (really inferences used for predictive purposes) about the large (a lot of data) from the small (sample data). Machine learning makes predictions about the large from the large. It’s important to note that both types of predictions can be delivered at the individual or population levels. Statistics draws inferences from a sample using probability theory, whereas machine learning uses mathematics as a “brute force” means to make its predictions. As one might expect, because machine learning processes more data iteratively, it tends to be far more computationally demanding than statistics. However, this limitation is increasingly becoming less of an obstacle with the recent explosion of processing power and increased storage capacity.
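As a rough illustration of inferring “the large from the small,” the sketch below draws a sample from a simulated population and builds an approximate 95% confidence interval for the population mean using the normal approximation. The population itself, its parameters, the sample size, and the helper name are assumptions made up for this example.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(0)

# Hypothetical population; in practice it is unknown and too large to measure.
population = [random.gauss(50, 10) for _ in range(10_000)]

# The "small": a random sample drawn from that population.
sample = random.sample(population, 200)

def mean_confidence_interval(data, confidence=0.95):
    """Normal-approximation confidence interval for the mean of the population."""
    n = len(data)
    centre = mean(data)
    std_err = stdev(data) / n ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return centre - z * std_err, centre + z * std_err

low, high = mean_confidence_interval(sample)
print(f"sample mean: {mean(sample):.2f}")
print(f"approximate 95% interval for the population mean: ({low:.2f}, {high:.2f})")
print(f"actual population mean: {mean(population):.2f}")
```

The interval comes from probability theory applied to 200 observations; none of the remaining population values is ever touched by the calculation.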

On the surface, both machine learning and statistics are numerically based, which raises the question: What is the difference between mathematics and statistics? While statistical methods may employ mathematics, their conclusions employ non-mathematical concepts. Because statistics is grounded in probability, uncertainty is rooted in its conclusions, whereas mathematics is precise and axiomatic. Statistics is empirically based, inductive logic; mathematics uses formal, deductive logic.

Compared to statistically based prediction methods, machine learning makes few, if any, assumptions about the data. Statistical methods require sample distribution assumptions to be satisfied, which is not always possible or easy to do. In addition, in statistical analysis, the sample data must be clean and pristine for its estimates to be accurate and precise. Machine learning, on the other hand, is less fussy. It can utilize structured, unstructured, or even messy data. While inaccurate or “noisy” data may slip into the machine learning process, using larger datasets has the potential to reveal patterns that may otherwise have been lost. A larger data pool generally improves the machine learning model’s overall predictive power.

Interpretability

Statistics is typically more interpretable (it answers what and some why questions) than machine learning, which answers primarily what questions. For example, a statistical regression model can give insights into why certain variables are included, such as whether headaches are normally associated with the flu. Statistics tries to prove that headaches are a flu symptom by testing this hypothesis on other flu datasets. Machine learning, however, can plow through large amounts of data to uncover correlations between the flu and other features that happen to be correlated with it in the training dataset. In this example, machine learning may confirm headaches as a common symptom of the flu, but it may also uncover other correlations, such as the lack of sunlight exposure or something less obvious like per capita mass transit usage. Of course, mass transit usage is not a flu symptom, but it could be a factor that helps explain the flu incidence in a certain region during the winter season. Or, it may find a factor that isn’t open to explanation but nevertheless helps its prediction. Using marginal and conditional probability distributions, statistics can delve deeper into why questions, which is currently not possible with machine learning. However, raw predictive power may be more valuable than the ability to delve deeper into a subject when correlations also lead to actionable strategies.

Machine Learning Takes Center Stage

Both statistics and machine learning are used today, and both continue to evolve. However, Google searches for these terms show that machine learning began surpassing statistical analysis in popularity in early 2011 (see Figure 1).

Figure 1: Statistical Analysis vs. Machine Learning, Google search interest, January 2004 to May 2021

Faster and cheaper technology can harness data proliferation for both greater profits and social improvements. Though both statistical modeling and machine learning benefit from these advances, machine learning takes greater advantage, as it can utilize all the data it can access. The implication is that the prediction gap will widen as technology continues to improve.

Machine learning and other artificial intelligence techniques allow for proactive rather than reactive decision-making. In healthcare, this approach translates to significant savings through cost avoidance and reduction. Across Decode Health’s use cases, we know that it also saves lives by leading to better decisions and earlier interventions.

A (non-exhaustive) list of differences in vernacular is presented below:

Difference in ML vs. Statistical Methodologies

A (non-exhaustive) list of differences in methods is presented below:

Machine Learning vs. Statistical Prediction

Demystifying the Move to Machine Learning

While it isn’t exactly right to say that machine learning is superior to statistical analysis, several factors have made it more reliable in terms of making predictions. Perhaps the greatest shift drivers are the recent advances in processing power, which make it possible for machine learning to churn larger datasets iteratively, yielding better predictions than statistics. The fact that machine learning models don’t make assumptions about the data means they tend to be more reliable as well.

One of machine learning’s greatest strengths is that it can adapt to data it has not encountered, which means it can make predictions about something new. This outcome isn’t possible with statistical analysis. Additionally, advances in machine learning are making the models more interpretable, a factor that was once one of the statistically based model’s advantages. Since machine learning can handle and exploit an increasing amount of unstructured, “messy” data, it requires less work to prepare for analysis.

Ultimately, machine learning provides a much faster, more accurate way of working with large amounts of data, a feature that’s more in demand with the rise of big data. Sometimes, statistical analysis remains the better option, but machine learning can make predictions with less data cleanup, so businesses can make decisions faster and scientific communities can start understanding the data and its patterns sooner.


Climate Change’s Big Impact on Big Data

March 12th, 2015 by Taymour

The problem inherent in climate change studies is not the amount of data to manage, the technology trying to tame that data, or even the people who use the technology. Science (or more precisely, certain populations’ unwillingness to believe in science) actually poses the greatest challenge to big data. According to the Pew Research Center’s most recent poll, only 40% of Americans believe that global warming is primarily caused by human activities that pump excessive amounts of CO2 into the atmosphere. If you belong to this minority, keep reading. If you do not belong to this minority, definitely keep reading.

Science Skepticism

Those who understand the scientific community have a sense of how elusive consensus among scientists can be. By its very nature, science is based on debunking science with better science. Thus, to have an overwhelming majority consensus among the world’s scientists is extremely significant. To be sure, ample room is still left for debate. That debate, however, is more focused on the magnitude of climate change’s impact than its origins.

For this article, I will give corporate America the benefit of the doubt by assuming that it is more scientifically inclined than the population at large (allowing for variations based on industry, geography, and other factors). However, this paper isn’t trying to accurately quantify business’s scientific inclination; instead, it’s about the alarming proportion of science naysayers (both active and passive) who undermine data-driven, actionable insights. Based on my own experiences with a variety of businesses, I estimate that about half the people in corporate America fall into this category.

If I’m correct, then one in every two businesspeople in America disregards data regardless of how reasonable or compelling its conclusions are. These people trust their guts more than any scientific consensus. Sadly, they often dismiss reason and empirically-derived conclusions even before considering any analysis. If they do eventually consider such results, asking them to give the matter any subsequent validation or further analysis seems to be asking too much. These trends reflect a different enemy of science: the irrepressible urge for short-term results. The data-driven professional understands that the initial pass at the data may have been incomplete. Hypotheses must be revised. Assumptions must be revisited. New data sources must be considered. Qualitative opinions must be factored in. In other words, science is about continuous improvement in the search for answers. When an answer is found, science immediately attempts to debunk it with whatever methods make sense. If the debunking effort fails, the answer will remain as the best conclusion, at least for the moment. The data scientist will not dismiss an answer just because it is a non-quantifiable, emotional response. Science will accept an answer as the best one only if various means to disprove it have been extensively explored. In this sense, science is a non-judgmental, “show-me-what-you’ve-got” discipline.

Indeed, other non-quantifiable attributes should be considered in lieu of or in addition to the data-driven methodologies. In fact, any conclusion should be treated with skepticism, be it data-driven or not—such is the way of science. However, we must discourage knee-jerk rejections of the empirical approach merely because it doesn’t “feel” right. A blanket policy of skepticism toward testing, profiling, modeling, and other forms of data analysis is not helpful.

Gut Instinct vs. Data in Corporate America

Data can actually be used to validate hunches and predict events using empirical methodologies. In the end, the data-driven approach employs the tried-and-true scientific method in an attempt to augment, if not supersede, what we think is true. Inevitably, some results will seem counterintuitive to conventional corporate wisdom, which is the fun part for data scientists (that is, unless political motivations lurk behind the scenes). Corporate America’s resistance to big data seems to hold its practitioners back from being fully integrated into mainstream corporate functions, a sad state of affairs given what big data has to offer.

If my estimated 50-50 ratio of science supporters to science naysayers is even remotely accurate, that’s a big problem. Because we can prove whether a particular idea pans out by using a well-designed statistical test, any arguments against taking the actions indicated by the results should be based on data flaws or problems with the methods used to analyze it. They should not be based on an individual’s intuition.

The Best Challenge to Science is Better Science

Succinctly put, one should only challenge science with better science. In some cases, the data and what it’s measuring may be mismatched. Or, the science naysayer may argue that some human behavior is just not conducive to measurement. Even so, a dialogue about how to improve data and analytical practices is far more helpful than a blatant rejection of their merit. That dialogue’s merit will depend on how the parties involved value a data-driven, scientific approach. So long as some people continue to resist the broad scientific consensus about climate change, you can be sure data scientists will remain undervalued members of corporate America.
