With the recent surge of AI models, from ChatGPT to Google's search-embedded generative AI, this technology is here to stay. Looking ahead, AI systems will become industry staples, standard tools used by every company, home, and individual, akin to the ubiquity of computers today. AI proficiency will likely become the next skill set, much like Microsoft Office proficiency, that you see required in everyday job postings. Given the speed at which this technology is advancing, it's worth taking a step back to look at what these models actually are and why they're so useful!

The domain of AI

Before delving into the world of AI, it's crucial to establish clarity in our terminology. The terms Machine Learning (ML), Artificial Intelligence (AI), and Deep Learning are often used interchangeably, leading to confusion. In essence, AI serves as an umbrella term, akin to "Biology," which encompasses a wide range of subdomains and techniques. AI is frequently employed as a buzzword to secure funding, but it represents a broad and multifaceted field. Within the realm of AI, Machine Learning stands out as its crown jewel. Machine Learning itself comprises a diverse array of techniques, each with its unique characteristics and applications. One of the most renowned and potent branches of Machine Learning is Deep Learning, also known as neural networks. Deep Learning techniques have gained widespread recognition for their ability to handle complex tasks and are at the forefront of many AI breakthroughs.

Here's one way to visualize this hierarchy: Machine Learning encompasses a multitude of techniques, including Decision Trees, Linear Regression, Support Vector Machines (SVM), and Deep Learning (just to name a few). Deep Learning is a field of its own and stands out as one of the most successful families of ML algorithms. Within Deep Learning, you'll find a wide range of architectures, from Convolutional Neural Networks (used in imaging) to Transformers (applied to language tasks). However, everything I just mentioned falls under the same umbrella term: ML. But how do ML models work, and what do these algorithms do?


What do Linear Regression and Deep Neural Networks have in common?

Spoiler: Linear Regression and Deep Neural Networks are both machine learning algorithms. But what does something you might have studied in high school have to do with the architecture behind ChatGPT? Essentially, these two algorithms are trying to accomplish the same task, albeit in very different ways.

The goal behind both of these techniques, and every other ML technique you will come across, is to estimate, or 'learn', a mapping of a given dataset. In other words, they aim to find the function \(f(x)\) that describes your data. More formally: given input data \(X\) and output data \(Y\), find the \(f\) such that \(f(X)\) best approximates \(Y\). The more data points you provide, the better the estimate.

That's it! That's literally all these models are doing. Well, okay, there's a lot more that goes into it, but at a high level, that's the core concept. At its essence, machine learning is a statistical technique: give the algorithm a bunch of points and it will try to find the best function that describes (maps) those data points.
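
To make this concrete, here's a minimal sketch of "give me points, find me a function" in Python. It uses numpy, and the data points are made up purely for illustration:

```python
import numpy as np

# A handful of (x, y) observations; the "true" mapping is unknown to us.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Least squares finds the line f(x) = w*x + b that best fits the points.
# This is linear regression: the simplest case of "learning" f from data.
w, b = np.polyfit(X, Y, deg=1)
print(f"estimated f(x) = {w:.2f}*x + {b:.2f}")

# The learned function can now make predictions for inputs it never saw.
print(np.polyval([w, b], 5.0))
```

Swap the straight line for a curve, a decision tree, or a neural network, and the recipe stays the same: data in, best-fitting function out.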

I won’t be delving into the workings of linear regression or neural networks in this post. Those topics are extensive and deserve their own dedicated discussions, and you can easily find countless resources about them online. Instead, we’ll focus on what they accomplish without getting into all the technical intricacies of ‘how’ they achieve it. By the end of this post, I aim to demonstrate why these techniques are highly valuable and prevalent across various domains.

Now, let’s start with a quick refresher on what mathematical functions are.

Functions

Here’s the definition of a function I shamelessly stole from Wikipedia:
In mathematics, a function from a set \(X\) to a set \(Y\) assigns to each element of \(X\) exactly one element of \(Y\). The set \(X\) is called the domain of the function and the set \(Y\) is called the codomain (or range) of the function.

So in layman's terms, you can think of a function as a sort of machine: you give it a certain input \(x\), it performs certain operations on this input, and then it produces an output \(y\). It's also worth noting that almost all scientific theories are built upon functions, from quantum mechanical descriptions to tumor growth rates.

Here’s an example of a simple function: \(f(x) = x^2\)
In this function, whatever number you input, the function will square that number, resulting in the corresponding answer. \(2 \rightarrow f(2) \rightarrow 4\)
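
In code, this "machine" picture is quite literal. Here's the same function as a tiny Python snippet:

```python
def f(x):
    # The machine: take an input, square it, hand back the output.
    return x ** 2

print(f(2))  # 4
```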

Functions are usually derived through empirical observation or developed mathematically from a set of first principles. For instance, Einstein derived his famous equation, \(E=mc^2\), by postulating that the speed of light is constant. And voilà: you now have a function that describes the relationship between mass and energy. Input a value for the mass and you obtain the corresponding energy; reverse the process and input an energy, and you get back the mass.

But as mentioned earlier, equations can also be derived solely from empirical observation, coupled with a bit of intuition, of course. For instance, when you aim to model population growth, you examine your data and formulate an equation that best fits the population data you've collected. This approach can be more challenging because it often involves a considerable amount of trial and error, and it can lead to biased estimates that reflect the sample data you happen to have on hand (foreshadowing much?). That is not to say that deriving equations from first principles is easier. There's a lot of trial and error there too, and those theories are biased in their own way. We don't need to look too far for an example: consider Newton's work on gravity. His equations could not describe Mercury's orbit or the bending of light due to gravitational effects. Newton simply didn't have access to the data needed to test his theory and had to work with the observations of an apple and the Moon.

Hunting for Hidden Formulas

Here, let’s try to derive an equation from empirical observation. I will give you a bunch of data points, and you find the function (pattern)!

| \(X\) | \(Y\) |
| --- | --- |
| 1 | 1 |
| 2 | 8 |
| 3 | 27 |
| 4 | 64 |
| \(\vdots\) | \(\vdots\) |


You probably guessed it. The function is:

$$f(x) = x^3$$

You simply multiply the number by itself three times.
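
Incidentally, a machine can hunt this formula down too. As a quick sketch (numpy again, with the rows from the table above), fitting a degree-3 polynomial to the four points recovers \(x^3\) almost exactly:

```python
import numpy as np

X = np.array([1, 2, 3, 4])
Y = np.array([1, 8, 27, 64])

# Fit a cubic; coefficients come back highest power first.
coeffs = np.polyfit(X, Y, deg=3)
print(np.round(coeffs, 6))  # ~[1. 0. 0. 0.]  ->  f(x) = x^3
```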



Good, let’s complicate our example. This time, our function takes two positive inputs. To simplify matters, we’ll set the second input, \(Z\), to a constant value. This function may remind you of something you encountered in high school physics. Your task is to treat \(Z\) as a constant and determine the relationship between the variables that best describes the data.

| \(X\) | \(Z\) | \(Y\) |
| --- | --- | --- |
| 1 | 9.8 | 9.8 |
| 10 | 9.8 | 98 |
| 4 | 9.8 | 39.2 |
| 7 | 9.8 | 68.6 |
| \(\vdots\) | \(\vdots\) | \(\vdots\) |


This one was probably a dead giveaway from the first input alone.

You multiply the two numbers together. Looks a bit like Newton's second law, no?

$$F = ma$$

Just set \(Z\) to 9.8 (the acceleration due to Earth's gravity) and let the input \(X\) represent the mass of the object.
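
This is exactly the kind of relationship linear regression handles effortlessly. A sketch (numpy, with the rows from the table): fitting \(y = w \cdot x\) by least squares recovers the hidden constant:

```python
import numpy as np

X = np.array([1.0, 10.0, 4.0, 7.0])
Y = np.array([9.8, 98.0, 39.2, 68.6])

# Least squares for y = w*x (no intercept) recovers the hidden constant.
w, *_ = np.linalg.lstsq(X.reshape(-1, 1), Y, rcond=None)
print(w[0])  # ~9.8
```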



I’ve been providing relatively straightforward examples so far, with inputs given in a specific order and corner cases to aid in deduction. Now, let’s consider a different scenario. I’ll present a function that takes positive inputs, but this time, the data isn’t sorted, and I won’t offer any obvious hints. Your challenge is to determine the function that most accurately represents this data.

| \(X\) | \(Z\) | \(Y\) |
| --- | --- | --- |
| 7 | 2 | 7.28 |
| 3 | 9 | 9.49 |
| 5 | 6 | 7.81 |
| 4 | 8 | 8.94 |
| 2 | 2 | 2.83 |
| 10 | 15 | 18.03 |
| 8 | 9 | 12.04 |
| 3 | 7 | 7.62 |
| \(\vdots\) | \(\vdots\) | \(\vdots\) |


If you figured this one out, I salute you; if you haven't, I don't blame you. Finding this function might have taken some time. The function in question is

none other than the famous Pythagorean theorem:

$$x^2 + z^2 = y^2$$

or, solved for the output, \(y = \sqrt{x^2 + z^2}\).
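
If you want to check the answer against the table, a few lines of Python will do it; every row matches to two decimal places:

```python
import math

rows = [(7, 2, 7.28), (3, 9, 9.49), (5, 6, 7.81), (4, 8, 8.94),
        (2, 2, 2.83), (10, 15, 18.03), (8, 9, 12.04), (3, 7, 7.62)]

# y = sqrt(x^2 + z^2); math.hypot computes exactly that.
for x, z, y in rows:
    assert round(math.hypot(x, z), 2) == y
print("every row matches")
```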



The exercises I’ve presented so far were merely illustrative examples. None of these functions necessarily had to be derived in this particular manner. However, consider what happens when we encounter a scenario where we genuinely need to do this. What if we possess a dataset of a specific function but lack any insight into the actual function itself? You can appreciate how challenging this can become for us, ordinary humans.

Let’s explore a real-life situation where this challenge is indeed applicable: image classification! Here’s the scenario: can you provide me with a function that takes as input an image containing either a cat or a dog and outputs either 0, to signify a cat, or 1, to denote a dog? An image is essentially represented mathematically as an array (flattened matrix) that encodes pixel values. However, you don’t need to focus on the technical details of this representation. The critical point here is that our function takes an input matrix, denoted as A, which describes the pixel content of the image.

For clarity, let’s consider \(A_1\) as the matrix representing a dog image and \(A_0\) as the matrix representing a cat image. Your task is to provide the function that best captures the relationship between these input matrices to distinguish between cat and dog images accurately.

| \(X\) | \(Y\) |
| --- | --- |
| \(A_0\) | 0 |
| \(A_1\) | 1 |


This is already getting into the realm of impossibility, and it's even more challenging when you consider that we've only had to deal with two inputs and two outputs. Now picture the complexity of classifying ten different animals, each captured from slightly different angles.

This is where the value of ML models truly shines. Each ML model offers a unique way of representing the underlying function: deep neural networks learn weight matrices, while K-Means Clustering identifies clusters in a high-dimensional space. Regardless of the specific technique, these models churn through calculations that would likely have taken you years by hand in order to derive a set of representations that effectively describes the data.
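
As a rough illustration of handing this search over to a machine, here's a sketch using scikit-learn's MLPClassifier (a small neural network) on stand-in data. The random arrays below merely play the role of flattened images; a real cat/dog dataset would replace them:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Stand-in "images": 200 samples, each flattened to 64 pixel values.
# The labeling rule is invented purely so the demo has a pattern to find.
X = rng.random((200, 64))
y = (X.mean(axis=1) > 0.5).astype(int)  # pretend 0 = cat, 1 = dog

# A small neural network searches for a function mapping pixels -> label.
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
model.fit(X, y)
print(model.score(X, y))  # how well the learned f(x) fits the data
```

The point isn't the score; it's that at no stage did we write the function ourselves. The model derived its own representation from the data.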

In today’s data-driven world, the significance of machine learning cannot be overstated. As we grapple with increasingly large and intricate datasets, machine learning becomes our compass, guiding us through uncharted territories of information. It accelerates discovery, transforming mountains of raw data into valuable insights. Whether it’s advancing medical research by identifying new treatment pathways, enhancing customer experiences through personalized recommendations, or predicting market trends, machine learning has become an indispensable tool across diverse industries.