Mastering the Art of Probability Distributions for Successful Machine Learning

data science Oct 09, 2023

Hey there! Are you just starting out in data science, feeling overwhelmed by what a probability distribution is, and wondering what the different types of probability distributions are?


You're not alone! 

Many people in your shoes wonder: what are the most commonly used probability distributions in data science and how can I figure out which one to use for my analysis?



If these questions are on your mind, don't worry - you're on the right track! 

Let's explore the topics below together and see if we can demystify probability distributions in data science.

 

 

What is a probability distribution?

A probability distribution describes how likely each possible outcome of a random event is, that is, how the total probability is spread across the outcomes.

For example, let's say you're rolling a six-sided die. 

Since each side of the die has an equal chance of landing face-up, the probability distribution for rolling the die is uniform. This means that each outcome has the same probability of occurring, which is 1/6 or about 17%.


In other words, the probability of the die landing with “1” face-up is ⅙. Similarly, the probability of it landing with “2” face-up is ⅙, and so on.

So if you roll the die many times, you can expect each outcome to occur roughly the same number of times.
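If you'd like to see this for yourself, here is a small sketch that simulates rolling a fair die many times and compares how often each face shows up with the theoretical 1/6. The function name `roll_frequencies`, the number of rolls, and the seed are just illustrative choices, not anything special.

```python
import random
from collections import Counter

def roll_frequencies(num_rolls, seed=0):
    """Roll a fair six-sided die num_rolls times and return relative frequencies."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(num_rolls))
    return {face: counts[face] / num_rolls for face in range(1, 7)}

frequencies = roll_frequencies(60_000)
for face, freq in frequencies.items():
    print(f"face {face}: {freq:.3f}  (theory: {1/6:.3f})")
```

The simulated frequencies will not be exactly 1/6, but with this many rolls they should land very close to it.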

"That's the beauty of probability distribution - it helps us understand the likelihood of different outcomes and how they're distributed, even if we can't predict the exact outcome of a single event."

Now, imagine you're flipping three coins at the same time. Each coin can land either heads or tails, giving you eight possible outcomes. These outcomes can be thought of as a probability distribution, where each outcome has a certain probability of occurring.



For instance, the probability of all three coins landing heads is 1 out of 8, since there is only one way for this outcome to happen out of eight possible outcomes. 

HHH - All heads

Similarly, the probability of getting two heads and one tail is 3 out of 8, because there are three ways for this outcome to occur out of eight possible outcomes.

2 Heads & One Tail

HHT

HTH

THH

The probability of getting one head and two tails is also 3 out of 8.

1 Head & 2 Tails

TTH

THT

HTT

The probability of getting three tails is 1 out of 8.

TTT - All Tails

You can think of the probability distribution of flipping three coins as a bar graph, with the different outcomes on the x-axis and their corresponding probabilities on the y-axis.

This graph shows that some outcomes are more likely than others: two heads and one tail, and one head and two tails, are tied as the most likely (3 out of 8 each), while all heads and all tails are the least likely (1 out of 8 each).
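The counting above can be checked with a few lines of Python. This is a minimal sketch that lists all eight outcomes for three fair coins and groups them by the number of heads; the variable names are only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely outcomes, e.g. ('H', 'T', 'H')
outcomes = list(product("HT", repeat=3))

# Group the outcomes by how many heads they contain
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability of each heads count = matching outcomes / total outcomes
distribution = {
    k: Fraction(n, len(outcomes)) for k, n in sorted(heads_counts.items())
}

for k, p in distribution.items():
    print(f"{k} heads: {p}")
# 0 heads: 1/8
# 1 heads: 3/8
# 2 heads: 3/8
# 3 heads: 1/8
```

Notice that the probabilities add up to 1, which is true of every probability distribution.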

What is the significance of probability distribution in machine learning?

Well, probability distributions help us make informed decisions and predictions in machine learning.

 By understanding the probabilities associated with different outcomes, we can assess risks, estimate chances, and even figure out the best course of action in some situations. 

 They provide a framework for dealing with uncertainty and give us a way to quantify our knowledge about the world.

Two Types of Probability Distribution

There are two main types of probability distributions: continuous and discrete.

Let's dive into each one to get a better understanding.


What is a Continuous Probability Distribution: Properties and Examples

Imagine you're at a fruit market, eyeing those juicy watermelons. 

Now, when you reach for a watermelon, you know it's going to be a bit of a mystery. Why?
Because watermelons have this fascinating quality: their size can vary a lot!

Think of a continuous probability distribution as reaching for a watermelon.

As you extend your hand, you never quite know what you're going to get. You might end up with a small watermelon that fits perfectly in your fridge, or you might stumble upon a gigantic watermelon that needs a whole shelf to itself. The possibilities are endless!



Each watermelon represents a different value, just like each data point in the distribution.

So when you pick a new watermelon, its size might not match any measurement you have already recorded. The new measurement could be a completely new value.

For example, one watermelon could be 30.7 inches in circumference and another could be 31.2 inches. There is no guarantee that the next watermelon will fit into a predefined list of measurements. It could be any value within a range.

Since the measurement can take any value within a range, not just values from a fixed list, this is called a continuous probability distribution.

Bell Curve or Gaussian Distribution

The normal probability distribution, also known as the Gaussian distribution, is one of the most common continuous probability distributions. When you plot it, you get the familiar bell curve: a symmetrical shape that peaks at the average value and tapers off evenly on both sides.
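To make the bell curve a little more concrete, here is a small sketch using `NormalDist` from Python's standard `statistics` module. The mean of 31 inches and standard deviation of 1.5 inches are made-up numbers for illustration only, not real watermelon data.

```python
from statistics import NormalDist

# Hypothetical numbers for illustration: mean 31 inches, std dev 1.5 inches
watermelons = NormalDist(mu=31.0, sigma=1.5)

# For a continuous variable we ask about ranges of values
p_between = watermelons.cdf(32.5) - watermelons.cdf(29.5)  # within one std dev of the mean
p_large = 1 - watermelons.cdf(34.0)                        # more than two std devs above it

print(f"P(29.5 <= size <= 32.5) = {p_between:.3f}")  # 0.683
print(f"P(size > 34.0)          = {p_large:.3f}")    # 0.023
```

Under these assumed numbers, roughly two thirds of watermelons fall within one standard deviation of the average, which is a general property of the normal distribution.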

So, when you think about a continuous probability distribution, picture yourself standing amidst a bunch of watermelons, where the size possibilities are endless. 

Now that you have a juicy grasp on the continuous probability distribution concept, let’s see what discrete probability distribution means.

What is a Discrete Probability Distribution: Properties and Examples

A discrete probability distribution is one where the possible outcomes are separate values that you can list and count; in the examples here, that list is finite. For example, go back to rolling a six-sided die. We saw that there are 6 possible outcomes from rolling a die, right?

Since there are only 6 separate outcomes, with nothing in between them, it is called a discrete probability distribution.

Similarly, our other example of flipping three coins at the same time is also a discrete probability distribution, because it has only 8 possible outcomes.

Bernoulli Distribution

The Bernoulli probability distribution models a single trial with only two possible outcomes, such as a coin toss or a pass/fail experiment. It helps us understand the likelihood of success or failure in a single experiment or trial.

Each coin flip follows a Bernoulli distribution because it has two possible outcomes, success or failure, and each outcome has a fixed probability associated with it.

For example, if you flip the coin ten times and record the outcomes, you might end up with something like this:

H, T, T, H, T, H, H, T, H, H

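Each flip in this sequence is a Bernoulli trial. (The total number of heads across the ten flips follows a related distribution, the binomial distribution.)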
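Here is a rough sketch of how the ten flips above could be worked through in Python. It treats each flip as one Bernoulli trial and then uses the binomial formula to get the probability of seeing that many heads in ten flips. The fair-coin probability of 0.5 and the helper names `bernoulli_pmf` and `binomial_pmf` are assumptions made for this example.

```python
from math import comb

flips = ["H", "T", "T", "H", "T", "H", "H", "T", "H", "H"]  # the sequence from the text
p_heads = 0.5  # assumption: a fair coin

def bernoulli_pmf(outcome, p):
    """Probability of a single flip: p for heads (success), 1 - p for tails."""
    return p if outcome == "H" else 1 - p

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n independent flips."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

num_heads = flips.count("H")
print(f"P(heads on one flip) = {bernoulli_pmf('H', p_heads)}")
print(f"heads observed: {num_heads} out of {len(flips)}")
print(f"P(exactly {num_heads} heads) = {binomial_pmf(num_heads, len(flips), p_heads):.4f}")
```

With a fair coin, six heads out of ten comes out at about 0.2051, so this sequence is a fairly ordinary result.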



Now that you have got a hold of the idea that “continuous probability distribution” can have a range of values and “discrete probability distribution” can have only discrete values, let’s look at the standard deviation formula.

Standard Deviation formula

In our watermelon analogy, suppose you want to know how much the watermelons vary in size. The standard deviation can help you with that!

Think of the average size of the watermelons as the "typical" or "average" size that you'd expect. 

The standard deviation tells you how much the other watermelons differ in size from that average.

If the standard deviation is small, it means the watermelons are pretty similar in size. They're close to the average, so you won't find many big differences among them. It's like having a pile of watermelons that are all roughly the same size, with just a little bit of variation.

But if the standard deviation is large, it means the watermelons differ a lot in size. You'll see a wider range of sizes among them. It's like having a mix of large, medium, and small watermelons all jumbled together in the pile. 

Some might be significantly bigger or smaller than the average.

So, the standard deviation helps you understand the "spread" or "variability" in the sizes of the watermelons. 

A small standard deviation means the watermelons are more uniform in size, while a large standard deviation indicates a greater range of sizes and more significant differences among them.

By using the standard deviation, you can quickly assess how much the watermelons deviate from the average size.

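To calculate it, you subtract the mean from each data point, square each result, sum up all the squared values, divide by the total number of data points, and finally take the square root of the result. (Dividing by the number of data points gives the population standard deviation; when working with a sample, it is common to divide by one less than that number.)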
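The steps above can be sketched in a few lines of Python. The watermelon circumferences below are hypothetical values chosen to show one tight pile and one mixed pile, and `population_std` is just an illustrative name.

```python
from math import sqrt
from statistics import pstdev

def population_std(values):
    """Standard deviation following the steps in the text (divide by n)."""
    mean = sum(values) / len(values)
    squared_diffs = [(x - mean) ** 2 for x in values]
    return sqrt(sum(squared_diffs) / len(values))

# Hypothetical watermelon circumferences in inches
similar = [30.7, 31.2, 30.9, 31.0, 31.2]
mixed = [24.0, 31.2, 36.5, 28.3, 35.0]

print(f"similar pile: {population_std(similar):.2f}")
print(f"mixed pile:   {population_std(mixed):.2f}")

# The standard library gives the same answer for the divide-by-n version
print(f"pstdev check: {pstdev(similar):.2f}")
```

The mixed pile gives a much larger number than the similar pile, which matches the intuition above. If your data is a sample rather than the whole population, `statistics.stdev` divides by n - 1 instead.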


 


Conclusion

Let's summarize what we've learned about probability distributions and standard deviation:

  • Probability Distributions: There are two main types—continuous and discrete. Continuous ones have endless possibilities, while discrete ones have a limited number of outcomes.
  • Continuous Distribution: Watermelon sizes can take any value within a range, which is what makes a distribution continuous.
  • Discrete Distribution: Rolling a die has a fixed set of separate outcomes, which is what makes a distribution discrete.
  • Bell Curve and Bernoulli Distribution: The bell curve is the shape of the normal (Gaussian) distribution, a common continuous distribution, and the Bernoulli distribution is a discrete distribution that models a single event with two possible outcomes.
  • Standard Deviation Importance: We've also seen how standard deviation helps measure the spread of outcomes from the average using a formula.
  • Why It Matters: Understanding these differences in probability distributions and knowing about standard deviation helps us better understand probabilities in different situations where analysis is needed.

Question For You

Consider the height of students in a classroom. Is this random variable continuous or discrete?

A) Continuous

B) Discrete

Tell me in the comments!
