Here’s your chance to prove what you learned.
Ready? Let’s test your Day 14 knowledge.
Question 1 of 8
What characterizes an unbalanced dataset?
Equal distribution of all classes
One class significantly outnumbering others
Only two classes present in the data
All features having the same scale
Question 2 of 8
In the bank fraud detection example, which of the following represents the minority class?
Legitimate transactions
Fraudulent transactions
Bank accounts
Transaction amounts
Question 3 of 8
What is a primary challenge posed by unbalanced data in machine learning?
Increased computational cost
Difficulty in data collection
Model bias towards the majority class
Inability to use certain algorithms
Question 4 of 8
Why is addressing unbalanced data particularly important in fraud detection scenarios?
It makes the model run faster
It reduces the need for data collection
It ensures equal representation of all transaction types
It helps in accurately identifying rare but critical fraudulent cases
Question 5 of 8
Which of the following is NOT a resampling technique for handling unbalanced data?
Oversampling
Undersampling
SMOTE
Class Weighting
Question 6 of 8
What is the main difference between SMOTE and ROSE techniques?
SMOTE only works with numerical data, while ROSE works with categorical data
SMOTE generates exact copies of minority instances, while ROSE creates new variations
ROSE is only applicable to binary classification problems, while SMOTE works with multi-class problems
SMOTE is an undersampling technique, while ROSE is an oversampling technique
Question 7 of 8
In the context of unbalanced data, what does a hybrid approach refer to?
Combining supervised and unsupervised learning methods
Using both numerical and categorical features in a model
Applying both oversampling and undersampling techniques
Mixing different machine learning algorithms
Question 8 of 8
When would class weighting be particularly useful in handling unbalanced data?
When you want to create new synthetic examples
When you can't modify the original dataset
When dealing with time-series data
When you need to reduce the overall size of the dataset
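As a recap of the class weighting idea from Question 8, here is a minimal Python sketch of inverse-frequency class weights. It assumes the common "balanced" heuristic, weight = n_samples / (n_classes * class_count). The helper name `balanced_class_weights` and the 990/10 split between legitimate and fraudulent transactions are hypothetical and chosen only for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class, inversely proportional to its frequency.

    Assumed heuristic: n_samples / (n_classes * class_count).
    The labels themselves are never resampled or modified.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical toy data: 990 legitimate and 10 fraudulent transactions.
labels = ["legit"] * 990 + ["fraud"] * 10
weights = balanced_class_weights(labels)

for cls, weight in weights.items():
    print(f"{cls}: {weight:.3f}")
```

In this sketch the rare class receives a much larger weight than the common one, so a learner that accepts per-class weights would penalize a missed fraud case more heavily, while the dataset keeps its original size.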