You want to display the distribution of students' scores in a class. Which plot is best suited for this task?
- Bar Plot
- Histogram
- Scatter Plot
- Line Plot
Which of the following provides the best explanation of Machine Learning?
- Machine learning learns from labelled data
- Machine learning is the field of giving robots the ability to act intelligently.
- ML enables computers to learn without explicit programming
- Machine learning is the science of programming computers.
What is the main purpose of regression analysis in machine learning?
- To classify data into distinct categories.
- To predict a continuous outcome or value.
- To perform clustering on the dataset.
- To visualize high-dimensional data.
Which of the following is/are true about classification?
- Classification can be defined as a predictive model mapping inputs to discrete outputs
- Class label probabilities enable classification algorithms to predict continuous values.
- A classification algorithm can have both discrete and real-valued input variables.
- All of the options
A property dealer has a dataset consisting of features like area, price, etc. Now a customer comes to him asking for a property with a certain number of rooms.
Which kind of machine learning technique should the property dealer use from the following?
- Classification
- Clustering
- Regression
You have a dataset of customer feedback comments, and you want to categorize them into different topics, such as product quality, customer service, and delivery. Is this a supervised or unsupervised learning problem?
- Supervised learning
- Unsupervised learning
You are working on a project to predict stock prices based on historical market data, and you have a dataset with features such as past stock prices, trading volume, and other market indicators. Is this a supervised or unsupervised learning problem?
- Supervised learning
- Unsupervised learning
Your supervisor asks you to create a machine learning system that will help your human resources department classify job applicants into well-defined groups. What type of system are you more likely to recommend?
- an unsupervised machine learning system that clusters together the best candidates.
- you would not recommend a machine learning system for this type of project.
- a supervised machine learning system that classifies applicants into existing groups.
What kind of problem is car resale price prediction?
- Regression
- Classification
- Clustering
How should we handle the large number of categories in the make and model columns?
- One Hot Encoding
- Label Encoding
- Target Variable Encoding
If your data contains d features, how many dimensions will be required to fit the hyperplane through that data?
- d
- d + 1
- d - 1
- 2 * d
In linear regression, if the MSE value is 0, it indicates:
- The predicted values perfectly match the actual values.
- The model has no predictive power and fails to explain the dependent variable.
- The model has high bias and underfits the data.
- The model has high variance and overfits the data.
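As a small illustration of what an MSE of 0 means, here is a sketch that computes mean squared error by hand. The function name and the sample values are made up for this example and are not part of the quiz.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


actual = [3.0, 5.0, 7.0]     # hypothetical target values
perfect = [3.0, 5.0, 7.0]    # predictions identical to the targets
imperfect = [2.5, 5.0, 8.0]  # predictions with some error

print(mean_squared_error(actual, perfect))    # every squared difference is zero
print(mean_squared_error(actual, imperfect))  # (0.25 + 0 + 1) / 3
```

Because every term in the sum is a square, the average can only reach 0 when each individual prediction equals its target.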
In a multiple linear regression with five features, the coefficient of determination R2 is found to be 0.85. What does this value indicate about the model's performance?
- The model explains 85% of the variation in the target variable
- The model's predictions are 85% accurate
- The model has an 85% probability of making correct predictions
- The model is 85% confident in its predictions
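To make the meaning of R2 concrete, the sketch below computes it from the residual and total sums of squares. The helper name and the toy numbers are assumptions chosen for illustration only.

```python
def r2_score(actual, predicted):
    """R2 = 1 - SS_res / SS_tot, the share of target variance the model explains."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total variation
    return 1 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical targets
predicted = [1.5, 2.0, 3.0, 4.0, 4.5]  # hypothetical model output
baseline = [3.0] * 5                   # always predicting the mean of the targets

print(r2_score(actual, predicted))  # SS_res = 0.5, SS_tot = 10
print(r2_score(actual, baseline))   # predicting the mean explains none of the variation
```

R2 compares the model against the mean-only baseline, which is why it is read as explained variation rather than as a percentage of correct predictions.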
Feature importance in linear regression is determined by:
- The magnitude of the regression coefficients.
- The number of observations in the dataset.
- The correlation between the independent variables.
- The average squared difference between the predicted and actual values.
When assessing model interpretability in Linear Regression, what is the impact of feature scaling?
- Feature scaling does not affect model interpretability
- Feature scaling improves model interpretability
- Feature scaling can help compare the magnitudes of different coefficients
Consider the following Linear Regression model equation:
y = 5.2x1 - 3.8x2 + 2.1x3 + 0.01x4 - 1.5
If we were to drop one feature, which one would be the best choice?
- x1
- x2
- x3
- x4
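One way to reason about the equation above is to rank the features by the absolute size of their coefficients. The sketch below does that, under the assumption (not stated in the question) that all features were scaled to comparable ranges before fitting. Without that, coefficient magnitudes are not directly comparable.

```python
# Coefficients from the model equation above (the intercept -1.5 is not a feature).
coefficients = {"x1": 5.2, "x2": -3.8, "x3": 2.1, "x4": 0.01}

# Assumption: features are on comparable scales, so |coefficient| is a fair
# proxy for how much each feature moves the prediction.
ranked = sorted(coefficients, key=lambda name: abs(coefficients[name]))
weakest = ranked[0]

print(ranked)   # features ordered from smallest to largest absolute coefficient
print(weakest)  # the feature with the least influence under the scaling assumption
```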
In gradient descent, what does the gradient represent?
- The direction of steepest increase of the cost function
- The direction of steepest decrease of the cost function
- The number of training examples in the dataset
- The number of layers in the neural network
What is the objective of Gradient Descent in linear regression?
- Minimize the absolute error
- Minimize the squared error
- Maximize the R-squared score
- Maximize the accuracy
What will happen if we add the gradient update to the original weight instead of subtracting it?
- The model will diverge instead of converging towards the optimal solution.
- The model will converge, but very slowly
- The model will converge very fast
What happens if the learning rate in gradient descent for linear regression is set too large?
- The algorithm will converge faster to the optimal solution.
- The model will overfit the training data, leading to poor generalization.
- The algorithm may fail to converge, and the coefficients may oscillate or diverge.
- The cost function will be overestimated, resulting in an inflated R2 score.
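The effect of the learning rate can be seen in a minimal sketch of gradient descent on a one-parameter model y = w * x with a mean squared error cost. The data, the function name, and the two learning rates are illustrative assumptions.

```python
def gradient_descent(xs, ys, learning_rate, steps=50, w=0.0):
    """Fit y ~ w * x by gradient descent on mean squared error; return all w values."""
    n = len(xs)
    history = [w]
    for _ in range(steps):
        # d/dw of mean((w*x - y)^2) is 2 * mean(x * (w*x - y))
        grad = 2 * sum(x * (w * x - y) for x, y in zip(xs, ys)) / n
        w = w - learning_rate * grad  # step against the gradient
        history.append(w)
    return history


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by w = 2, so the optimum is known

small = gradient_descent(xs, ys, learning_rate=0.05)
large = gradient_descent(xs, ys, learning_rate=0.3)

print(small[-1])  # settles close to 2
print(large[:4])  # overshoots the optimum on alternating sides
print(large[-1])  # ends far away from 2
```

With this data, each step multiplies the distance from the optimum by a fixed factor. Once the learning rate is large enough for that factor to exceed 1 in absolute value, every step lands farther from the minimum than the last.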
Which variant of Gradient Descent uses the entire dataset to compute the gradient at each iteration?
- Mini-batch Gradient Descent
- Stochastic Gradient Descent
- Batch Gradient Descent
- Regularized Gradient Descent
How does the adjusted R2 score differ from the regular R2 score?
- The adjusted R2 score accounts for the number of predictors in the model.
- The adjusted R2 score is always higher than the regular R2 score.
- The adjusted R2 score considers only the explanatory power of the model.
- The adjusted R2 score is not influenced by the sample size.
In adjusted R-squared, what is the range of possible values?
- 0 to 1
- -infinity to 1
- -1 to 1
- -infinity to infinity
What does a higher Adjusted R-squared value indicate about the regression model?
- The model is a perfect fit to the data.
- The model is overfitting the data.
- The model explains more variance in the dependent variable.
- The model has high bias.
Why might the adjusted R2 score be considered more reliable than R2 when adding more predictors to a model?
- Because it always increases with more predictors.
- Because it penalizes the model for adding predictors that don't improve the model.
- Because it is easier to calculate.
- Because it always equals the R2 score.
A regression model with 3 predictors has an R2 of 0.85. After adding a 4th predictor, the R2 increases to 0.86 but the adjusted R2 decreases. What can be inferred?
- The 4th predictor improved the model significantly.
- The adjusted R2 is incorrectly calculated.
- The 4th predictor did not add meaningful information to the model.
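The scenario above can be reproduced with the usual adjusted R2 formula, 1 - (1 - R2)(n - 1) / (n - p - 1). The question does not give a sample size, so n = 10 below is purely an assumption chosen to make the effect visible.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R2: penalizes R2 for the number of predictors relative to sample size."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


n = 10  # hypothetical sample size, not given in the question

before = adjusted_r2(0.85, n_samples=n, n_predictors=3)
after = adjusted_r2(0.86, n_samples=n, n_predictors=4)

print(before)  # 1 - 0.15 * 9 / 6
print(after)   # 1 - 0.14 * 9 / 5
```

With a larger sample the same 0.01 gain in R2 could be enough to raise the adjusted score, so the outcome depends on n as well as on the predictor count.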
What does it mean if the residuals versus predicted values plot shows a nonlinear pattern?
- The model has perfect prediction accuracy.
- The residuals are normally distributed.
- The assumption of linearity is violated.
- The model suffers from multicollinearity.
How does multicollinearity affect regression analysis?
- It reduces the interpretability of regression coefficients.
- It increases the accuracy of the regression model.
- It improves the goodness-of-fit of the regression model.
- It has no impact on the regression analysis.
A clothing store wants to predict sales based on factors like price, promotions, and store location. Which assumption of linear regression is important for accurate sales predictions?
- Linearity between the independent variables and sales.
- Normal distribution of sales.
- Multicollinearity among features
- All of them
While building a risk prediction model for loan defaulters, it was observed that the errors were right-skewed. Does this in any way imply that the linear regression model is inaccurate?
- Yes, since the features are multi-collinear
- Yes, since the errors aren't normally distributed
- Yes, by violation of assumption of linearity
- No, the model may be accurate.
Which diagnostic plot can be used to detect heteroscedasticity?
- Scatterplot of residuals against predicted values.
- Histogram of residuals.
- Normal probability plot of residuals.
- Box plot of residuals.
In linear regression, a high VIF value suggests:
- Heteroskedasticity is present
- A strong linear relationship between the independent and dependent variables.
- The absence of outliers in the dataset.
- Strong multicollinearity between predictor variables.
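For intuition about VIF, the sketch below handles the simplest case of exactly two predictors. There, the R2 from regressing one predictor on the other equals their squared Pearson correlation, so VIF = 1 / (1 - r^2). The helper names and the data are made up for illustration. With more predictors a full regression per feature would be needed.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(a, b):
    """VIF for the two-predictor case only: 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    return 1 / (1 - r ** 2)


area = [50.0, 60.0, 70.0, 80.0, 90.0]  # hypothetical, strongly related to rooms
rooms = [2.0, 2.0, 3.0, 3.0, 4.0]
noise = [2.0, 1.0, 3.0, 1.0, 2.0]      # hypothetical, unrelated to index
index = [1.0, 2.0, 3.0, 4.0, 5.0]

print(vif_two_predictors(area, rooms))   # well above 1: predictors overlap heavily
print(vif_two_predictors(index, noise))  # about 1: predictors carry separate information
```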
Mini-Batch Gradient Descent code (Post Read)
Post Read link: https://colab.research.google.com/drive/1EXhY2Ax1lPp7cc03bHvwtc-UQ78t6iKI?usp=sharing
Which statement is true about mini-batch gradient descent?
- It guarantees convergence to the global minimum
- It may converge to a local minimum due to the weight fluctuations
- It requires a very high learning rate.
- It is not suitable for large datasets.
What is the degree of a polynomial in polynomial regression?
- The number of features in the dataset.
- The number of training examples in the dataset.
- The highest power of the feature in the polynomial equation.
- The number of coefficients in the polynomial equation.
What metric should be used during Polynomial Regression?
- R-sq
- Adj R-sq
- Doesn't matter
- Use a different metric
Why is Occam's Razor important in machine learning?
- It helps in selecting the model that fits the training data perfectly.
- It encourages the use of complex models.
- It helps in avoiding overfitting by favoring simpler models.
- It promotes the use of large datasets for training models.
Does the model LR = 5f1 + 0f1^2 + 0f1^3 + 0f1^4 underfit or overfit?
- Underfit
- Overfit
If a model has training performance = 90% and test performance = 91%, then:
- Model is overfitting
- Model is underfitting
- Model is perfectly fitting
- Can't say
What is the role of
- It increases overfitting
- It decreases underfitting
- It helps find the optimal trade-off
- It has no impact on overfitting
Which regularization technique uses two regularization constants (lambdas)?
- L1 regularization
- L2 regularization
- Elastic Net regularization
- All of the above
What would a lambda/regularization rate value of 0 signify?
- Complex and overfitting
- Complex and underfitting
- Simple and overfitting
What would be the optimal value of lambda? Note: adj. R2 score is on test data
- Model M1, lambda=1, adj. R2 score=0.4
- Model M2, lambda=10, adj. R2 score=0.8
- Model M3, lambda=100, adj. R2 score=0.2
A company is building a predictive model for predicting customer churn. Which technique can help optimize the model's performance for selecting the best hyperparameters and evaluating its generalization ability?
- Sampling
- Feature selection.
- Ensemble learning.
- Hyperparameter tuning using Cross-validation.
How do we compute the overall performance metric of a model in k-fold cross-validation?
- Taking the mean of metric obtained from each fold.
- Selecting the max value of metric obtained from each fold.
- Summing the performance metrics of each fold.
- Calculating the median of k different performance metrics.
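The sketch below shows one common way a k-fold score is assembled: split the data into k folds, score each held-out fold, then average the fold scores. The "model" here is just a mean predictor, and all names and data are illustrative assumptions.

```python
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size


def cross_val_mse(ys, k):
    """Score a mean-predictor baseline with k-fold CV; return fold MSEs and their mean."""
    fold_scores = []
    for train_idx, val_idx in k_fold_indices(len(ys), k):
        prediction = mean(ys[i] for i in train_idx)  # "training" is averaging the targets
        fold_scores.append(mean((ys[i] - prediction) ** 2 for i in val_idx))
    return fold_scores, mean(fold_scores)


ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical targets
fold_scores, overall = cross_val_mse(ys, k=3)

print(fold_scores)  # one MSE per held-out fold
print(overall)      # the single number reported for the model
```

Contiguous folds are used only to keep the example short. In practice the data is usually shuffled before splitting.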
What happens when the input to the sigmoid function is a very large negative value?
- The output becomes negative
- The output approaches 0
- The output approaches 1
- The output becomes undefined.
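The behaviour of the sigmoid at extreme inputs can be checked with a short sketch. The two-branch form below is one common way to avoid overflow for large negative inputs. The function name is our own.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)  # safe: z < 0, so exp(z) cannot overflow
    return exp_z / (1.0 + exp_z)


print(sigmoid(0))      # midpoint of the curve
print(sigmoid(-20))    # tiny but still positive
print(sigmoid(-1000))  # underflows toward the lower limit, never below it
print(sigmoid(20))     # close to the upper limit
```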
Which point will have a higher probability of belonging to class 1?
- x1
- x2
Suppose y = 0 and ŷ = 0.01. What will the log-loss be?
- log-loss will be a very high value
- log-loss will be a very low value
- log-loss will be 0
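The numbers in the question above can be plugged into the binary log-loss formula, -(y log(ŷ) + (1 - y) log(1 - ŷ)). The sketch below evaluates it for a single example. The second call, with y = 1, is an added contrast and not part of the question.

```python
import math


def log_loss(y, y_hat):
    """Binary cross-entropy for one example with true label y and predicted probability y_hat."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))


print(log_loss(0, 0.01))  # prediction agrees with the label: -log(0.99)
print(log_loss(1, 0.01))  # same prediction, opposite label: -log(0.01)
```

The loss depends on how much probability the model put on the true class, so the same ŷ is cheap for one label and expensive for the other.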
In logistic regression, the output of the sigmoid function is interpreted as:
- Class probabilities
- Raw scores
- Error rates
- Regression coefficients
What is the main risk of overfitting when tuning hyperparameters in logistic regression?
- The model may generalize well to unseen data but poorly on the training data
- The model may perform well on the training data but poorly on unseen data
- The model may underperform compared to a model with default hyperparameter values
- The model may be too simple and fail to capture complex relationships in the data
Which statement about the step function is true?
- It is continuous and differentiable
- It is continuous but not differentiable
- It is neither continuous nor differentiable
- It is differentiable but not continuous
What is the effect of increasing the regularization rate (C) in logistic regression?
- The model becomes less prone to overfitting
- The model's training accuracy increases
- The model becomes more prone to overfitting
- The model's test accuracy increases
The logistic regression model predicts:
- Probabilities
- Class labels
- Continuous values
- Ordinal values
How are log odds transformed into probabilities in logistic regression?
- By applying the sigmoid function
- By taking the exponential function
- By dividing by the odds ratio
- By subtracting the intercept term
How do outliers affect the classification boundaries in logistic regression?
- Outliers shift the classification boundaries closer to the outlier values
- Outliers have no effect on the classification boundaries
- Outliers widen the gap between the classification boundaries
- Outliers make the classification boundaries more sensitive to minor changes
What is the purpose of the one-vs-rest (OvR) strategy in multi-class logistic regression?
- To improve the interpretability of the model coefficients
- To handle imbalanced datasets in multi-class problems
- To reduce the complexity of the model
- To transform a multi-class problem into multiple binary classification problems
How is the loss function typically defined in multi-class logistic regression?
- Cross-entropy loss
- Mean squared error (MSE)
- Mean absolute error (MAE)
- Hinge loss
If data1 has 20 cancer patients and 100 non-cancer patients, and data2 has 80 cancer patients and 100 non-cancer patients, then:
- Data1 = imbalanced, Data2 = balanced
- Data1 = balanced, Data2 = imbalanced
- Data1 = balanced, Data2 = balanced
- Data1 = imbalanced, Data2 = imbalanced
In the evaluation of a classification model, what does the accuracy metric represent?
- The model's ability to handle imbalanced datasets
- The precision of the model in predicting positive instances
- The ratio of true positive predictions to the total predictions
- The overall correctness of the model's predictions across all classes
If a model classifies students into classes A, B, and C, what are the dimensions of the confusion matrix?
- 12 x 12
- 2 x 2
- 4 x 4
- 3 x 3
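A confusion matrix has one row per actual class and one column per predicted class. The sketch below builds one for three classes. The function name and the labels assigned to each student are invented for illustration.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes, in the order of `labels`."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


labels = ["A", "B", "C"]
actual = ["A", "B", "C", "A", "B", "C"]     # hypothetical true classes
predicted = ["A", "B", "B", "A", "C", "C"]  # hypothetical model output

matrix = confusion_matrix(actual, predicted, labels)
correct = sum(matrix[i][i] for i in range(len(labels)))  # diagonal entries
errors = len(actual) - correct                           # everything off the diagonal

for row in matrix:
    print(row)
print(errors)
```

Correct predictions sit on the diagonal, so the number of misclassified points is the sum of all off-diagonal cells.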
For an ideal model, which of the following is true?
- FP and FN ⇓, while TP and TN ⇑
- TP and TN ⇓, while FP and FN ⇑
- TP and FN ⇓, while FP and TN ⇑
- FP and TN ⇓, while TP and FN ⇑
Based on the Confusion matrix we saw, what is the total number of erroneous points?
- 31
- 88
- 106
- 1187
For movie recommendation, what would you prioritize more in this case?
- High recall
- High precision
For spam email filtering, what would you prioritize more in this case?
- recall
- precision
Why does the F-1 score use Harmonic Mean (HM) instead of Arithmetic Mean (AM)?
- AM penalizes models the most when either Precision or Recall is low.
- HM penalizes models the most when either Precision or Recall is low.
- HM penalizes models the most when either Precision or Recall is high.
- AM penalizes models the most when either Precision or Recall is high.
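The difference between the two means is easiest to see with numbers. The sketch below compares them on a lopsided precision/recall pair and on a balanced one. The function names and the values are illustrative assumptions.

```python
def arithmetic_mean(precision, recall):
    return (precision + recall) / 2


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (defined as 0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


lopsided = (1.0, 0.01)  # hypothetical: perfect precision, almost no recall
balanced = (0.8, 0.8)   # hypothetical: both metrics equal

print(arithmetic_mean(*lopsided), f1_score(*lopsided))  # the two means disagree sharply
print(arithmetic_mean(*balanced), f1_score(*balanced))  # the two means coincide
```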