# Machine Learning I

Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy.

**Basic Concepts:**

## 1- Variables

Variables are divided into dependent and independent variables. The dependent variable is the variable we are interested in predicting in the data set. It is most often called the target; depending on the literature, it is also called the output (in artificial neural networks), the response, or simply the dependent variable (in statistical studies). The independent variables are often called features; in much of the literature they are also called inputs, columns, predictors, or explanatory variables. They are the variables used to explain or predict the dependent variable in the data set.
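
As a small illustration of this split, the sketch below separates a toy table into features and a target. The column names and values are hypothetical and chosen only for the example.

```python
# Hypothetical toy dataset: each row is one observation.
rows = [
    {"age": 22, "fare": 7.25, "survived": 0},
    {"age": 38, "fare": 71.28, "survived": 1},
    {"age": 26, "fare": 7.92, "survived": 1},
]

target_name = "survived"  # the dependent variable (target)

# Independent variables (features): every column except the target.
feature_names = [name for name in rows[0] if name != target_name]

X = [[row[name] for name in feature_names] for row in rows]  # feature matrix
y = [row[target_name] for row in rows]                       # target vector

print(feature_names)  # ['age', 'fare']
print(y)              # [0, 1, 1]
```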

## 2- Learning Types

Learning is broadly divided into three types: supervised, unsupervised, and reinforcement learning.

**Supervised Learning** describes a class of problems that involves using a model to learn a mapping between input examples and the target variable. As input data is fed into the model, it adjusts its weights until the model has been fitted appropriately. This occurs as part of the cross-validation process to ensure that the model avoids overfitting or underfitting. Supervised learning helps organizations solve a variety of real-world problems at scale, such as filtering spam into a separate folder from your inbox. Some methods used in supervised learning include neural networks, naïve Bayes, linear regression, logistic regression, random forest, and support vector machines (SVM).
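
A minimal sketch of what "adjusting weights" can look like, assuming a one-feature linear model trained by gradient descent on mean squared error. The function name `fit_linear`, the learning rate, the number of epochs, and the toy data are illustrative assumptions.

```python
def fit_linear(xs, ys, lr=0.01, epochs=5000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move the weights a small step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Labeled examples generated from y = 2x + 1 (inputs paired with targets).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

w, b = fit_linear(xs, ys)
print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```

The model only ever sees input-target pairs, which is what makes the setting supervised.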

**Unsupervised Learning** describes a class of problems that involves using a model to describe or extract relationships in data. These algorithms discover hidden patterns or data groupings without the need for human intervention. Its ability to discover similarities and differences in information makes it well suited to exploratory data analysis, cross-selling strategies, customer segmentation, and image and pattern recognition. It’s also used to reduce the number of features in a model through the process of dimensionality reduction; principal component analysis (PCA) and singular value decomposition (SVD) are two common approaches for this. Other algorithms used in unsupervised learning include neural networks, k-means clustering, and probabilistic clustering methods.
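
To make the idea of discovering groupings without labels concrete, here is a rough sketch of k-means on one-dimensional points. The helper name `kmeans_1d`, the starting centroids, and the data are assumptions made for illustration.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D points, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# No labels are given; the algorithm has to find the two groups itself.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])

print(centroids)  # [1.5, 9.5]
print(clusters)   # [[1.0, 1.5, 2.0], [9.0, 9.5, 10.0]]
```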

**Reinforcement Learning** describes a class of problems where an agent operates in an environment and must learn to operate using feedback. The use of an environment means that there is no fixed training dataset, but rather a goal or set of goals that the agent is required to achieve, actions it may perform, and feedback about its performance toward the goal. It is similar to supervised learning in that the model has some response from which to learn, although the feedback may be delayed and statistically noisy, making it challenging for the agent or model to connect cause and effect. An example of a reinforcement problem is playing a game where the agent has the goal of getting a high score, can make moves in the game, and receives feedback in terms of punishments or rewards.
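
The loop of acting, receiving feedback, and updating can be sketched with a multi-armed bandit, a deliberately simplified stand-in for the game example (there is no state, and the reward arrives immediately). The epsilon-greedy strategy, the function name `run_bandit`, and all numeric values are assumptions made for illustration.

```python
import random

def run_bandit(true_means, steps=500, epsilon=0.2, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit with noisy rewards."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # agent's value estimate per action
    counts = [0] * len(true_means)       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(true_means))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(true_means)), key=lambda a: estimates[a])
        # The environment answers with a noisy reward (the feedback).
        reward = rng.gauss(true_means[action], 0.5)
        counts[action] += 1
        # Incremental update of the running mean reward for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit(true_means=[1.0, 5.0, 2.0])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, counts)
```

There is no fixed training set here: the agent generates its own experience by choosing actions, and the only learning signal is the reward.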

## 3- Problems

**Regression** applies when the dependent variable takes numerical values. It is used to model the relationship between two or more quantitative variables.

**Classification** applies when the dependent variable takes categorical values, for example survived or did not survive, or male or female. If the goal is to predict which class an observation belongs to, it is a classification problem.

Classification predictive modeling problems are different from regression predictive modeling problems.

- Classification is the task of predicting a discrete class label.
- Regression is the task of predicting a continuous quantity.

There is some overlap between the algorithms for classification and regression; for example:

- A classification algorithm may predict a continuous value, but the continuous value is in the form of a probability for a class label.
- A regression algorithm may predict a discrete value, but the discrete value is in the form of an integer quantity.
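
The first point can be sketched as follows: a classifier produces a continuous probability, and a threshold turns it into a discrete label. The logistic form, the weights `w` and `b`, and the 0.5 threshold are hypothetical choices for illustration.

```python
import math

def predict_proba(x, w=1.5, b=-3.0):
    """Hypothetical logistic model: returns P(class = 1 | x)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def predict_label(x, threshold=0.5):
    """Turn the continuous probability into a discrete class label."""
    return 1 if predict_proba(x) >= threshold else 0

for x in (1.0, 2.0, 4.0):
    print(x, round(predict_proba(x), 3), predict_label(x))
```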

## 4- Model Training

We create a model to learn the structure of the dataset.

## 5- Model Evaluation

Is our prediction successful? Performance metrics vary according to the type of problem.

Common success metrics for **regression models** are mean squared error (**MSE**), root mean squared error (**RMSE**), and mean absolute error (**MAE**).

Common success metrics for **classification methods** are the confusion matrix, the ROC curve, and log loss.
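
The three regression metrics can be written out directly. This is a plain-Python sketch; the sample values are made up for the example.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of MSE, in the target's units."""
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 9.0, 9.0]

print(mse(y_true, y_pred))             # 1.25
print(round(rmse(y_true, y_pred), 3))  # 1.118
print(mae(y_true, y_pred))             # 0.75
```

Because MSE squares the residuals, the single error of 2 contributes four times as much as the error of 1, whereas MAE weights it only twice as much.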

**Confusion matrix** tabulates actual values against predicted values, and several metrics are calculated from its four cells.

- True Positive (TP): the actual class is positive and the prediction is positive.
- True Negative (TN): the actual class is negative and the prediction is negative.
- False Positive (FP): the actual class is negative but the prediction is positive. (To say that the healthy person is sick)
- False Negative (FN): the actual class is positive but the prediction is negative. (To say that the sick person is not sick)
- Accuracy is the ratio of correct predictions to the total number of predictions: (TP + TN) / (TP + TN + FP + FN).
- Precision is the share of positive predictions that are correct: TP / (TP + FP).
- Recall is the share of actual positives that were predicted correctly: TP / (TP + FN).
- F1 Score is the harmonic mean of precision and recall: 2 × Precision × Recall / (Precision + Recall). It is often more informative than accuracy on imbalanced data.
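
The quantities in the list above can be computed from a pair of label lists. This is a sketch for binary labels, where 1 is the positive class; the example labels are invented.

```python
def confusion_counts(y_true, y_pred):
    """Count the four cells of a binary confusion matrix (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(tp, tn, fp, fn)                   # 3 5 1 1
print(accuracy, precision, recall, f1)  # 0.8 0.75 0.75 0.75
```

The divisions assume at least one positive prediction and at least one actual positive; real code would need to guard against zero denominators.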

**ROC Curve**: we use the ROC curve, together with the area under it (AUC), to evaluate the performance of classification models. It plots the True Positive Rate against the False Positive Rate at different classification thresholds. The closer the curve gets to the upper left corner, the better the model separates the two classes.

**AUC** (Area Under the Curve) is the area under the ROC curve. The closer to 1, the higher the success of the model.
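
A rough sketch of how the ROC points and the AUC can be computed by sweeping the threshold over the predicted scores. The function names and the example scores are assumptions; the area is approximated with the trapezoidal rule.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold one distinct score at a time, highest first.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
print(round(auc(points), 3))  # 0.889
```

In this example 8 of the 9 positive-negative pairs are ranked correctly, which matches the AUC of 8/9.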

## 6- Model Validation Methods

Methods for evaluating a model’s performance fall into two categories: holdout and cross-validation. Both evaluate the model on a test set, i.e. data the model has not seen during training. Evaluating a model on the same data used to build it is not recommended, because a sufficiently flexible model can simply memorize the training set and then predict the correct label for nearly every training point while performing poorly on new data. This is known as overfitting.

**Holdout**

The purpose of holdout evaluation is to test a model on different data than it was trained on. This provides an unbiased estimate of learning performance.

In this method, the dataset is *randomly* divided into three subsets:

- **Training set** is a subset of the dataset used to build predictive models.
- **Validation set** is a subset of the dataset used to assess the performance of the model built in the training phase. It provides a test platform for fine-tuning a model’s parameters and selecting the best-performing model. Not all modeling algorithms need a validation set.
- **Test set**, or unseen data, is a subset of the dataset used to assess the likely future performance of a model. If a model fits the training set much better than it fits the test set, overfitting is probably the cause.

The holdout approach is useful because of its speed, simplicity, and flexibility. However, this technique is often associated with high variability since differences in the training and test dataset can result in meaningful differences in the estimate of accuracy.
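
A random three-way holdout split can be sketched as below. The 60/20/20 proportions, the seed, and the function name `holdout_split` are assumptions for illustration; the text does not prescribe specific fractions.

```python
import random

def holdout_split(data, val_fraction=0.2, test_fraction=0.2, seed=42):
    """Shuffle the data, then cut it into training, validation, and test sets."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test_set = shuffled[:n_test]
    val_set = shuffled[n_test:n_test + n_val]
    train_set = shuffled[n_test + n_val:]
    return train_set, val_set, test_set

data = list(range(100))
train_set, val_set, test_set = holdout_split(data)
print(len(train_set), len(val_set), len(test_set))  # 60 20 20
```

Changing `seed` produces a different split, and with it a somewhat different accuracy estimate, which is the variability mentioned above.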

**Cross-Validation**

Cross-validation is a technique that involves partitioning the original observation dataset into a training set used to train the model and an independent set used to evaluate the analysis.

The most common cross-validation technique is k-fold cross-validation, where the original dataset is partitioned into k equal-size subsamples, called folds. The k is a user-specified number, usually 5 or 10. Training and evaluation are repeated k times, such that each time, one of the k subsets is used as the test set/validation set, and the other k-1 subsets are put together to form a training set. The error estimate is averaged over all k trials to get the overall effectiveness of our model.

For instance, when performing five-fold cross-validation, the data is first partitioned into 5 parts of (approximately) equal size. A sequence of models is trained. The first model is evaluated on the first fold as the test set and trained on the remaining folds as the training set. This is repeated for each of these 5 splits of the data, and the estimate of accuracy is averaged over all 5 trials to get the overall effectiveness of our model.

As can be seen, every data point gets to be in a test set exactly once and gets to be in a training set k-1 times. This significantly reduces bias, as we’re using most of the data for fitting, and it also significantly reduces variance, as most of the data is also being used in the test set. Interchanging the training and test sets also adds to the effectiveness of this method.
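
The fold rotation described above can be sketched as follows. The generator name `k_fold_indices`, the toy targets, and the placeholder "model" (predicting the mean of the training targets) are assumptions used only to show the mechanics; in practice the data is usually shuffled before it is split into folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

fold_errors = []
for train_idx, test_idx in k_fold_indices(len(ys), k=5):
    # Placeholder "model": predict the mean of the training targets.
    prediction = sum(ys[i] for i in train_idx) / len(train_idx)
    fold_mse = sum((ys[i] - prediction) ** 2 for i in test_idx) / len(test_idx)
    fold_errors.append(fold_mse)

# The cross-validated estimate is the average over the k folds.
cv_error = sum(fold_errors) / len(fold_errors)
print(len(fold_errors), cv_error)
```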

## 7- Bias-Variance Tradeoff

Underfitting corresponds to high bias: the model fails to learn the structure of the data set. A well-fitted model has low bias and low variance: it captures the pattern in the data and can represent the data we have. Overfitting corresponds to high variance: the model learns the training data one-to-one, effectively copying it.
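
One way to see the two extremes is with a k-nearest-neighbors regressor, used here only as a convenient stand-in. With k = 1 the model copies the training set (zero training error, the overfitting extreme), and with k equal to the training-set size it predicts the overall mean everywhere (the underfitting extreme). The data points and function names are hypothetical.

```python
def knn_predict(train_points, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train_points, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mean_squared_error(train_points, data, k):
    """MSE of the k-NN model (fitted on train_points) evaluated on data."""
    return sum((y - knn_predict(train_points, x, k)) ** 2 for x, y in data) / len(data)

# Hypothetical noisy samples of the trend y = x (noise added by hand).
train_points = [(0, 0.4), (1, 0.7), (2, 2.5), (3, 2.6), (4, 4.4), (5, 4.8)]
# Held-out points that lie exactly on the trend.
test_points = [(0.5, 0.5), (1.5, 1.5), (2.5, 2.5), (3.5, 3.5), (4.5, 4.5)]

for k in (1, 3, len(train_points)):
    train_error = mean_squared_error(train_points, train_points, k)
    test_error = mean_squared_error(train_points, test_points, k)
    print(k, round(train_error, 3), round(test_error, 3))
```

The gap between training and test error is what to watch: a training error of zero for k = 1 says nothing about how well the model does on the held-out points.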
