6 posts tagged with "ml"

· 2 min read
Aneesh Sambu

So? What's the story of this Neural thing?

  • These are artificial mathematical models inspired by our biological neural networks

  • so basically a network has three kinds of layers

    1. Input Layer
    2. Output Layer
    3. Hidden Layers
  • the cells in these layers are called neurons

  • the hidden layers do most of the computation

  • all these layers are connected by what we call channels

  • each channel has its own numerical weight

  • the inputs are multiplied by the corresponding channel weights, and each neuron in the hidden layer also holds a value known as a bias

  • this bias is added to the weighted sum, and the result is passed through a threshold function known as the activation function

  • the output of the activation function determines whether the neuron is activated or not

  • the activated neurons then feed into the further channels, and values are propagated until a final prediction is made

  • this flow is known as forward propagation

  • at first we may not get the correct prediction, so the error magnitude is calculated, and based on it a flow in the reverse direction happens, which is known as backward propagation

  • as this backward propagation takes place, the weights adjust themselves in such a way that the network can predict correctly for the given data

  • in this manner the network is trained on a huge amount of labeled data so that it learns proper weights and can predict properly (a minimal code sketch follows this list)

  • it is a very time-consuming and computationally heavy process
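To make the forward and backward propagation steps concrete, here is a minimal sketch in Python/NumPy: a one-hidden-layer network trained on a toy XOR dataset. The layer sizes, sigmoid activation, learning rate, and step count are all illustrative assumptions, not a prescription.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 samples, 2 input features, binary target (XOR pattern).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Channel weights and neuron biases: input -> hidden and hidden -> output.
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))

lr = 0.5
for step in range(5000):
    # Forward propagation: multiply by weights, add bias, apply activation.
    h = sigmoid(X @ W1 + b1)        # hidden-layer activations
    out = sigmoid(h @ W2 + b2)      # final prediction

    # Error magnitude between prediction and target drives the backward pass.
    err = out - y

    # Backward propagation: push the error back and adjust the weights.
    d_out = err * out * (1 - out)          # gradient at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)     # gradient at the hidden layer
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(2))  # predictions move toward [0, 1, 1, 0] as weights adjust
```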


What are some real-world examples?

  • Facial Recognition
  • Forecasting (weather forecasting, etc.)
  • Music Composition

Note

  • Anything in real life that follows some kind of pattern can have a neural network applied to it and trained on it
  • Neural networks come under deep learning, which is a subset of machine learning

· 3 min read
Aneesh Sambu

So what's this Regression?

In machine learning, a regression task is a type of supervised learning problem where the goal is to predict a continuous numerical value, such as a price, a temperature, or a stock price. The objective of a regression model is to learn a function that maps input features to a continuous output value.

Regression tasks are different from classification tasks, where the goal is to predict a categorical label, such as whether an email is spam or not. In a regression task, the output variable is a continuous value, whereas in a classification task, the output variable is a discrete value.

Regression models can be used for a wide range of applications, such as predicting housing prices based on features like location, square footage, and number of bedrooms, or predicting the temperature based on weather conditions like humidity, wind speed, and cloud cover.

There are many different types of regression models, including linear regression, polynomial regression, and decision tree regression. The choice of model depends on the specific problem and the characteristics of the data. The performance of a regression model is typically evaluated using metrics like mean squared error (MSE) or root mean squared error (RMSE), which measure the difference between the predicted values and the actual values.
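As a quick illustration, here is a minimal scikit-learn sketch that fits a linear regression on a handful of made-up house records and evaluates it with MSE and RMSE; the feature values and prices below are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical features: [square footage, bedrooms]; target: price.
X = np.array([[1400, 3], [1600, 3], [1700, 4], [1875, 4], [2350, 5]])
y = np.array([245000, 312000, 279000, 308000, 419000])

model = LinearRegression().fit(X, y)   # learn the mapping features -> price
pred = model.predict(X)

mse = mean_squared_error(y, pred)      # mean squared error
rmse = np.sqrt(mse)                    # root mean squared error
print(f"MSE={mse:.0f}  RMSE={rmse:.0f}")
```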


What is the main difference between classification and regression?

The main difference between regression and classification tasks in machine learning is the type of output that the model is trying to predict.

In a regression task, as noted above, the goal is to predict a continuous numerical value, such as a price, a temperature, or a stock price, by learning a function that maps input features to that continuous output.

In a classification task, on the other hand, the goal is to predict a categorical label, such as whether an email is spam or not, or whether a patient has a certain disease or not. The output variable is a discrete value, and the model is trained to classify input data into one of several predefined categories.

Another key difference between regression and classification tasks is the type of algorithms that are used. Regression models typically use algorithms like linear regression, polynomial regression, or decision tree regression, while classification models use algorithms like logistic regression, decision trees, or support vector machines.

The evaluation metrics used for regression and classification tasks are also different. For regression tasks, metrics like mean squared error (MSE) or root mean squared error (RMSE) are commonly used to measure the difference between the predicted values and the actual values. For classification tasks, metrics like accuracy, precision, recall, and F1 score are used to measure the performance of the model in correctly classifying the input data.
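For the classification side, here is a minimal sketch of those metrics with scikit-learn, using invented spam labels (1 = spam, 0 = not spam):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
```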

· 5 min read
Aneesh Sambu

Plain Definition

KNN (K-Nearest Neighbors) is a machine learning algorithm used for classification and regression tasks. It is a non-parametric algorithm, which means that it does not make any assumptions about the underlying distribution of the data.

In the KNN algorithm, the input data is represented as points in a high-dimensional space, and the algorithm classifies new data points based on their proximity to the existing data points. Specifically, the algorithm calculates the distance between the new data point and each of the existing data points, and then assigns the new data point to the class that is most common among its K nearest neighbors.

The value of K is a hyperparameter that can be tuned to optimize the performance of the algorithm. A larger value of K will result in a smoother decision boundary, but may also lead to misclassification of data points that are close to the boundary between two classes.

KNN is a simple and effective algorithm that can be used for a wide range of classification and regression tasks. However, it can be computationally expensive for large datasets, and may not perform well in high-dimensional spaces.
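Here is a minimal sketch of KNN classification with scikit-learn; the two small clusters below are made up, and K corresponds to the n_neighbors hyperparameter discussed above.

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1, 2], [2, 1],    # class 0 cluster
     [6, 6], [6, 7], [7, 6]]    # class 1 cluster
y = [0, 0, 0, 1, 1, 1]

# K = 3: each query point gets the majority class of its 3 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[2, 2], [6, 5]]))  # -> [0 1]
```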


Real-world example

Imagine you are a penguin living in Antarctica, and you want to find a new place to build your igloo. You have heard that some areas are better than others, but you're not sure which ones. So, you decide to ask your penguin friends for advice.

You ask your friends to rate different areas on a scale of 1 to 10, based on how good they are for building an igloo. You also ask them to tell you the distance of each area from your current location.

Now, you have a dataset of ratings and distances for different areas. You want to use this data to find the best place to build your igloo.

This is where KNN comes in. It can help you find the best place to build your igloo based on the ratings and distances provided by your friends.

Here's how it works:

  1. You choose a value for K. This is the number of neighbors you want to consider when making a decision. Let's say you choose K=3.
  2. For each candidate spot, you calculate its distance to every area your friends rated.
  3. You find the 3 rated areas closest to that spot.
  4. You look at the ratings for these 3 areas and take their average. This average is the predicted rating for the candidate spot.
  5. You repeat this for every candidate spot and choose the one with the highest predicted rating as the best place to build your igloo.

So, in this example, KNN helped you find the best place to build your igloo based on the ratings and distances provided by your friends.
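Here is a from-scratch sketch of the five steps in plain Python; every area name, location, and rating below is invented for the penguin story.

```python
import math

rated_areas = [            # (name, (x, y) location, friends' rating)
    ("icefield",  (1.0, 2.0), 8),
    ("shoreline", (2.0, 1.0), 6),
    ("ridge",     (5.0, 5.0), 9),
    ("glacier",   (1.5, 1.5), 7),
    ("crevasse",  (6.0, 1.0), 3),
]
K = 3                      # step 1: choose K

def predicted_rating(spot):
    # Step 2: distance from the candidate spot to every rated area.
    by_distance = sorted(rated_areas, key=lambda a: math.dist(spot, a[1]))
    # Step 3: keep the K nearest areas; step 4: average their ratings.
    return sum(rating for _, _, rating in by_distance[:K]) / K

# Step 5: predict a rating for each candidate spot and pick the best one.
candidates = [(1.2, 1.8), (5.5, 4.0)]
best = max(candidates, key=predicted_rating)
print(best, predicted_rating(best))
```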

Of course, in real life, KNN can be used for many other things besides finding the best place to build an igloo. For example, it can be used to predict the price of a house based on its features, or to classify images based on their content.

But hopefully this fun example helps you understand the basic idea behind KNN!


Now what is this weighted KNN?

Weighted KNN is a variation of the KNN algorithm where the contribution of each of the K nearest neighbors is weighted according to their distance from the query point. In other words, the closer a neighbor is to the query point, the more weight it is given in the final prediction.

In the standard KNN algorithm, all K neighbors are given equal weight in the final prediction. However, this may not always be the best approach, as some neighbors may be more relevant than others depending on their distance from the query point.

For example, let's say you are trying to predict the price of a house based on its features, such as the number of bedrooms, bathrooms, and square footage. In a standard KNN algorithm, each of the K nearest neighbors contributes equally to the prediction, whether it sits right next to the query point in feature space or near the edge of the neighborhood, so a relatively distant neighbor pulls the prediction as strongly as a very close one.

In a weighted KNN algorithm, the contribution of each neighbor is weighted based on its distance from the query point. This means that neighbors that are closer to the query point are given more weight in the final prediction, while neighbors that are farther away are given less weight.

Using the same example of predicting house prices, this means that the K nearest neighbors are still found by their distance from the query point, but each neighbor's price is then scaled by that distance when the prediction is computed. The closer a neighbor is to the query point, the more weight it is given in the final prediction, as it is considered more relevant.

Overall, weighted KNN can be a useful variation of the KNN algorithm when the distance between neighbors is an important factor in the prediction.
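A minimal sketch of one common scheme, inverse-distance weighting, on made-up one-feature house data (square footage mapped to price); the small epsilon guards against division by zero when a neighbor sits exactly at the query point.

```python
houses = [(1400, 245000), (1600, 312000), (1700, 279000),
          (1875, 308000), (2350, 419000)]   # (sqft, price), invented
K = 3

def weighted_knn_price(sqft):
    # Find the K nearest houses by feature distance.
    nearest = sorted(houses, key=lambda h: abs(h[0] - sqft))[:K]
    # Closer neighbors get larger weights: w = 1 / (distance + eps).
    weights = [1.0 / (abs(s - sqft) + 1e-9) for s, _ in nearest]
    total = sum(weights)
    return sum(w * p for w, (_, p) in zip(weights, nearest)) / total

print(round(weighted_knn_price(1650)))
```

scikit-learn exposes the same idea through KNeighborsRegressor(weights="distance").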

· 2 min read
Aneesh Sambu

In machine learning, a parametric algorithm is a type of algorithm that makes assumptions about the underlying distribution of the data. These assumptions are typically based on a specific mathematical model, such as a linear regression model or a Gaussian distribution.

Parametric algorithms estimate the parameters of the model based on the training data, and then use these parameters to make predictions or decisions on new data. Because they make assumptions about the underlying distribution, parametric algorithms can be more efficient and require less data than non-parametric algorithms.

Examples of parametric algorithms include Linear Regression, Logistic Regression, Naive Bayes, and Linear Discriminant Analysis (LDA). These algorithms are often used in classification and regression tasks, and can be effective in a wide range of applications. However, they may not be as flexible as non-parametric algorithms and may not perform well if the underlying assumptions are not met.

In parametric algorithms, we are assuming that the data follows a specific mathematical model or distribution. For example, in linear regression, we assume that the relationship between the input variables and the output variable is linear. In logistic regression, we assume that the output variable follows a logistic distribution. In Naive Bayes, we assume that the input variables are conditionally independent given the output variable. In Linear Discriminant Analysis (LDA), we assume that the input variables follow a multivariate normal distribution. These assumptions allow us to estimate the parameters of the model based on the training data, and then use these parameters to make predictions or decisions on new data. However, if the underlying assumptions are not met, the performance of the algorithm may be affected.
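To show what "estimating the parameters of the model" looks like, here is a minimal NumPy sketch that assumes a linear model y = w*x + b and recovers its two parameters from synthetic data with a least-squares fit; the true values w = 3 and b = 2 are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=50)   # true w = 3, b = 2

# Estimate the parameters from the training data (least-squares fit).
A = np.column_stack([x, np.ones_like(x)])            # design matrix [x, 1]
w, b = np.linalg.lstsq(A, y, rcond=None)[0]
print(f"w = {w:.2f}, b = {b:.2f}")                   # close to 3 and 2

# Prediction on new data needs only the two learned parameters.
print(w * 4.0 + b)
```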

· One min read
Aneesh Sambu

In machine learning, a non-parametric algorithm is a type of algorithm that does not make any assumptions about the underlying distribution of the data. This is in contrast to parametric algorithms, which assume that the data follows a specific distribution, such as a normal distribution.

Non-parametric algorithms are often used when the underlying distribution of the data is unknown or cannot be easily modeled. Instead of making assumptions about the distribution, non-parametric algorithms rely on the data itself to make predictions or decisions. This makes them more flexible and adaptable to a wide range of data types and distributions.

Examples of non-parametric algorithms include K-Nearest Neighbors (KNN), Decision Trees, Random Forests, and Support Vector Machines (SVMs). These algorithms are often used in classification and regression tasks, and can be effective in a wide range of applications. However, they can also be computationally expensive and may require more data to achieve good performance compared to parametric algorithms.

· 2 min read
Aneesh Sambu
  • It is a type of Greedy Algorithm
  • In this we prepare a model by taking a set of features and building a binary tree in which each leaf node ends up containing samples of only one class
  • So basically we are classifying the given samples by conditions on their features, dividing the feature space with boundaries that separate the classes
  • For example, consider a sample feature set plotted as a scatter plot, where red points are one class and green points are the other, the x-axis denotes X_0, and the y-axis denotes X_1
  • the nodes other than leaf nodes are called Decision Nodes; we can find the optimal decision condition for each node using information theory
  • the leaf nodes of the resulting decision tree hold the final classifications
  • when we want to classify a new entity, we follow the decision nodes starting from the top and, as in a binary search tree, walk down until we reach a leaf and place the entity in its respective class
  • this is known as a greedy algorithm because we find the best split for the immediate subtask only, without considering future states in advance
  • using information theory (entropy) we pick the optimal decision condition at each node, since there can be many candidates and we should choose the best one; this criterion is known as Information Gain
  • note: we use object-oriented programming when implementing ML algorithms for ease of use and to use them more effectively
  • along with entropy, we use another quantifier for decision nodes known as the Gini Index (see the sketch after this list)
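A minimal sketch of both impurity measures, plus information gain, computed from plain lists of class labels; the red/green labels are invented to match the scatter-plot example above.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    # Entropy of the parent node minus the size-weighted entropy of the
    # child nodes produced by a candidate split condition.
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

node = ["red", "red", "red", "green", "green"]
print(entropy(node))   # ~0.971 bits
print(gini(node))      # 0.48
# A split that separates the classes perfectly gains the full entropy.
print(information_gain(node, [["red"] * 3, ["green"] * 2]))  # ~0.971
```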