PRML Course Project

Web Page to give the report

Problem Statement

Movie Recommendation System: A recommendation system's purpose is to search for content that would be interesting to an individual. Recommendation systems are AI-based algorithms that skim through all possible options and create a customized list of items that are interesting and relevant to an individual. These results are based on their profile, search/browsing history, what other people with similar traits/demographics are watching, and how likely you are to watch those movies.

Abstract

In this Project, we have implemented a Movie Recommendation System using different models which are KNN, Naive Bayes, Linear Regression, SVM and Random Forest. There are two types of recommendation systems, Collaborative and Item based. We have implemented Item Based Recommendation System using KNN and for Collaborative System we have implemented rest others. For this project, we have used data provided from kggle and used web scraping to collect posters and others details.

Thought Process

The project aims to develop a recommender system for this (MOVIE)
Broadly classifying there are two types of recommender systems :

Content based Filtering : In this the model tries to recommend the user based on his past queries, viewed movies, movies which he liked, browsing history and the suggest a movie which is similar to those features
Collaborative Filtering : In this we try to find similar unrelated users and and as they are have same interest what ever a user views it will be recommended to the other user
Hybrid : Content based Filtering + Collaborative Filtering
So it was decided that we would be using a hybrid model based on given task

Data Visualization and Info

The Data we have use has been taken from kaggel and it contains csv files through which we get the datas containing the informations: userID, movieID, tags, ratings, genre etc. and further we did web scraping in order to get the details such as Directors, Casts, Descriptions etc.

Two plots are shown below to visulaize the data:

Model Details

KNN Model

K-Nearest Neighbors (KNN) is a simple yet powerful supervised machine learning algorithm used for classification and regression tasks. The principle behind KNN is intuitive: it classifies or predicts a new data point's label or value based on its proximity to existing data points in the feature space. In classification, ...

Naive Bayes Model

Naive Bayes is a probabilistic classifier based on Bayes' theorem, with the "naive" assumption of feature independence. It's efficient, easy to implement, and works well with high-dimensional data. Despite its simplicity, Naive Bayes often performs surprisingly well in practice, making it popular for text classification ...

Linear Regression Model

Linear Regression models the relationship between a dependent variable and one or more independent variables by fitting a straight line to the data. It seeks to minimize the difference between observed and predicted values, estimating coefficients for the line to best represent the data's linear trend. This method is ...

Random Forest Model

Random Forest is an ensemble learning method that builds multiple decision trees during training and merges their predictions to improve accuracy and reduce overfitting. Each tree in the forest is built using a random subset of features and a random subset of the training data. Predictions are made by averaging or voting ...

SVM Model

Support Vector Machine (SVM) is a powerful supervised machine learning algorithm used for classification and regression tasks. It finds the optimal hyperplane in a high-dimensional space that best separates data points of different classes. SVM aims to maximize the margin between classes, making it robust to outliers ...

Bag of Words(using KNN Model)

This model basically works to recommend movies to a user via finding similar movies to the one searched by the user and sggesting them and is our only model which serves this purpose. We have used KNN for implementing this model. K-Nearest Neighbors (KNN) is a non-parametric supervised learning algorithm ...