I am a data science practitioner adept at analyzing complex datasets across a range of programming languages and frameworks, with skills in machine learning, deep learning, NLP, and data visualization. Committed to continual learning, I aim to use data to solve real-world problems and to collaborate with like-minded people on work with measurable impact.
During my Dell internship, I enhanced the HVC lookalike model using BERTopic, raising AUC by 2.5% for smaller businesses and more than doubling performance in Australia. I also led the BERTopic integration, improving processing speed by 40% and making the pipeline more scalable, and built a chatbot that cut manual research time by 75%.
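Below is a minimal sketch of the BERTopic workflow described above. A public newsgroup corpus stands in for the internal Dell account data, and the settings are illustrative rather than the production configuration.

```python
from sklearn.datasets import fetch_20newsgroups
from bertopic import BERTopic

# Public stand-in corpus; the actual Dell account text is internal.
docs = fetch_20newsgroups(
    subset="train", remove=("headers", "footers", "quotes")
).data[:2000]

# BERTopic embeds documents, reduces dimensionality with UMAP, clusters
# with HDBSCAN, and extracts per-topic keywords with c-TF-IDF.
topic_model = BERTopic(verbose=True)
topics, probs = topic_model.fit_transform(docs)

# Topic assignments can then serve as features for a lookalike classifier.
print(topic_model.get_topic_info().head())
```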
Created a Waste Area Prediction System using YOLOv5s and data preprocessing, achieving 97% accuracy and a 5% revenue increase through efficient identification of waste areas. I also designed a full web app with YOLOv5l, Django, FastAI, Flutter, and Firebase that achieved 99% accuracy in machine-unit tracking and reduced manual data-entry time by 75%.
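A hedged sketch of the kind of YOLOv5 inference these systems rely on; the weights file waste_best.pt and the image path are hypothetical placeholders, since the trained model and site imagery are not public.

```python
import torch

# Load YOLOv5 via torch.hub with custom weights; "waste_best.pt" stands in
# for the weights trained on waste-area images (hypothetical path).
model = torch.hub.load("ultralytics/yolov5", "custom", path="waste_best.pt")

# Run detection on one image; results.xyxy[0] holds one row per box:
# [x1, y1, x2, y2, confidence, class].
results = model("site_photo.jpg")
for *box, conf, cls in results.xyxy[0].tolist():
    print(f"waste region at {box}, confidence {conf:.2f}")
```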
Extensively investigated four word embeddings (BERT, word2vec, GloVe, and Crawl) on the IMDB movie review dataset, encompassing 90,000 reviews. Using PyTorch, I built RNN and LSTM models to compare the embeddings' performance on textual data; GloVe gave the most favorable outcome, with 87.936% accuracy on our proposed model. This study in Natural Language Processing culminated in a research paper published by IEEE.
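A condensed sketch of the experimental setup: an LSTM classifier over a frozen pretrained embedding matrix. Random vectors stand in here for the actual GloVe/word2vec/BERT/Crawl matrices, and the dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    """LSTM sentiment classifier over pretrained word embeddings."""

    def __init__(self, embedding_matrix, hidden_dim=128, num_classes=2):
        super().__init__()
        # Freeze the pretrained vectors so only the LSTM and the
        # classifier head are trained.
        self.embedding = nn.Embedding.from_pretrained(embedding_matrix, freeze=True)
        self.lstm = nn.LSTM(embedding_matrix.size(1), hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq, emb_dim)
        _, (hidden, _) = self.lstm(embedded)   # final hidden state
        return self.fc(hidden[-1])             # (batch, num_classes)

# Toy check: a random 1000-word vocabulary of 100-dim vectors.
vectors = torch.randn(1000, 100)
model = SentimentLSTM(vectors)
logits = model(torch.randint(0, 1000, (4, 25)))  # 4 reviews, 25 tokens each
print(logits.shape)  # torch.Size([4, 2])
```

Swapping the embedding matrix (and its dimension) is all that changes between the four embedding conditions, which keeps the comparison controlled.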
GPA: 3.83/4.0
GPA: 9.29/10
TrendScope: Conducted sentiment analysis on 40k+ trending YouTube videos using Hugging Face models, reaching 85% accuracy and informing improvements to content quality and user engagement. I also categorized unlabelled videos with 95% accuracy, improving content discovery and platform usability.
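A minimal illustration of the Hugging Face sentiment step. The default pipeline checkpoint and the example titles below stand in for the actual model and trending-video data.

```python
from transformers import pipeline

# Default sentiment-analysis pipeline; the project's actual checkpoint
# may differ.
sentiment = pipeline("sentiment-analysis")

# Invented example titles, not the real dataset.
titles = [
    "This new phone is absolutely incredible!",
    "Worst movie trailer I have seen all year",
]
for title, result in zip(titles, sentiment(titles)):
    print(title, "->", result["label"], round(result["score"], 3))
```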
HeadlineHunch: Designed a news headline generator that achieved 78% accuracy on a dataset of 4,515 examples from Kaggle. Through an attention mechanism, the neural network assigned varying importance to input-sequence words, producing contextually precise headlines. I gathered 2017 news data from diverse sources using Selenium and Scrapy to train the model, and later raised accuracy to 80% by automating hyperparameter selection and data splitting.
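A sketch of the additive (Bahdanau-style) attention idea behind the generator: each encoder timestep is scored against the current decoder state, so salient article words receive more weight. The module and shapes below are illustrative, not the exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Scores encoder steps against the decoder state and returns a
    weighted context vector plus the per-word attention weights."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.W_enc = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.W_dec = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, decoder_state, encoder_outputs):
        # decoder_state: (batch, hidden); encoder_outputs: (batch, seq, hidden)
        scores = self.v(torch.tanh(
            self.W_enc(encoder_outputs) + self.W_dec(decoder_state).unsqueeze(1)
        )).squeeze(-1)                        # (batch, seq)
        weights = F.softmax(scores, dim=-1)   # importance per input word
        context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)
        return context, weights

# Toy shapes: 2 articles, 30 encoded tokens, hidden size 64.
attn = AdditiveAttention(64)
context, weights = attn(torch.randn(2, 64), torch.randn(2, 30, 64))
print(context.shape, weights.shape)  # (2, 64) (2, 30)
```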
Research Publication: Conducted a study on Dimensionality Reduction and Classification Algorithms for High Dimensional Datasets (May 2021 - Jan 2022), co-authoring and presenting the paper at ICIRCA 2021. We explored three datasets with up to 20,000 features, using PCA, LDA, and SVD to reduce dimensionality by 50% before applying ML algorithms and comparing accuracy. The study found PCA + SVM to be the optimal combination at 97.8% accuracy, followed by LDA + Random Forest and SVD + SVM.
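A compact illustration of the paper's best-performing combination, PCA followed by SVM. Scikit-learn's small digits dataset stands in for the study's high-dimensional datasets, and halving the component count mirrors the reported 50% dimensionality reduction.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small stand-in dataset; the study's datasets had up to 20,000 features.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Halve the feature dimensionality with PCA, then classify with an SVM.
clf = make_pipeline(StandardScaler(), PCA(n_components=X.shape[1] // 2), SVC())
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```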