Smit Deepak Shah

Boston, MA

I am a data science enthusiast adept at analyzing intricate datasets with a wide range of programming languages and frameworks, and skilled in machine learning, deep learning, NLP, and data visualization. Committed to continuous learning, I strive to use data to solve real-world challenges and to collaborate with like-minded people to drive impactful change.

Experience

Data Science Intern

Dell Technologies

During my Dell internship, I enhanced the HVC lookalike model using BERTopic, raising AUC by 2.5% for smaller businesses and achieving a performance boost of over 100% in Australia. I also spearheaded the BERTopic integration, which improved processing speed by 40% and increased scalability, and built a chatbot that reduced manual research time by 75%.

June 2023 - August 2023

Data Science Intern

Incipient Technologies Pvt. Ltd.

Created a Waste Area Prediction System using YOLOv5s and data preprocessing that reached 97% accuracy and yielded a 5% revenue increase by efficiently identifying waste areas. Additionally, I designed a comprehensive web app with YOLOv5l, Django, FastAI, Flutter, and Firebase, achieving 99% accuracy in machine unit tracking and reducing manual data entry time by 75%.

December 2021 - March 2022

Research Analyst Intern

K. J. Somaiya College of Engineering

Investigated four word embeddings (BERT, Word2Vec, GloVe, and Crawl) on a text dataset. Using PyTorch, I built RNN and LSTM models to assess their performance on textual data. My research on the IMDB movie review dataset, encompassing 90,000 reviews, showed that GloVe embeddings produced the best result, with an accuracy of 87.936% on our proposed model. This study in Natural Language Processing culminated in a research paper published with IEEE.

June 2021 - September 2021

Education

Northeastern University

Master of Science
Data Science

GPA: 3.83

September 2022 - May 2024

University of Mumbai

Bachelor of Technology

GPA: 9.29

August 2018 - May 2022

Skills

Programming Languages & Tools
  • Programming Languages: Python, R, SQL, Bash, C++, JavaScript, HTML, Dart
  • Tools & Technologies: Linux (UNIX), Git, Postman, Tableau, Hadoop, Spark, NoSQL, LLMs
  • Machine Learning Libraries: TensorFlow, PyTorch, NLTK, NumPy, Pandas, OpenCV, scikit-learn, LangChain

Projects

TrendScope: Conducted sentiment analysis on 40k+ trending YouTube videos using Hugging Face, achieving 85% accuracy, with the aim of improving content quality, user engagement, and satisfaction. Additionally, I categorized unlabelled videos with 95% accuracy to improve content discovery and platform usability.

HeadlineHunch: Designed a news headline generator model that achieved 78% accuracy on a dataset of 4,515 examples from Kaggle. An attention mechanism let the neural network assign varying importance to the words of the input sequence, resulting in contextually precise headlines. I gathered news data from diverse sources (2017) using Selenium and Scrapy and used it to train the model for headline generation. Additionally, I raised the model's accuracy to 80% by automating hyperparameter selection and the data split.

Research Publication: Conducted a study on Dimensionality Reduction and Classification Algorithms for High Dimensional Datasets (May 2021 - Jan 2022), co-authoring the paper and presenting it at ICIRCA 2021. Explored three datasets with up to 20,000 features, using PCA, LDA, and SVD to reduce dimensionality by 50% and then applying ML algorithms to compare accuracy. The study found PCA + SVM to be the best combination with 97.8% accuracy, followed by LDA + Random Forest and SVD + SVM.

Contact