Hi, I'm Zainab Shakruwala

Data Scientist & Machine Learning Engineer

I am a Research Scientist at NYC DOC and a recent graduate from Columbia University, working in the domain of AI, Machine Learning and Data Science. My experience includes time series, recommendation systems and classical machine learning. In my free time I like to read, write and run.

Skills

Programming Languages

Python R SQL JavaScript HTML/CSS

Machine Learning & Artificial Intelligence

RAG GraphRAG Neural Networks LSTM ARIMA TBATS XGBoost Scikit-learn PyTorch

Visualization and Analytics

Tableau Matplotlib Seaborn Plotly Dash Streamlit

Tools

Jupyter Git RStudio VS Code Pandas NumPy Polars

Cloud

AWS Snowflake GCP Docker DBeaver

Experience

Research Scientist

NYC DOC • Feb 2025 - Present

Led end-to-end machine learning projects, built predictive models with less than 3.5% error, and developed dashboards. Improved business processes through data-driven insights.

  • Developed population forecasting models with 3.5% error
  • Created automated reporting dashboards to enhance decision-making efficiency for senior leadership

Apprentice Data Scientist

L'Oréal • Sep 2024 - Dec 2024

Led a comparative study of Retrieval-Augmented Generation (RAG) and GraphRAG systems.

  • Built a RAG pipeline using FAISS and OpenAI’s LLM to retrieve vector embeddings, leading to a 5% increase in answer accuracy
  • Improved retrieval accuracy by 4.2% by using Neo4j to build a GraphRAG
  • Established a scalable evaluation framework using RAGAS to assess the RAG systems

Data Scientist

Prospero • Sep 2024 - Dec 2024

Analyzed large datasets to identify trends and patterns and built machine learning models for portfolio prediction.

  • Analyzed stock price data and made predictions using LSTM and ARIMA, leading to a 2% increase in accuracy
  • Optimized data collection pipelines to detect data anomalies

Data Science Intern

Capital One • Jun 2024 - Aug 2024

Performed exploratory data analysis, created visualizations, and improved existing recommendation systems.

  • Improved the recommendation and ranking algorithm by 2.5% for Capital One's Auto Navigator website
  • Generated KPIs based on the model and user behavior using SQL on Snowflake

DSI Scholar

Columbia Climate School • Jan 2024 - May 2024

Selected as a DSI Scholar to support the Columbia Climate School as a data scientist.

  • Used BERT and LDA for topic modeling and sentiment analysis on Twitter data of migrants from South America
  • Built ArcGIS Dashboards to visualize plastic dump sites

Analyst

KPMG • Jan 2022 - Jul 2023

Analyzed large datasets to identify trends and patterns and built reporting dashboards.

  • Optimized resource utilization by analyzing data and identifying key patterns with SQL
  • Improved decision-making by creating Tableau dashboards to visualize summary statistics

ML Intern

Robotronix • Sep 2024 - Dec 2024

Worked as a Machine Learning Intern at a robotics startup.

  • Created a rainfall prediction system with 65% accuracy to give farmers more insight into their harvesting cycles
  • Developed a proof of concept for a traffic management system that detects violators using computer vision with 78% accuracy, for the Indian city of Dehradun in collaboration with the city’s municipal corporation

Featured Projects

RAG Based Beauty Products Recommendation System

A Retrieval-Augmented Generation (RAG) chatbot for Sephora beauty products, built with FAISS vector search and Google Gemini.

Python LLM RAG GraphRAG

Broadway Shows Analysis

An interactive Streamlit app for analyzing historical Broadway show performance. It visualizes monthly gross revenue trends, ranks top shows by different metrics, and includes an ARIMA-based monthly forecast for the show “Wicked.”

Pandas ARIMA Streamlit Plotly

NYC Subway Ridership Forecasting

Built a machine learning model to predict daily NYC subway ridership with 86% accuracy.

Python ARIMA Pandas TBATS

Analyzing Crime in LA

Developed an interactive dashboard for crime statistics in LA, making complex statistics more accessible through visualizations such as time series, mosaic plots, Cleveland plots, alluvial plots, and spatial plots.

R RStudio JavaScript Plotly

Optimizing the Recommendation & Ranking Algorithm @ Capital One

Improved the existing recommendation system by 2.5%, generating leads for 3,000+ cars monthly through user behavior analytics and gradient boosting, enhancing targeted recommendations and user engagement.

Python XGBoost Snowflake Docker

Rainfall Prediction

Designed and implemented an automated ETL pipeline processing 1M+ records daily with real-time monitoring and error handling.

Python XGBoost Logistic Regression Arduino

Get In Touch

Let's Connect!

I'm always interested in discussing data science opportunities, collaborating on interesting projects, or just having a chat about ML and AI.

zainab.shakruwala@columbia.edu
New York, USA