Projects Overview

Introduction

Throughout my career, I have worked on a variety of projects spanning Data Engineering, Machine Learning, Data Migration, Data Governance, LLM-based applications, and ML-Ops. Each project has contributed to building and optimizing a robust data infrastructure, driving innovation, and delivering impactful results. Below is an overview of these projects and the value they provided.


Data Engineering (ETL)

I developed a scalable ETL pipeline to process machine- and battery-related issue data. This involved:

  • Data ingestion: Real-time extraction of IoT event data from machines using Kafka.

  • Data transformation: Cleaning, filtering, and aggregating data with PySpark and SnapLogic.

  • Data storage: Implementing a data lake architecture using AWS S3 for optimized querying and analytics.
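As a rough illustration of the transformation step, the sketch below cleans, filters, and aggregates a handful of events. It uses plain Python rather than PySpark so that it runs without a cluster, and the event fields (`machine_id`, `issue`, `voltage`) and sample values are hypothetical, not the pipeline's actual schema.

```python
from collections import defaultdict

# Hypothetical raw IoT events; field names and values are assumptions.
RAW_EVENTS = [
    {"machine_id": "M1", "issue": "battery_low", "voltage": 10.8},
    {"machine_id": "M1", "issue": "battery_low", "voltage": 10.5},
    {"machine_id": "M2", "issue": "overheat", "voltage": 12.1},
    {"machine_id": None, "issue": "battery_low", "voltage": 11.0},  # missing id
    {"machine_id": "M2", "issue": "battery_low", "voltage": None},  # missing reading
]


def clean(events):
    """Drop records that lack a machine id or a voltage reading."""
    return [
        e for e in events
        if e["machine_id"] is not None and e["voltage"] is not None
    ]


def filter_issue(events, issue):
    """Keep only events of one issue type."""
    return [e for e in events if e["issue"] == issue]


def aggregate(events):
    """Count events and average the voltage per machine."""
    grouped = defaultdict(list)
    for e in events:
        grouped[e["machine_id"]].append(e["voltage"])
    return {
        machine: {"events": len(vs), "avg_voltage": sum(vs) / len(vs)}
        for machine, vs in grouped.items()
    }


summary = aggregate(filter_issue(clean(RAW_EVENTS), "battery_low"))
print(summary)
```

In a PySpark version, the same three stages would typically map to dropping nulls, filtering rows, and a grouped aggregation over a DataFrame.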

Impact:

  • Improved issue reporting efficiency by 30%.

  • Enabled faster root cause analysis through centralized and clean datasets.

Skills Used: PySpark, Kafka, AWS S3, AWS Step Functions, SnapLogic, HiveSQL, Data Lake Architecture.


Large Language Models (LLM)

Project: BCAT LLM Chatbot

I designed and deployed a Generative AI chatbot to streamline information retrieval processes for BCAT. Key contributions include:

  • Built a query engine integrated with Snowflake for retrieving and updating data (e.g., student attendance, grades, and scholarships).

  • Designed interactive features for students, such as assignment to-do lists and deadline notifications.
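One way such a query engine can be structured is to map each recognized chatbot intent to an allow-listed, parameterized SQL statement, so the model never runs free-form SQL. The sketch below is only an illustration under several assumptions: an in-memory SQLite database stands in for Snowflake, the intent names, the `students` table, and the `run_intent` helper are hypothetical, and the step where the LLM turns a user message into an intent is omitted.

```python
import sqlite3

# Allow-listed, parameterized statements keyed by (hypothetical) intent name.
QUERIES = {
    "get_attendance": "SELECT attendance_pct FROM students WHERE student_id = ?",
    "get_grade": "SELECT grade FROM students WHERE student_id = ?",
    "set_grade": "UPDATE students SET grade = ? WHERE student_id = ?",
}


def run_intent(conn, intent, params):
    """Execute the statement registered for an intent and return any rows."""
    if intent not in QUERIES:
        raise ValueError(f"Unsupported intent: {intent}")
    cursor = conn.execute(QUERIES[intent], params)
    conn.commit()
    return cursor.fetchall()


# In-memory SQLite stands in for the warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    "student_id TEXT PRIMARY KEY, attendance_pct REAL, grade TEXT)"
)
conn.execute("INSERT INTO students VALUES ('S001', 92.5, 'B')")

attendance = run_intent(conn, "get_attendance", ("S001",))
run_intent(conn, "set_grade", ("A", "S001"))
grade = run_intent(conn, "get_grade", ("S001",))
print(attendance, grade)
```

Keeping the statements in a fixed dictionary means an unexpected intent fails loudly instead of reaching the database.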

Impact:

  • Simplified data access for teachers and students, reducing manual effort by 40%.

  • Enhanced data-driven decision-making through real-time insights.

Skills Used: Python, Snowflake, Generative AI, Natural Language Processing (NLP).


Machine Learning

Project: Customer Spending Classification Model

This project involved building a machine learning model to classify customer spending patterns for targeted marketing strategies. Key steps included:

  • Data preprocessing and feature engineering to improve model accuracy.

  • Developing and evaluating classification models using scikit-learn.

  • Deploying the model for real-time predictions in a production environment.

Impact:

  • Improved customer segmentation accuracy by 25%.

  • Enhanced ROI for marketing campaigns through better audience targeting.

Skills Used: Scikit-learn, Pandas, Python, ML Model Deployment.


Data Migration

Project: Data Migration

I executed a seamless data migration strategy for transitioning legacy systems to modern architectures. Key tasks included:

  • Migrating large datasets from on-premises to Snowflake using Python scripts and AWS S3.

  • Ensuring data integrity through automated validation scripts and detailed logging.
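A common pattern for this kind of automated validation is to compare a row count and an order-independent checksum between source and target, logging the result per table. The following is a minimal sketch of that idea, not the actual scripts: the helper names, the `customers` table, and the sample rows are hypothetical, and rows are passed in as plain tuples instead of being read from the legacy system or Snowflake.

```python
import hashlib
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("migration_validation")


def table_fingerprint(rows):
    """Return (row_count, checksum) for rows, independent of row order."""
    digests = sorted(
        hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest()
        for row in rows
    )
    combined = hashlib.sha256("".join(digests).encode("utf-8")).hexdigest()
    return len(digests), combined


def validate(table, source_rows, target_rows):
    """Compare source and target fingerprints and log the outcome."""
    src = table_fingerprint(source_rows)
    tgt = table_fingerprint(target_rows)
    if src == tgt:
        log.info("%s: OK (%d rows)", table, src[0])
        return True
    log.error("%s: MISMATCH source=%s target=%s", table, src, tgt)
    return False


# Hypothetical sample data: same rows in a different order, then a changed row.
source = [(1, "alice"), (2, "bob")]
target_ok = [(2, "bob"), (1, "alice")]
target_bad = [(1, "alice"), (2, "bobby")]

ok_result = validate("customers", source, target_ok)
bad_result = validate("customers", source, target_bad)
```

Sorting the per-row digests before combining them makes the check insensitive to load order, which usually differs between systems.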

Impact:

  • Reduced system downtime during migration by 50%.

  • Enabled better scalability and performance with modern data infrastructure.

Skills Used: Python, SQL, Snowflake, AWS S3, Data Validation.


ML-Ops

Project: Customer Spending Classification Model (ML-Ops)

I implemented ML-Ops practices for the seamless deployment and maintenance of machine learning models. Key contributions:

  • Built CI/CD pipelines to automate testing, deployment, and monitoring.

  • Containerized applications using Docker and orchestrated them with Kubernetes for scalability.

Impact:

  • Reduced model deployment time by 70%.

  • Enhanced model reliability with real-time performance monitoring.

Skills Used: Docker, Kubernetes, CI/CD, Python.


Data Governance

Project: Domain Usage Monitoring

I established a data governance framework that monitors domain usage to support data security and efficient resource use. Key activities:

  • Built dashboards to visualize data utilization and performance using Power BI.

  • Developed a PII detection ML model to safeguard sensitive information.

  • Implemented weekly reports to inform stakeholders about domain usage trends.
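To give a sense of what PII detection involves, the sketch below flags email addresses and US-style Social Security numbers in free text. Note that this is a simple rule-based regex baseline, not the ML model mentioned above; the patterns, the `find_pii` helper, and the sample strings are assumptions chosen for illustration.

```python
import re

# Hypothetical patterns; a production detector would cover many more PII types.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text):
    """Return a dict mapping each PII type to the matches found in text."""
    found = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[label] = matches
    return found


sample = "Contact jane.doe@example.com, SSN 123-45-6789."
hits = find_pii(sample)
clean_hits = find_pii("No sensitive values here.")
print(hits, clean_hits)
```

A rule-based pass like this is often used as a first filter or as a source of labels, with a trained model handling less regular cases such as names and addresses.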

Impact:

  • Improved data security compliance by 20%.

  • Provided actionable insights to stakeholders for better resource allocation.

Skills Used: Data Governance, Snowflake, Qualys, Power BI, Python.

Power BI Dashboards

Project: Interactive Business Dashboard

I developed interactive dashboards in Power BI for visualizing and analyzing business performance metrics. Key achievements include:

  • Built a sales dashboard to track KPIs like revenue, growth rate, and profitability across regions.

  • Designed an operational dashboard to monitor real-time logistics data for optimized delivery routes.

  • Implemented role-based access controls to ensure secure data sharing.

Impact:

  • Enhanced executive decision-making with actionable insights.

  • Reduced data retrieval time for stakeholders by 60%.

Skills Used: Power BI, DAX, SQL, Data Visualization, Stakeholder Communication.


Conclusion

These projects demonstrate my ability to work across the entire data lifecycle, from ingestion and processing to governance and deployment. My approach emphasizes scalability, efficiency, and innovation to solve complex data challenges and drive measurable outcomes.