Data Engineer

Rudra Prakash P.

Transforming raw data into reliable insights.

A hands‑on data engineering professional with experience building scalable pipelines, transforming multi‑source data and improving data quality using SQL, Python, Databricks and Spark.

About Me

Building Resilient Data Pipelines

I am a data engineer based in London specialising in building resilient data pipelines and preparing AI‑ready datasets. My background combines a solid foundation in computer science with advanced analytical coursework from an MBA in Business Analytics. I translate business requirements into efficient technical solutions, ensuring data quality, governance and scalability. My mission is to enable organisations to make data‑driven decisions by delivering clean, trusted and timely data.

🗄️

Pipeline Development

End‑to‑end ELT/ETL workflows across cloud platforms and hybrid environments.

🛡️

Data Quality & Governance

Automated profiling, validation and reconciliation checks to reduce reporting risk.

📐

Data Modelling

Reusable SQL transformations and schema mappings for business‑critical datasets.

🤝

Collaboration

Translating business requirements into technical deliverables with cross‑functional teams.

Skills & Tools

Technical Competencies

Programming Languages

SQL Python PySpark

Databases & Warehousing

PostgreSQL Snowflake BigQuery Redshift Delta Lake Data Modelling

ETL/ELT & Orchestration

Databricks Apache Spark Apache Airflow dbt REST APIs

Cloud Platforms

AWS (Redshift, Glue, S3) Azure (Synapse, Data Factory) GCP (BigQuery) Microsoft Fabric

Big Data & Analytics

Spark Distributed Processing Lakehouse Architectures Data Quality Frameworks

DevOps & Automation

Git CI/CD Pipelines Terraform CloudFormation

BI & Reporting

Power BI Tableau SQL Optimisation

Experience

Professional Journey

💼

Data & Analytics Engineer

Mar 2025 – Present

ProudToBeMe, London

  • Design and build scalable data pipelines using Python, SQL, Databricks, and Spark across 3+ source systems, delivering 6+ analytics-ready datasets supporting reporting tied to £100k+ business activity.
  • Implement automated validation, profiling, and monitoring across 5+ ingestion and transformation workflows, improving data quality and reducing downstream reporting risk for datasets informing £100k+ operational decisions.
  • Develop reusable SQL-based transformation logic and structured data models for 6+ operational datasets, supporting consistent reporting and business analysis across £100k+ programme activity.
  • Prepare machine-readable datasets with consistent schema structures across multiple reporting workflows, improving interoperability and downstream consumption for 10+ users.
  • Work with 4+ cross-functional stakeholders/teams to translate business requirements into production-ready data processes supporting £100k+ business outcomes.
  • Support troubleshooting and continuous improvement across multiple live workflows, helping protect reporting reliability for £100k+ tracked operations.
  • Produce technical documentation for 5+ pipelines, transformations, and data assets, supporting knowledge transfer and ongoing service maintenance.

💼

Financial Business Analyst

May 2023 – Apr 2024

HighRadius Technologies

  • Analysed financial and operational datasets using SQL and structured data workflows to improve reporting accuracy, consistency, and decision support.
  • Improved data processing efficiency by 25% through refinement of reporting logic, better data handling practices, and workflow optimisation.
  • Translated business requirements into structured analytical outputs and reporting improvements, increasing financial visibility and reporting accuracy by 20%.
  • Investigated data issues across business processes, contributing to root cause analysis, defect resolution, and stronger reporting governance.
  • Worked with finance, operations, and technical teams to improve data consistency, reporting controls, and business process reliability.

💼

Quantitative Research Virtual Intern

Virtual Internship

JPMorgan Chase

  • Analysed financial datasets using Python to identify trends supporting quantitative research.

💼

Strategy Consulting Virtual Intern

Virtual Internship

Boston Consulting Group

  • Conducted market and pricing analyses using structured analytical frameworks.

Projects

Featured Work

A selection of data engineering projects showcasing end‑to‑end pipeline development, cloud architecture and quality-driven data solutions.

TfL Service Reliability Analytics Platform

Built a scalable cloud-based data workflow using REST APIs, Python, SQL, and Databricks to ingest operational data and prepare 10+ analytics-ready service reliability KPIs.

REST APIs Python SQL Databricks Spark Delta Lake

Key Contributions & Impact

  • Designed transformation pipelines for semi-structured API data across multiple source formats, improving standardisation, consistency, and reliability.
  • Implemented automated validation and data quality checks across downstream datasets, reducing reporting risk and increasing trust in published outputs.
  • Structured reproducible workflows using modern pipeline practices to improve maintainability, production readiness, and ease of enhancement.
  • Prepared clean, machine-readable datasets for analytics and service consumption, supporting future reporting, interoperability, and AI-ready use cases.
  • Documented data flows, transformations, and output structures to support clarity, reuse, and maintainability.
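The automated validation described above can be sketched in plain Python. The field names (`route`, `on_time_pct`) and the range threshold are illustrative assumptions, not the actual TfL schema:

```python
# Illustrative completeness and range checks for analytics-ready records.
# Field names ("route", "on_time_pct") are hypothetical examples.

def validate_records(records, required_fields=("route", "on_time_pct")):
    """Split records into valid rows and flagged issues."""
    valid, issues = [], []
    for i, row in enumerate(records):
        # Completeness: every required field must be present and non-null
        missing = [f for f in required_fields if row.get(f) is None]
        if missing:
            issues.append((i, f"missing fields: {missing}"))
            continue
        # Range check: a percentage KPI must lie in [0, 100]
        if not 0 <= row["on_time_pct"] <= 100:
            issues.append((i, "on_time_pct out of range"))
            continue
        valid.append(row)
    return valid, issues
```

In a production pipeline the same checks would typically run as expectations inside the Databricks workflow, with the `issues` list feeding a quarantine table rather than being returned in memory.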

E‑commerce Data Lake & Warehouse

Designed and implemented an ETL framework to consolidate e‑commerce sales, customer and inventory data from multiple platforms into a unified data lake and warehouse.

Python SQL AWS Glue S3 Redshift Apache Airflow

Key Contributions & Impact

  • Built daily ingestion pipelines to extract data from multiple e‑commerce platforms and load into a centralised S3-based data lake.
  • Designed star‑schema data models in Redshift optimised for analytical queries and dashboard consumption.
  • Implemented data quality checks and reconciliation processes across ingestion and transformation stages.
  • Optimised SQL queries and partitioning strategies, improving query performance and reducing warehouse costs.
  • Improved reporting accuracy by 30% and enabled timely business insights for sales and inventory teams.
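A minimal sketch of the reconciliation step between a source extract and the loaded warehouse table, comparing row counts and a control total (the `amount` column and tolerance are assumptions for illustration):

```python
# Illustrative reconciliation between source extract and warehouse load:
# row counts and a control total must agree within a small tolerance.
# Column name "amount" and the tolerance are assumed for the sketch.

def reconcile(source_rows, loaded_rows, amount_key="amount", tolerance=0.01):
    """Return reconciliation results for one daily load."""
    src_total = sum(r[amount_key] for r in source_rows)
    tgt_total = sum(r[amount_key] for r in loaded_rows)
    return {
        "row_count_match": len(source_rows) == len(loaded_rows),
        "control_total_match": abs(src_total - tgt_total) <= tolerance,
        "source_rows": len(source_rows),
        "loaded_rows": len(loaded_rows),
    }
```

In practice a check like this would run as an Airflow task after each load, failing the DAG (or raising an alert) when either flag is false.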

Real‑time IoT Sensor Data Pipeline

Architected a real‑time data pipeline to ingest and process IoT sensor data from industrial equipment, enabling proactive monitoring and predictive maintenance.

PySpark Streaming Apache Kafka Databricks Delta Lake

Key Contributions & Impact

  • Designed streaming ingestion pipelines using Kafka and PySpark Structured Streaming to process high-volume sensor telemetry in near real-time.
  • Implemented windowed aggregations and threshold-based alerting logic to detect anomalies and trigger maintenance workflows.
  • Persisted processed data to Delta Lake with time-partitioned tables for efficient historical analysis and trend detection.
  • Integrated pipeline outputs with operational dashboards for facility managers, improving visibility into equipment health.
  • Reduced incident response time and improved equipment reliability through proactive, data-driven maintenance scheduling.
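The windowed aggregation and threshold alerting above can be illustrated in plain Python; a production version would use PySpark Structured Streaming's `window` function over a Kafka source. The sensor names, 60-second tumbling window, and threshold are illustrative assumptions:

```python
from collections import defaultdict

# Plain-Python sketch of tumbling-window aggregation with threshold
# alerting. Sensor IDs, the 60-second window, and the threshold of 80
# are assumptions, not values from the actual pipeline.

def window_alerts(readings, window_s=60, threshold=80.0):
    """Average each sensor's readings per tumbling window and flag
    windows whose mean exceeds the alert threshold.

    readings: iterable of (timestamp_s, sensor_id, value) tuples.
    Returns a list of (window_start, sensor_id, mean) alerts.
    """
    buckets = defaultdict(list)
    for ts, sensor, value in readings:
        window_start = ts - (ts % window_s)  # tumbling-window key
        buckets[(window_start, sensor)].append(value)
    alerts = []
    for (start, sensor), values in sorted(buckets.items()):
        mean = sum(values) / len(values)
        if mean > threshold:
            alerts.append((start, sensor, mean))
    return alerts
```

The same grouping key (window start, sensor) is what a `groupBy(window(...), "sensor_id")` produces in Structured Streaming, with watermarks handling late-arriving telemetry.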

Services

What I Offer

⚙️

Data Pipeline Development

Design, build and maintain robust pipelines that ingest and process data from diverse sources.

🔄

ETL/ELT Workflow Design

Create efficient workflows using Airflow and dbt to deliver analytics‑ready datasets on schedule.

🗄️

Data Warehouse Development

Architect cloud‑based warehouses or lakehouses, including schema design and optimisation.

☁️

Cloud Data Engineering

Deploy and manage data solutions on AWS, Azure or GCP leveraging native services.

🔌

Data Integration

Integrate APIs, databases and third‑party systems into unified datasets with data lineage.

Data Cleaning & Transformation

Apply transformation logic, deduplication and enrichment for high‑quality data.

SQL Optimisation

Optimise complex SQL queries and database objects to improve performance and reduce costs.

📦

Data Modelling

Design logical and physical data models (star, snowflake, Data Vault) for analytics and ML.

🔧

Workflow Automation

Automate repetitive data tasks with triggers, scheduling and monitoring.

📊

Dashboard Data Preparation

Prepare data for Power BI and Tableau, enabling actionable dashboards.

Credentials

Certifications & Education

🏆 Certifications

🔥

Databricks Certified Associate Data Engineer

Proficiency in Spark and Databricks for building scalable data pipelines.

☁️

Microsoft Fabric Data Engineer Associate (DP‑700)

Expertise in Azure data engineering services and Microsoft Fabric technologies.

💼

Product Consultant – HighRadius Technologies

Business acumen with technical knowledge of BI platforms and data solutions.

🎓 Education

MBA in Business Analytics

2024 – 2025

University of East London

Advanced coursework in data analysis, visualisation and quantitative decision‑making.

B.Sc. Computer Science & IT

2020 – 2024

Siksha 'O' Anusandhan University

Programming, database systems, SQL, Azure data engineering and cloud technologies.

Why Work With Me

Technical Depth Meets Business Insight

I combine technical depth with business insight to deliver trusted data solutions. My experience spans pipeline development, data modelling, quality assurance and cloud deployment. Employers and clients value my reliability, collaborative approach and commitment to clear documentation and governance. I focus on translating complex requirements into practical solutions that enhance decision‑making and drive measurable outcomes.