I have a passion for big data, real-time processing and distributed systems.
Experience
Senior Site Reliability Engineer
Jun 2024 – Present
Apple, Madrid
Worked within the Satellite Connectivity Group.
Senior Data Engineer
Apr 2022 – May 2024
Cloudflare, Lisbon
Migrated Apache Spark ingestion pipelines to Go, improving peak Postgres read throughput from ~85k to 1.2M rows/s while reading >4B rows/day.
Reduced daily Postgres ingestion memory consumption by 94% and ingestion time from 15h to 1h20m using B-tree index histograms, relieving constrained CPU and memory in the core data center.
Data Engineer
Jan 2021 – Mar 2022
Cloudflare, Lisbon
Developed data pipelines in Apache Spark with throughputs of >30B rows and >5TB/day.
Aggregated ClickHouse traffic and Prometheus metrics data from core and edge data centers.
Built data lineage sensing tools in Airflow, reducing data lineage incidents by 37%. Reduced MTTR and the number of incidents by >50% QoQ, saving a 22-person team >800 person-hours per quarter.
Data Scientist
Oct 2018 – Dec 2020
Feedzai, Lisbon
Developed ML models in Apache Spark to detect transaction fraud, achieving 78% dollar recall at 1% FPR.
Built data pipelines on Apache Hadoop clusters to train models on >1B rows of data.
Supported production systems with throughputs of >2k events/s and <200ms latency at 99.999%.
Research Assistant
Jun 2017 – Sep 2018
Great Ormond Street Hospital for Children, London
Collaborated with engineers and clinicians building cardiac MRI segmentation pipelines.
Implemented a whole-heart segmentation workflow achieving an AUC of 0.752 using 3D CNNs.
Secured 2 grants from NVIDIA and Medtronic.