Experience

Amazon: Applied Data Scientist - PhD Research Internship

I worked as a research intern at Amazon in Berlin with Prabhu Teja and Giovanni Zappella, improving the retrieval step of the retrieval-augmented generation (RAG) pipeline used with Large Language Models (LLMs) for code generation. I designed and trained a lightweight retrieval model for code that incorporates the semantics of code through dense embeddings, the file directory hierarchy, and the call graph structure. The model improved retrieval performance by more than 50% relative to the baseline model and outperformed an LLM agent retrieval system while using orders of magnitude fewer parameters.

DataProphet: Machine Learning Engineer - Internship

I worked as a machine learning engineer intern at DataProphet during my 2018 summer vacation. I learned to work in Linux rather than Windows, use Vim, and collaborate with Git in a team. I wrote shell scripts for automation, documented my work for reusability, and adopted Python over R. Following standardized coding conventions improved the readability of my code, and I came to appreciate the importance of virtual environments. I also deployed machine learning models on Google Cloud.

Eighty20: Data Scientist - Internship

I worked as a data science intern at Eighty20 during my 2018 summer vacation. I wrote SQL scripts for large databases on the Eighty20 data portal, refining my data wrangling, manipulation, and analysis skills. At the same time, I worked in R, building a user-friendly Shiny interface for data clustering. The experience tested my skills against real-world problems.

Eighty20: Data Scientist - Internship

I worked as a data science intern at Eighty20 during my 2017 winter vacation. I collaborated on a public interest project to visualize South Africa’s water crisis. Using consumption data and dam levels, we built an interactive dashboard that informs users about the current situation.