Amazon: Applied Data Scientist - PhD Research Internship
I worked as a research intern at Amazon in Berlin with Prabhu Teja and Giovanni Zappella, improving the retrieval stage of the retrieval-augmented generation (RAG) pipeline for code generation with large language models (LLMs). I designed and trained a lightweight retrieval model for code that incorporates the semantics of code through dense embeddings, the file directory hierarchy, and the call graph structure. The model improved retrieval performance by more than 50% over the baseline model and outperformed an LLM agent retrieval system while using orders of magnitude fewer parameters.