Muhammad Naufil

AI Engineer  Β·  MS CS @ Saarland University

I build AI systems at the intersection of computer vision, LLMs, and agentic workflows. Currently pursuing an MS in Computer Science at Saarland University and actively looking for part-time opportunities.

Previously a Computer Vision Working Student at CISPA, finetuning vision foundation models for deepfake detection at scale on clusters of 8×A100 GPUs; a HiWi at MPI Informatics, working on co-speech gesture synthesis with latent diffusion models; a Computer Vision Engineer at Retrocausal (Redmond, WA) for 3+ years, building markerless MoCap and action segmentation systems deployed at Ford, Honda, and Nissan; and a Research Assistant at Zema, exploring RAG and AI agent frameworks.

On the side I build agentic AI products β€” including an end-to-end blogger outreach tool powered by Openclaw, and an LLM companion with vision and persistent memory.

Work Experience

CISPA
Computer Vision Working Student β€” Deepfake Detection
Detesia  |  CISPA Startup  Β·  Oct 2024 – Dec 2025

Finetuned vision foundation models on a large-scale collection of deepfakes using clusters of 8Γ—A100 GPUs. Conducted large-scale generation of synthetic deepfake datasets to expand training coverage and improve detection robustness.

Zema
Research Assistant β€” LLMs & AI Agents
Zema  |  SaarbrΓΌcken  Β·  Aug 2024 – Jun 2025

Evaluated instruction-following and few-shot learning of the latest open- and closed-source LLMs. Built a personal AI companion using GPT-4o with function calling and RAG retrieval (Pinecone). Explored agentic frameworks including n8n and crewAI to build AI workflows and automation pipelines.

MPI Informatics
HiWi — Co-Speech Gesture Synthesis
MPI Informatics  |  Saarbrücken  ·  Jun – Sep 2024

Built up expertise in diffusion models progressively: started with unconditional image generation, then conditioned models on class labels to understand classifier-free guidance. Graduated to latent diffusion models for monadic co-speech gesture synthesis β€” first unconditional, then conditioned on speech transcription to generate gestures that naturally accompany spoken language.
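The class-conditioning step above hinges on classifier-free guidance, whose sampling rule is simple enough to sketch. This is a generic illustration of the technique, not code from the project:

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional noise
    prediction toward the conditional one,
        eps = eps_uncond + w * (eps_cond - eps_uncond).
    w = 0 ignores the condition, w = 1 is plain conditional sampling, and
    w > 1 strengthens the conditioning at the cost of sample diversity."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# At each denoising step the model runs twice, with and without the
# condition, and the two noise predictions are blended before the update.
guided = cfg_combine(np.zeros(3), np.ones(3), guidance_scale=2.0)
```

Training the same network with the condition randomly dropped is what lets a single model serve both roles at sampling time.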

Manual Assembly Copilot
Retrocausal, Inc.  |  Redmond, WA  Β·  Nov 2020 – Mar 2024

Pathfinder helps operators, engineers, and managers improve the quality and productivity of manual processes by adding digital mistake-proofing to a variety of assembly and packing workflows. It tracks the individual steps of an assembly process and raises audible and visual alerts so associates can catch mistakes as they happen.

Guidance Analytics and Trace
Retrocausal

Collects fine-grained timing data for every cycle performed on a line, identifying non-value-added work as well as the most efficiently performed cycles, which directly helps industrial engineers improve processes. Pathfinder also grades individual assembly sessions against the ideal number of cycles achievable, accounting for operator mistakes, so engineers can compare sessions and work styles to improve processes faster.

Monocular Camera MoCap System
Retrocausal

Computer-vision-based "in-process" health and safety analytics. Analyzes videos recorded on ordinary phone cameras and uploaded via a mobile app or web portal, using state-of-the-art computer vision to compute 3D skeletal poses and extract 3D joint angles. Optimized for industrial use cases, with robustness to partial occlusion and extreme postures, it offers significant advantages over wearables, goniometers, and marker-based motion capture systems.

Personal Projects

πŸ‘οΈ

LLM-powered companion with a pair of eyes to perceive the world around you. Uses RAG retrieval (Pinecone) to recall past conversations like a true friend, and function calling to take actions. Token-efficient by design β€” only processes camera images when it deems necessary, keeping costs low without sacrificing context.

RAG Function Calling Vision
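The token-saving trick of only looking when needed can be modeled as the LLM deciding whether to emit a camera tool call. A minimal sketch with a stubbed model; the tool name and the keyword heuristic are illustrative assumptions, not the product's actual API:

```python
# Sketch of "look only when needed" gating: image tokens are spent only
# on turns where the model requests a camera frame.
LOOK_TOOL = {
    "name": "look_through_camera",
    "description": "Capture one camera frame and describe it.",
    "parameters": {"type": "object", "properties": {}},
}

def fake_llm(message: str) -> dict:
    """Stand-in for a chat model with function calling: it requests the
    camera tool only when the message seems to need visual context."""
    visual_cues = ("see", "look", "holding", "in front of")
    if any(cue in message.lower() for cue in visual_cues):
        return {"tool_call": LOOK_TOOL["name"]}
    return {"text": "Answered from conversation history alone."}

def respond(message: str, capture_frame) -> str:
    reply = fake_llm(message)
    if reply.get("tool_call") == LOOK_TOOL["name"]:
        description = capture_frame()  # image tokens are spent only here
        return f"Looking now: {description}"
    return reply["text"]
```

In a real deployment the gating decision is the model's own tool choice rather than a keyword list; the dispatch loop stays the same.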
πŸš—

Predicts steering angle, detects objects, and classifies traffic lights at 4 FPS on a Jetson Nano. Bachelor thesis at NED University, 2020.

Computer Vision Edge AI
🚒

HuggingFace competition β€” pushed mAP to 0.59 with YOLOv5/v7/v8. Key insight: training on evenly-tiled image crops enables zoomed-in detection.

Object Detection YOLO
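The tiling insight can be sketched as covering each training image with an even grid of crop windows, so small ships occupy more pixels per sample; ground-truth boxes would then be remapped into each tile's coordinates. A generic sketch, not the competition code:

```python
def tile_image(width, height, tile, overlap=0):
    """Return (x0, y0, x1, y1) crop windows covering the image in an even
    grid. Tiles may extend past the image edge only if the image is
    smaller than one tile; otherwise the last row/column is shifted in."""
    step = tile - overlap
    xs = list(range(0, max(width - tile, 0) + 1, step)) or [0]
    ys = list(range(0, max(height - tile, 0) + 1, step)) or [0]
    # Make sure the right and bottom edges are covered.
    if xs[-1] + tile < width:
        xs.append(width - tile)
    if ys[-1] + tile < height:
        ys.append(height - tile)
    return [(x, y, x + tile, y + tile) for y in ys for x in xs]

tiles = tile_image(1024, 768, tile=512)  # four 512x512 crops
```

At inference the same grid is run per tile and the detections are merged back into full-image coordinates, typically with NMS across tile seams.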
🐝

Extracted bee centroids from heatmap images to synthesize YOLO-format bounding boxes, then trained YOLOv5 for detection.

Object Detection YOLO
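The label-synthesis step might look like the following: pick peaks in the centroid heatmap and wrap each in a fixed-size normalized box. The box size and threshold here are illustrative assumptions, justified only if bees appear at a roughly constant scale:

```python
import numpy as np

def heatmap_to_yolo(heatmap, box_frac=0.05, thresh=0.5):
    """Turn centroid peaks in a 2D heatmap into YOLO-format labels
    (class, x_center, y_center, w, h), all normalized to [0, 1].
    Assumes roughly fixed object size, so every box shares one side."""
    h, w = heatmap.shape
    labels = []
    # Naive peak picking: keep points that are local maxima above thresh.
    ys, xs = np.where(heatmap >= thresh)
    for y, x in zip(ys, xs):
        patch = heatmap[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2]
        if heatmap[y, x] == patch.max():
            labels.append((0, float(x) / w, float(y) / h,
                           box_frac, box_frac))
    return labels
```

Each returned tuple can be written as one line of a YOLO `.txt` label file, after which training proceeds as with any hand-annotated dataset.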
πŸ€–

Built transformer decoder blocks one-by-one to deeply understand the architecture. Trained two models β€” one on Shakespeare, one on a subset of OpenWebText.

LLMs PyTorch
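A single pre-norm decoder block of the kind built here can be sketched in NumPy. Single-head attention and a ReLU MLP are simplifications (GPT-2 uses multi-head attention and GELU); the causal mask is what makes the model autoregressive:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(axis=-1, keepdims=True)) / np.sqrt(
        x.var(axis=-1, keepdims=True) + eps)

def causal_self_attention(x, Wq, Wk, Wv, Wo):
    """Single-head masked attention: each position attends only to itself
    and earlier positions."""
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    scores[np.triu(np.ones((T, T), dtype=bool), k=1)] = -np.inf  # hide future
    return softmax(scores) @ v @ Wo

def decoder_block(x, attn_w, W1, b1, W2, b2):
    """Pre-norm GPT-style block: attention then MLP, each with a residual."""
    x = x + causal_self_attention(layer_norm(x), *attn_w)
    h = layer_norm(x) @ W1 + b1
    return x + np.maximum(h, 0) @ W2 + b2  # ReLU MLP

rng = np.random.default_rng(0)
T, d = 4, 8
x = rng.normal(size=(T, d))
attn_w = tuple(rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(4))
W1, b1 = rng.normal(size=(d, 4 * d)), np.zeros(4 * d)
W2, b2 = rng.normal(size=(4 * d, d)), np.zeros(d)
out = decoder_block(x, attn_w, W1, b1, W2, b2)  # shape (T, d)
```

A quick sanity check on causality: perturbing the last token must leave earlier positions' outputs unchanged, since the mask hides future tokens.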
πŸ’¬

Fine-tuned GPT-2 on public Reddit conversations to capture sentiment on unfolding events in Pakistan. Hosted on HuggingFace Spaces.

NLP Fine-tuning

Publications

🧠
Muhammad Naufil
arXiv 2026 Multilingual LLMs Mechanistic Interpretability