Python / Machine Learning Engineer
Turing Enterprises · Remote
- Designed adversarial and domain-specific terminal benchmarking tasks for LLMs (Claude Sonnet 4 / 4.5, Qwen Coder, Hunyuan) in Linux environments, targeting model failure modes.
- Authored golden Bash solutions and SFT data to improve model performance on terminal operations and developer-tool workflows in Linux and Docker environments.
- Supported NVIDIA projects aimed at improving models' external API calling and technical response generation through curated fine-tuning and evaluation workflows.
- Automated cross-application OS interaction tasks for the OS World project using Python and PyAutoGUI (Chrome, LibreOffice, Terminal, VS Code).
- Evaluated model outputs for Python, JavaScript, and SQL tasks in the Meta Evaluation Project, assessing truthfulness, instruction adherence, correctness, verbosity, and overall quality.
- Contributed to QA for Meta OpenClaw, supporting the training and evaluation of API-calling models (Llama, Quite_Sand, Gemma) that complete user tasks through the Maton API (email, messaging, Jira, document drafting).