Senior AI Engineer
Job Description
Noveum.ai is the tracing, evaluation, and observability platform for AI agents — trusted by thousands of developers to monitor, debug, and optimize their LLM, RAG, and multi-agent systems. We're hiring a Senior AI Engineer (2-3+ years) who has built AI agents extensively, deployed AI models in production, and obsessively analyzed and evaluated how they behave. You'll work on cutting-edge AI research that is among the best in the world right now — agent evaluation, LLM-as-judge, and autonomous optimization — and turn it into products developers depend on every day. This is a senior role with a deliberately high bar: we hire only experienced, exceptional senior AI engineers and we reject most applicants. If you do not meet the requirements below, please do not apply.
Key Responsibilities
- Architect, build, and ship agentic AI features end to end
- Design evaluation pipelines and LLM-as-judge systems that define what 'correct' means for agents
- Drive cutting-edge AI research on agent reliability, evaluation, and autonomous optimization
- Deploy, monitor, and tune models in production for reliability, latency, and cost
- Mine production traces to diagnose failures and systematically improve agent behavior
- Set technical direction and raise the bar for AI engineering across the team
Requirements
- 2-3+ years building and shipping production AI/ML systems
- Built AI agents extensively — LLM tool-use, RAG, and multi-agent
- Deployed and operated AI models in production at real scale
- Rigorous evaluation & analysis — evals, LLM-as-judge, observability
- Expert-level Python (TypeScript a plus)
- Deep generative AI API & prompt-engineering experience (OpenAI, Anthropic, etc.)
Preferred Qualifications
- Published research, open-source AI tools, or other public work in AI/ML
- Experience building evals, observability, or tracing infrastructure
- Worked with agent frameworks and orchestration at scale
- Background in NLP, RL, or LLM fine-tuning / post-training
- Took AI systems from prototype to reliable production yourself