Cape Town or Johannesburg, South Africa
Elixirr Digital is a dynamic and innovative global consulting firm, recognized for delivering transformative solutions to our clients across a wide range of industries. As part of our growing AI practice, we are committed to developing enterprise-grade products, accelerators and agentic solutions that drive innovation for our clients and our people. We help leading organizations harness advanced AI technologies — from agentic workflows and retrieval-augmented generation (RAG) to multi-agent orchestration — grounded in their proprietary data and operating within the security and compliance standards their industries demand.
We are looking for a Senior QA Engineer to take ownership of quality for Elixirr’s AI-enabled solutions — from our internal agent platform to the growing portfolio of accelerators we build to deliver faster and grow the business.
Quality assurance for AI systems is a different discipline. The fundamental shift is from deterministic assertions to probabilistic scoring: you’re testing for meaning, grounding, accuracy and safety, not exact outputs. Behaviour drifts as models change, and a single regression in a prompt, tool or retrieval step can cascade across an entire agent workflow. We need a QA leader who embraces that reality, automates aggressively, and uses AI itself to accelerate how we surface, reproduce and triage issues.
This is an SDET-shaped role: you’ll shape our test strategy, build the automation and evaluation harnesses that keep our agents honest, and partner with engineering to bake quality into the way we build — shift-left into requirements and design, and shift-right into production observability.
Candidates applying for this position, kindly note that this is an onsite working opportunity based at our offices in Cape Town or Johannesburg.
What will you be doing as a Senior QA Engineer at Elixirr Digital?
QA Strategy for AI-Enabled Solutions
Own the end-to-end QA strategy for Elixirr’s agent platform and AI-enabled accelerators — functional, non-functional, behavioural and safety.
Define what “good” looks like for agent behaviour: accuracy, grounding, tool use, escalation, latency, cost, safety and user experience.
Shift left: engage at requirements and design so gaps, edge cases and testability issues are caught before code is written.
Shift right: use production traces, evals and user feedback as part of the regression loop, so quality keeps improving after release.
Establish deployment gates so AI features ship with evidence, not hope.
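To make the "evidence, not hope" idea concrete: a deployment gate can be as simple as a function in the CI pipeline that blocks release when an evaluation run's pass rate falls below an absolute bar or regresses too far against the last recorded baseline. The field names and thresholds below are illustrative assumptions, not Elixirr's actual policy:

```python
# Illustrative CI deployment gate for AI features.
# The `results` shape and the thresholds are assumptions for this sketch.

def gate(results: dict, min_pass_rate: float = 0.95, max_regression: float = 0.02) -> bool:
    """Allow the release only if the eval pass rate clears the absolute bar
    and hasn't regressed too far from the recorded baseline."""
    current = results["pass_rate"]
    baseline = results["baseline_pass_rate"]
    if current < min_pass_rate:
        return False  # absolute quality bar not met
    if baseline - current > max_regression:
        return False  # regressed vs. the previous release
    return True
```

In practice a script like this would load the eval run's output, call `gate`, and exit non-zero to fail the pipeline stage when the answer is no.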
Automation & Evaluation
Build and maintain automated test suites across UI, API, contract, integration and end-to-end layers, running as a first-class part of CI/CD.
Design LLM and agent evaluation harnesses using a layered approach: automated checks, LLM-as-judge scoring, rubric-based evaluation and targeted human review.
Maintain golden datasets, regression suites and red-teaming scenarios that evolve with the product.
Bring LLM quality metrics into the same unified reporting as functional, performance and security results.
Drive automation-first: if a check can be automated, it should be.
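As a flavour of the layered approach (a minimal sketch, not our actual harness — `run_agent` is a hypothetical stand-in for the system under test), the fully automated layer might combine a crude grounding check with a reference-based score over a golden dataset, before anything is escalated to LLM-as-judge or human review:

```python
# Minimal sketch of the automated layer of an agent evaluation harness.
# `run_agent` is a hypothetical placeholder; a real harness calls the agent here.

from dataclasses import dataclass

@dataclass
class GoldenCase:
    question: str
    context: str    # retrieved passage the answer must be grounded in
    reference: str  # key facts the answer should contain

def run_agent(question: str, context: str) -> str:
    # Placeholder: stands in for the real agent/LLM call.
    return f"Based on the context: {context}"

def keyword_recall(answer: str, reference: str) -> float:
    """Reference-based metric: fraction of reference keywords found in the answer."""
    keywords = set(reference.lower().split())
    hits = sum(1 for k in keywords if k in answer.lower())
    return hits / len(keywords) if keywords else 0.0

def is_grounded(answer: str, context: str) -> bool:
    """Crude grounding check: the answer overlaps with the retrieved context."""
    return any(tok in answer.lower() for tok in context.lower().split())

def evaluate(cases: list, threshold: float = 0.5) -> dict:
    """Run every golden case and tally pass/fail for regression reporting."""
    results = {"passed": 0, "failed": 0}
    for case in cases:
        answer = run_agent(case.question, case.context)
        ok = is_grounded(answer, case.context) and keyword_recall(answer, case.reference) >= threshold
        results["passed" if ok else "failed"] += 1
    return results
```

Real harnesses replace the keyword metric with semantic similarity or judge models, but the shape — golden cases in, pass/fail tallies out, wired into CI — stays the same.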
AI-Accelerated QA
Use AI tooling (code-generation, LLM-based test generation, synthetic data, AI-assisted exploratory testing, self-healing test frameworks) to accelerate coverage and reduce manual toil.
Practice prompt engineering for QA: constrain AI tools with precise specifications so they produce tests and triage artefacts worth keeping.
Build or integrate AI-assisted triage — clustering failures, summarizing root causes, drafting repro steps and proposing fixes to engineers.
Continuously evolve the QA tooling stack as the AI developer ecosystem matures.
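One small piece of AI-assisted triage — clustering similar failures before asking an LLM to summarize each cluster — can start as nothing more than grouping on a normalized error signature. This is a hypothetical sketch of that first step:

```python
# Hypothetical pre-clustering step for AI-assisted failure triage:
# normalize failure messages so near-duplicates group together.

import re
from collections import defaultdict

def signature(message: str) -> str:
    """Mask volatile details (numbers, quoted values) to form a cluster key."""
    sig = re.sub(r"\d+", "<N>", message)      # mask ids, durations, line numbers
    sig = re.sub(r"'[^']*'", "'<VAL>'", sig)  # mask quoted values
    return sig.strip().lower()

def cluster_failures(messages: list) -> dict:
    """Group raw failure messages by their normalized signature."""
    clusters = defaultdict(list)
    for m in messages:
        clusters[signature(m)].append(m)
    return dict(clusters)
```

Each resulting cluster is then a natural unit to hand to an LLM for root-cause summaries and draft repro steps, rather than triaging hundreds of raw failures one by one.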
Reliability, Performance & Safety
Run performance, load and chaos testing against agent workflows and backend services.
Stress-test guardrails, prompt injection defences, tool-use restrictions and data boundaries.
Cover accessibility, visual regression and security checks (SAST/DAST) as part of the standard test pipeline.
Partner with security and compliance on safety reviews for client-facing AI features.
Process & Collaboration
Work day-to-day with backend, front-end, AI and platform engineers — quality is a team sport and you’re the coach.
Feed defect patterns, evaluation results and production telemetry back into engineering and product priorities.
Coach engineers on writing better tests, better evals and more observable code — and keep humans in the loop where AI judgement isn’t enough.
Competencies and skills we expect you to bring to succeed in this role:
Experience
5+ years in QA / SDET / Test Engineering, including hands-on ownership of automation frameworks.
Experience testing modern cloud-native applications (microservices, APIs, event-driven systems) in AWS and/or Azure.
Exposure to testing AI, ML or agent-based systems — or a clear, demonstrable plan for how you’d go about it.
Technical Expertise
Strong proficiency in at least one automation language/stack (e.g., Python with pytest, TypeScript with Playwright, Java with JUnit/RestAssured).
Comfort with API testing, contract testing and service virtualization.
Familiarity with CI/CD pipelines (GitHub Actions, GitLab CI, Azure DevOps) and running tests as a first-class part of the pipeline.
Solid understanding of observability tooling — logs, metrics, traces — and how to use it in testing and production monitoring.
AI & Evaluation Skills
Hands-on use of AI developer tools for test generation, code review and triage.
Understanding of LLM evaluation concepts: reference-based vs reference-free metrics, LLM-as-judge, rubric-based scoring, regression harnesses and continuous evaluation.
Awareness of AI-specific failure modes: hallucination, prompt injection, tool misuse, retrieval failures, context window issues and model drift.
Soft Skills
Relentlessly curious — you want to know why something failed, not just that it did.
Clear written communication; defect reports and evaluation summaries that engineers actually enjoy reading.
Self-driven in a distributed, multi-timezone team.
Preferred Qualifications
Experience with performance testing tools (k6, Locust, JMeter).
Familiarity with evaluation frameworks (e.g., OpenAI Evals, Ragas, DeepEval, Promptfoo, Langfuse).
Exposure to security testing, accessibility testing and compliance-driven environments.
Why is Elixirr Digital the right next step for you?
From working with cutting-edge technologies to solving complex challenges for global clients, we make sure your work matters. And while you’re building great things, we’re here to support you.
Compensation & Equity:
Health & Wellbeing:
Projects & Tools:
Learning & Growth:
We don’t just offer a job: we create space for you to grow, thrive, and be recognized.
Intrigued? Apply now!