Senior QA Engineer (AI Solutions)

Rijeka or Zagreb, Croatia (Hybrid)

Elixirr Digital is a dynamic and innovative global consulting firm, recognized for delivering transformative solutions to our clients across a wide range of industries. As part of our growing AI practice, we are committed to developing enterprise-grade products, accelerators and agentic solutions that drive innovation for our clients and our people. We help leading organizations harness advanced AI technologies — from agentic workflows and retrieval-augmented generation (RAG) to multi-agent orchestration — grounded in their proprietary data and operating within the security and compliance standards their industries demand. 

We are looking for a Senior QA Engineer to take ownership of quality for Elixirr’s AI-enabled solutions — from our internal agent platform to the growing portfolio of accelerators we build to deliver faster and grow the business. 

Quality assurance for AI systems is fundamentally different. The core shift is from deterministic assertions to probabilistic scoring: you’re testing for meaning, grounding, accuracy and safety, not exact outputs. Behaviour drifts as models change, and a single regression in a prompt, tool or retrieval step can cascade across an entire agent workflow. We need a QA leader who embraces that reality, automates aggressively, and uses AI itself to accelerate how we surface, reproduce and triage issues.
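To make that shift concrete, here is a minimal, illustrative sketch; the scoring function, example strings and threshold are stand-ins for illustration, not our actual harness:

```python
from difflib import SequenceMatcher

def grounding_score(answer: str, source: str) -> float:
    # Crude proxy for grounding: lexical similarity between the
    # agent's answer and the source passage it should be grounded in.
    return SequenceMatcher(None, answer.lower(), source.lower()).ratio()

source = "Invoices are due within 30 days of issue."
answer = "Payment is due within 30 days of the invoice being issued."

# A deterministic assertion would fail on any rewording:
#   assert answer == source

# A probabilistic check scores meaning against a threshold instead:
score = grounding_score(answer, source)
assert score >= 0.4, f"answer drifted from source (score={score:.2f})"
```

A real harness would use semantic similarity or an LLM judge rather than lexical overlap, but the testing pattern — score, then gate on a threshold — is the same.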

This is an SDET-shaped role: you’ll shape our test strategy, build the automation and evaluation harnesses that keep our agents honest, and partner with engineering to bake quality into the way we build — shift-left into requirements and design, and shift-right into production observability. 

Candidates applying for an employment contract should note that this position is a hybrid working opportunity based in our Rijeka or Zagreb offices.

What you will be doing as a Senior QA Engineer (AI Solutions) at Elixirr Digital 

QA Strategy for AI-Enabled Solutions 

  • Own the end-to-end QA strategy for Elixirr’s agent platform and AI-enabled accelerators — functional, non-functional, behavioural and safety. 

  • Define what “good” looks like for agent behaviour: accuracy, grounding, tool use, escalation, latency, cost, safety and user experience. 

  • Shift left: engage at requirements and design so gaps, edge cases and testability issues are caught before code is written. 

  • Shift right: use production traces, evals and user feedback as part of the regression loop, so quality keeps improving after release. 

  • Establish deployment gates so AI features ship with evidence, not hope. 

Automation & Evaluation 

  • Build and maintain automated test suites across UI, API, contract, integration and end-to-end layers, running as a first-class part of CI/CD. 

  • Design LLM and agent evaluation harnesses using a layered approach: automated checks, LLM-as-judge scoring, rubric-based evaluation and targeted human review. 

  • Maintain golden datasets, regression suites and red-teaming scenarios that evolve with the product. 

  • Bring LLM quality metrics into the same unified reporting as functional, performance and security results. 

  • Drive automation-first: if a check can be automated, it should be. 
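As an illustrative sketch of the layered evaluation approach (all names, stubs and thresholds here are hypothetical; in practice the judge would call a model with a rubric prompt):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    answer: str     # agent output under test
    reference: str  # golden answer from the curated dataset

# Layer 1: cheap automated checks, run on every commit.
def automated_checks(case: EvalCase) -> bool:
    return bool(case.answer.strip()) and len(case.answer) < 2000

# Layer 2: LLM-as-judge. A stub (word overlap with the reference)
# stands in for a model call so this sketch runs offline.
def llm_judge_stub(case: EvalCase) -> float:
    ref_words = set(case.reference.lower().split())
    ans_words = set(case.answer.lower().split())
    return len(ref_words & ans_words) / max(len(ref_words), 1)

def evaluate(cases, judge: Callable[[EvalCase], float], threshold: float = 0.6):
    # Run the layers in order of cost; anything that fails is
    # flagged for the final layer: targeted human review.
    flagged = []
    for case in cases:
        if not automated_checks(case):
            flagged.append((case, "failed automated checks"))
        elif judge(case) < threshold:
            flagged.append((case, "below judge threshold"))
    return flagged
```

The point of the layering is economics: fast deterministic checks filter the obvious failures, the judge scores the rest, and humans only review what the automated layers flag.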

AI-Accelerated QA 

  • Use AI tooling (code-generation, LLM-based test generation, synthetic data, AI-assisted exploratory testing, self-healing test frameworks) to accelerate coverage and reduce manual toil. 

  • Practice prompt engineering for QA: constrain AI tools with precise specifications so they produce tests and triage artefacts worth keeping. 

  • Build or integrate AI-assisted triage — clustering failures, summarizing root causes, drafting repro steps and proposing fixes to engineers. 

  • Continuously evolve the QA tooling stack as the AI developer ecosystem matures. 
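A toy example of what AI-assisted triage clustering can look like — here with hand-written normalization rules in place of a model, and purely illustrative failure messages:

```python
import re
from collections import defaultdict

def signature(message: str) -> str:
    # Normalize a failure message so run-to-run variants cluster
    # together: strip hex addresses, numbers and file paths.
    msg = re.sub(r"0x[0-9a-f]+", "<addr>", message, flags=re.I)
    msg = re.sub(r"\d+", "<n>", msg)
    msg = re.sub(r"(/[\w.-]+)+", "<path>", msg)
    return msg

def cluster_failures(messages):
    clusters = defaultdict(list)
    for m in messages:
        clusters[signature(m)].append(m)
    return clusters

failures = [
    "TimeoutError: tool call exceeded 30s at /agents/search.py",
    "TimeoutError: tool call exceeded 45s at /agents/rank.py",
    "KeyError: 'citation' missing from retrieval result 17",
]
clusters = cluster_failures(failures)
# The two timeout variants collapse into a single cluster.
```

An LLM-based triage layer would go further — summarizing the likely root cause per cluster and drafting repro steps — but grouping duplicates first keeps that expensive step cheap.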

Reliability, Performance & Safety 

  • Run performance, load and chaos testing against agent workflows and backend services. 

  • Stress-test guardrails, prompt injection defences, tool-use restrictions and data boundaries. 

  • Cover accessibility, visual regression and security checks (SAST/DAST) as part of the standard test pipeline. 

  • Partner with security and compliance on safety reviews for client-facing AI features. 

Process & Collaboration 

  • Work day-to-day with backend, front-end, AI and platform engineers — quality is a team sport and you’re the coach. 

  • Feed defect patterns, evaluation results and production telemetry back into engineering and product priorities. 

  • Coach engineers on writing better tests, better evals and more observable code — and keep humans in the loop where AI judgement isn’t enough. 

Competencies and skillset we expect you to have to successfully perform your job: 

Experience 

  • 5+ years in QA / SDET / Test Engineering, including hands-on ownership of automation frameworks. 

  • Experience testing modern cloud-native applications (microservices, APIs, event-driven systems) in AWS and/or Azure. 

  • Exposure to testing AI, ML or agent-based systems — or a clear, demonstrable plan for how you’d go about it. 

Technical Expertise 

  • Strong proficiency in at least one automation language/stack (e.g., Python with pytest, TypeScript with Playwright, Java with JUnit/RestAssured). 

  • Comfort with API testing, contract testing and service virtualization. 

  • Familiarity with CI/CD pipelines (GitHub Actions, GitLab CI, Azure DevOps) and running tests as a first-class part of the pipeline. 

  • Solid understanding of observability tooling — logs, metrics, traces — and how to use it in testing and production monitoring. 

AI & Evaluation Skills 

  • Hands-on use of AI developer tools for test generation, code review and triage. 

  • Understanding of LLM evaluation concepts: reference-based vs reference-free metrics, LLM-as-judge, rubric-based scoring, regression harnesses and continuous evaluation. 

  • Awareness of AI-specific failure modes: hallucination, prompt injection, tool misuse, retrieval failures, context window issues and model drift. 
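To illustrate the reference-based vs reference-free distinction with deliberately simplified metrics (token-level F1 and word overlap are stand-ins for real semantic measures):

```python
def reference_based_f1(answer: str, reference: str) -> float:
    # Reference-based: compare against a golden answer (token-level F1).
    a, r = set(answer.lower().split()), set(reference.lower().split())
    common = len(a & r)
    if not common:
        return 0.0
    precision, recall = common / len(a), common / len(r)
    return 2 * precision * recall / (precision + recall)

def reference_free_grounded(answer: str, context: str) -> bool:
    # Reference-free: no golden answer exists; instead check that each
    # sentence of the answer is supported by the retrieved context
    # (here approximated as sharing at least one word with it).
    ctx = set(context.lower().split())
    return all(
        set(sentence.split()) & ctx
        for sentence in answer.lower().split(". ")
        if sentence
    )
```

Reference-based metrics need a curated golden dataset; reference-free checks (groundedness, safety, format) can run on live traffic, which is why both belong in a continuous evaluation loop.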

Soft Skills 

  • Relentlessly curious — you want to know why something failed, not just that it did.

  • Clear written communication; defect reports and evaluation summaries that engineers actually enjoy reading. 

  • Self-driven in a distributed, multi-timezone team. 

Preferred Qualifications 

  • Experience with performance testing tools (k6, Locust, JMeter). 

  • Familiarity with evaluation frameworks (e.g., OpenAI Evals, Ragas, DeepEval, Promptfoo, Langfuse). 

  • Exposure to security testing, accessibility testing and compliance-driven environments.


Why is Elixirr Digital the right next step for you? 

From working with cutting-edge technologies to solving complex challenges for global clients, we make sure your work matters. And while you’re building great things, we’re here to support you. 

Compensation & Equity: 

  • Individually tailored benefits package 

  • Performance bonus 

  • Employee Stock Options Grant 

  • Employee Share Purchase Plan (ESPP) 

  • Competitive compensation 

Flexibility & Balance: 

  • Flexible working hours and remote work 

  • Multi-sport card 

  • Full medical checkup 

Projects & Tools: 

  • Big clients and interesting projects 

  • Cutting-edge technologies 

Learning & Growth: 

  • Learning and development budget 

  • Growth and development opportunities 

People-Centered Perks: 

  • Referral bonus 

  • Anniversary bonus 

We don’t just offer a job - we create space for you to grow, thrive, and be recognized. 

 

Intrigued? Apply now! 
