THE POSITION IS OPEN IN WROCLAW, LUBLIN, AND SOFIA. RELOCATION OPPORTUNITIES ARE AVAILABLE.
Our client is one of the biggest online retailers worldwide, with an annual revenue of ?1 billion. Over the years, we have helped the client develop web portals, mobile apps, delivery control systems, staff management tools, data storage solutions, and much more. The systems we have built together operate 24/7 and contribute to the client's success.
Site Reliability Engineering is a relatively new discipline, first introduced at Google, that combines development and operations skills to deliver more reliable, scalable software. The goal is to analyze a diverse set of applications (built primarily with Java, Oracle, AWS, Google Cloud services, and a number of other technologies) and integrate them into a reliable, self-healing suite that operates within defined reliability requirements. This requires proactive work to ensure observability, analyze potential bottlenecks, and propose fixes before they cause production incidents.
Required Skills and Experience
Good understanding of Java and hands-on experience troubleshooting nontrivial problems such as multithreading race conditions, memory leaks, and cache issues
Good understanding of high-load system development practices, reliability measurement, and failover processes
Deep understanding of Unix/Linux systems administration
Ability to conduct postmortems and learn from past failures