Vacancy closed

Attention! This is the vacancy archive. The vacancies you find here are no longer active and are kept here for reference only.
03.07.2019

Senior Site Reliability Engineer, Online Retailer

DataArt
IT - Software Development

    Our client is one of the biggest online retailers worldwide, with an annual revenue of 1 billion. Over the years, we have helped the client develop web portals, mobile apps, delivery control systems, staff management tools, data storage and much more. The systems we've built together are in operation 24/7, contributing to the client's success.

    Site Reliability Engineer is a relatively new role, first introduced by Google, that combines the skills of developers and operations engineers to deliver more reliable, scalable software. The goal is to analyze a diverse set of applications (primarily built using Java, Oracle, AWS, Google Cloud services and a number of other technologies) and bind them into a reliable, self-healing suite that meets defined reliability requirements. This requires proactive work to ensure observability, analyze potential bottlenecks and suggest fixes before they cause production incidents.

    Responsibilities

    • Analyze and improve the availability, latency, performance, and efficiency of the applications
    • Proactively support production applications (both during office hours and out of hours) across a range of domains; these are mainly written in Java and use Oracle databases.
    • Improve the monitoring and alerting of the applications
    • Plan and provision capacity
    • Improve and standardize build pipelines; identify areas of manual toil and reduce them through automation.
    • Consult in areas of reliability and scalability for the development of new applications.
    • Work together with teams in other departments to find solutions
    • Take part in periodic on-call duty

    Required Skills and Experience

    • Expertise in designing, analyzing and troubleshooting large-scale distributed systems.
    • Good understanding of cloud technologies
    • Experience with algorithms, data structures, complexity analysis and software design.
    • Good understanding of Java and hands-on experience in troubleshooting non-trivial problems such as multithreading race conditions, memory leaks, cache issues, etc.
    • Good understanding of SQL, experience with query optimization and performance tuning
    • Good understanding of high-load system development practices, reliability measurement and failover processes
    • Understanding of microservices architecture, containers, orchestration frameworks
    • Deep understanding of Unix/Linux systems administration
    • Knowledge and understanding of network theory (MAC addresses, IP packets, DNS, OSI layers, and load balancing).
    • Ability to get to the root cause of problems and to promote this approach within the team
    • Ability to conduct postmortems and learn from past failures.
    • Ability to drive a continuous, measurable system improvement process
    • Good English communication and interpersonal skills

    DataArt offers:

    • Professional Development:
      - Experienced colleagues who are ready to share knowledge;
      - The ability to switch projects and technology stacks, and to try out different roles;
      - More than 150 workplaces for advanced training;
      - English study and practice: courses and communication with colleagues and clients from different countries;
      - Support for speakers who present at conferences and technology community meetups.
    • The ability to focus on your work: no bureaucracy or micromanagement, and convenient corporate services;
    • Friendly atmosphere, concern for the comfort of specialists;
    • Flexible schedule and the ability to work remotely;
    • The ability to work in any of our development centers.