Let’s start with the role
Kaizen Gaming uses a variety of systems and stacks to deliver its products. As a member of this team, you will work closely with the other software engineering teams, focusing on software development and infrastructure design and providing expertise in performance, stability and scalability.
We are looking for an experienced Site Reliability Engineer who brings a broad set of technical skills and achievements and a development- and automation-focused mindset to solving problems, and who is eager to tackle some of technology’s greatest challenges and make an impact on thousands, if not millions, of users. Our SRE team consists of experienced engineers who collaborate effectively and work on large-scale applications that serve real-time content to thousands of connected users, within a fast-paced, growing business.
As a Site Reliability Engineer, you will:
- Together with a team of engineers, enable and enhance the day-to-day operational workflows of critical applications and services in a 24x7x365 environment located in cloud and physical data centres;
- Continuously improve application observability to ensure the uptime and reliability of our applications and infrastructure;
- Utilize a wide variety of open source technologies to create fault-tolerant, scalable and secure high-performance services and pipelines on a global scale.
What you’ll bring
- 3-5 years of experience building scalable production environments;
- Experience with, or understanding of, source code control systems, versioning, branching and merging, configuration, build management, artefact repos, automated build tools, automated testing frameworks and automated deployment frameworks;
- Extensive working experience with Continuous Integration and Continuous Delivery procedures and tools (e.g. Jenkins, GitLab CI), since you will be maintaining and supporting automated build pipelines;
- Strong scripting skills (Bash, PowerShell, Python, Go, Ruby, etc.);
- Strong experience using Docker containers and container orchestration tools (preferably Kubernetes);
- Experience with one or more infrastructure-as-code tools such as Ansible, Chef or Terraform;
- Ability to work as part of a distributed team;
- Experience with monitoring and metrics systems (Prometheus, Logstash, Grafana);
- Programming skills in Java or .NET.
Nice to have
- Familiarity with database technologies (MSSQL/PostgreSQL);
- Hands-on experience configuring and troubleshooting messaging technologies such as RabbitMQ and Apache Kafka.