Site Reliability Engineer
VGW is a fast-growing technology company and creator of market-leading online social games. With offices across Australia, San Francisco, Toronto, Malta and the Philippines, we are on a mission to be the biggest gaming company in the world!
Due to major growth, we are expanding our Engineering team and are currently looking for a Site Reliability Engineer to join us.
What you should know about our Engineering team:
- We make room to do things the right way, rather than hacking stuff in
- You own your projects: you build it, you ship it, you run it
- We were born in the cloud: practices and principles like CI/CD, IaC, O11y and CyberSec are part of what we do
When it comes to Learning and Development:
- Learning is part of our fabric. We have a world-class Engineering Learning and Development program and we are passionate about career development.
- Thought Leaders regularly give tailored talks and workshops for our team
- We are active in the community, attending many local and international conferences. When COVID hit, we held our own day-long conference, Techtonic.
- We have both a technical and management track for career progression and promotions
- We maintain a library of quality books and videos and encourage everyone to stay on top of new practices.
In terms of our tech stack, we’re container-based, running on ECS Fargate and supported by Amazon Aurora, with CloudFront and S3 for our front-ends. Our infrastructure is provisioned with Terraform, and we use a mix of CircleCI, GitHub Actions and Terraform Workspaces for our CI/CD.
What you will be doing:
- Investigating production incidents and conducting blameless postmortems to identify areas of improvement
- Providing cool and calm guidance as an Incident Commander during major incidents
- Providing guidance and hands-on support in building VGW’s cloud infrastructure
- Using your unique voice and technical skills to drive improvements in processes and policies with a focus on reliability and stability
- Working collaboratively with SRE teams to foster a culture of continuous improvement and drive the maturity of the SRE discipline forward
- Identifying new approaches and solutions to eliminate toil in engineering teams
- Participating in project kick-off meetings, code reviews and post-incident review meetings
Required Skills & Experience:
- Experience working as a DevOps Engineer, Systems Administrator or SRE, or in a related field, and readiness to take the next step
- Strong knowledge of Google's SRE principles and practices
- Proficiency with Terraform (Infrastructure as Code tool)
- Knowledge of CI/CD tools and techniques
- Experience with a major cloud provider (AWS, Azure or GCP)
- Experience with the implementation of OpenTelemetry
- Understanding of resilience techniques and patterns
Nice to have:
- Experience working with a microservice architecture
- Understanding of Unix/Linux operating systems
- Knowledge of networking protocols such as TCP, HTTP/2, WebSockets, etc.
- Knowledge of Service Level Objectives, Service Level Indicators and Error Budgets
VGW has been disrupting the online gaming world since 2010 and we're only getting started. We've assembled an incredibly talented global team who bring their passion, energy and expertise to build games that people love.
At VGW, we have a modern approach to getting work done and a focus on creating an environment where amazing people can do amazing work. That means giving you the flexibility you need, providing spaces that will keep you comfortable and finding opportunities for you to keep learning and growing.
Find out more at www.vgw.co
If you want to join a team that does things differently, apply today. We look forward to seeing what you can bring to our team.