Want to help develop our Loyalty products on Google Cloud for the new generation of Hii Retail solutions, and make your mark on how we optimize flows by enabling speed and efficiency?
We are now looking for an experienced, passionate and structured engineer to make sure that the services we develop are reliable. As Senior Site Reliability Engineer you will bridge the gap between Operations and Development, aiming to increase speed and efficiency by leveraging automation tools. Because you will share responsibility for the production environment, you will act as an enabler and multiplier, with a focus on helping the development team run more efficiently. You will also take the lead in automating manual processes as you apply DevOps practices, with the ambition to improve our emergency response process.
You will be working with an experienced development team that delivers services and products within the retail loyalty domain. The team consists of twelve people distributed across four countries. The role involves a lot of collaboration with both internal and external stakeholders. We have an SRE focus group within the company that you will work closely with to share ideas and implement them in teams across the company.
In close collaboration with the development teams, you’ll define reliability metrics and offer education and guidance on SRE practices. Your mission is to define best practices that ensure software releases are consistent and repeatable, with the purpose of shortening time-to-market. In collaboration with product owners, you will receive customer issues and allocate resources based on each individual business case.
What you will be doing
- Increase, maintain and measure reliability of services
- Troubleshoot critical application incidents in real time, manage real-time escalations, and act as the point person for ensuring escalation procedures are followed and driven to resolution
- Critical situation management: Handle stressful situations, such as initiating emergency conference bridge calls and sending quick and accurate outage notifications
- Practice a Site Reliability Engineering and DevOps mindset, lead the team in it, and solve problems through automation and innovation
- Apply DevOps practices and automate manual processes to drive down operational overhead
- Define reliability metrics (SLOs/SLIs), support product owners and the legal team in defining SLAs, measure SLOs and apply error budgets to releases
- Ensure that SLOs are met
- Analyze, assess and keep track of infrastructure capacity and recommend cost optimizations
- Define observability metrics and guide SREs in implementing them
- Define best practices to ensure software releases are consistent and repeatable to shorten time-to-market
- Take overall responsibility for keeping CI/CD functioning as expected, including maintenance and upgrades
- Work closely with development teams during the development process, offering education and guidance on SRE practices
What you will do in the first 3-6 months in the role
The first months will mainly consist of competence transfer and competence development. You will need to familiarize yourself with our products, customers and processes.
You will quickly get to know the team and your SRE colleagues in the company. The team will support you and help you get into the role of SRE quickly.
Our Tech Stack
Google Cloud Platform (Cloud Spanner, Cloud Run, Pub/Sub), Git, GitHub Actions, Docker, Kubernetes, infrastructure as code, Spinnaker, PostgreSQL, Java, Node.js, Jira, Confluence.
Who are you?
You are a solution-oriented and self-driven person with a proactive attitude who will bring significant value to our team at Extenda Retail. As you will work with internal and external stakeholders, you’ll need excellent verbal and written communication skills, together with a passion for technology and a growth mindset to always learn and develop.
We look for ambitious people with significant drive, fueled by curiosity to innovate. You should be passionate about your job and enjoy a fast-paced environment, with the ambition to go above and beyond to produce the best possible results.
- Proven track record in complex enterprise application development and production support, following a mix of DevOps and SRE frameworks
- Experience with a cloud infrastructure provider, preferably GCP
- Good understanding of modern web architectures, system design and software engineering principles, and of how to apply them to design scalable and robust solutions
- Experience with a monitoring tool
- Experience with CI/CD tools (DevOps experience)
Nice to have:
- Experience as a technical leader
- Ability to lead a technical group independently
- Strong customer advocate with excellent written and verbal communication skills
Extenda Retail is a leading software provider to global retailers. We help major retail chains with their digital business transformation by delivering innovative solutions and services that enable our clients to lead the retail technology revolution. We are perfectly positioned to aggressively go after new business on a global scale. We are driven by our mission to simplify shopping and we are committed to making sure our existing and future clients thrive in the highly competitive and rapidly changing retail industry.
If this makes you excited, then we'd love for you to join us.