Platform Reliability Engineer

Location: Chennai, State of Tamil Nādu, India; Job ID: R-207062; Date posted: 24/02/2025

Job Title: Platform Reliability Engineer

Career Level - E

Introduction to role:

Join us as a Platform Reliability Engineer in our Commercial IT – SSD, Data, Analytics and AI Platform Success Team. Your primary focus will be to ensure the stability, performance, and reliability of our Data, Analytics, and AI systems. You will bridge the gap between development and operations by generating insights into sub-optimal processes and optimization opportunities. This role offers an exciting opportunity to integrate Agile, Lean and SAFe practices within monitoring and observability initiatives and to continuously improve delivery cycle times.

Accountabilities:

As a Platform Reliability Engineer, you will be responsible for the evaluation, selection, and deployment of monitoring & observability technologies. You will manage and maintain monitoring infrastructure, ensuring it aligns with industry best practices. You will collaborate with DevOps, CriticalOps and IT leadership teams to understand system requirements and design effective monitoring strategies. You will also develop and implement monitoring solutions for infrastructure, applications, and services.

AstraZeneca is a global, innovation-driven biopharmaceutical business with a primary focus on the discovery, development and commercialization of prescription medicines. Our purpose as a company is to push the boundaries of science to deliver life-changing medicines and greater efficiency & innovation in healthcare.

As science moves forward, technology needs to keep pace. AstraZeneca has created a world class IT organization by radically reinventing the current IT operating model and organization design; supplier ecosystem optimization and insourcing; establishment of a network of global Technology Centers; significant Infrastructure and Technology transformation; cultural change and risk management.

As an individual contributor within the Commercial IT – SSD, Data, Analytics and AI Platform Success Team, the Platform Reliability Engineer's responsibilities include the following:

·Ensuring the stability, performance and reliability of Data, Analytics and AI systems by implementing and maintaining robust monitoring and observability solutions

·Designing, deploying, and managing monitoring tools and practices that provide insights into the health and performance of our data infrastructure and analytics processes

·Helping bridge the gap between development and operations by generating insights into sub-optimal processes and optimization opportunities

·Maintaining working knowledge of platform architecture and business acumen

·Integrating Agile, Lean and SAFe practices within monitoring and observability initiatives to continuously improve delivery cycle times

·Exploring and implementing new ways to automate systems: designing and testing automation processes, identifying quality issues, and supporting IT platform teams in eliminating defects and errors in product and platform development

·Leveraging AIOps capabilities to uplift existing production operations

Technology/Tool Management

Responsible for the evaluation, selection, and deployment of monitoring & observability technologies (internal or market available) suitable for the organization’s needs – this includes creation of effective business case(s) to influence investment and innovation
Manage and maintain monitoring infrastructure, ensuring it aligns with industry best practices

Monitoring & Observability Practice Management

Collaborate with DevOps, CriticalOps and IT leadership teams to understand system requirements and design effective monitoring strategies that align with organizational goals and objectives
Establish key metrics and KPIs that enable insights and analytics to drive a data-driven continuous improvement backlog
Provide training and support to other teams on using monitoring tools effectively
Create and maintain documentation for monitoring and observability practices, including standard operating procedures and best practices
Stay abreast of industry trends, emerging technologies, and best practices related to monitoring and observability platforms

Monitoring & Observability Implementation & Operations

Develop and implement monitoring solutions for infrastructure, applications, and services
Design and configure alerting mechanisms to detect and respond to potential issues proactively
Use monitoring tools to identify and troubleshoot issues in real-time
Collaborate with other teams to resolve incidents promptly and prevent recurrence
Analyze monitoring data to identify performance bottlenecks and areas for improvement
Work with development and operations teams to optimize system performance based on monitoring insights
Implement automation scripts and workflows to streamline monitoring processes
Integrate monitoring solutions with existing frameworks for seamless operation
Identify and evaluate “self-healing” opportunities based on production issue trend analysis to inform AIOps roadmap

Essential

Degree level education in computer science, information technology, or a related field
Proven experience as a monitoring and observability engineer or a similar role
Proficient in developing monitoring capabilities and configuring integration with tools such as Prometheus, Grafana, Splunk, Sumo Logic, Datadog, Dynatrace, etc.
Strong scripting skills (e.g., Python) for automation in data environments
Familiarity with logging, tracing, and APM (Application Performance Monitoring) solutions

Desirable

Customer engagement experience
Knowledge of data processing frameworks (e.g. Apache Spark) and data storage solutions (e.g. data lakes, warehouses)
Experience with data orchestration tools (e.g. Apache Airflow)
Understanding of data lineage and metadata management

When we put unexpected teams in the same room, we unleash bold thinking with the power to inspire life-changing medicines. In-person working gives us the platform we need to connect, work at pace and challenge perceptions. That’s why we work, on average, a minimum of three days per week from the office. But that doesn't mean we’re not flexible. We balance the expectation of being in the office while respecting individual flexibility. Join us in our unique and ambitious world.

Why AstraZeneca?

Join us at a crucial stage of our journey in becoming a digital and data-led enterprise. Make the impossible possible by building partnerships and ecosystems, creating new ways of working and driving scale and speed to deliver exponential growth. Focused and committed, and backed with the investment to succeed, we're driving cross-company change to disrupt the entire industry. Our work unlocks the potential of science. We optimise and revolutionise AstraZeneca by maximising efficiencies and finding new ways to drive productivity, from automation to data simplification.

Ready to make a difference? Apply today and be part of a team that has the backing to innovate, disrupt an industry and change lives.
