Platform Reliability Engineer

Location: Chennai, State of Tamil Nādu, India; Job ID: R-207062; Date posted: 11/11/2024

Job Title: Platform Reliability Engineer

Career Level - E

Introduction to role:

Join us as a Platform Reliability Engineer in our Commercial IT – SSD, Data, Analytics and AI Platform Success Team. Your primary focus will be to ensure the stability, performance, and reliability of our Data, Analytics, and AI systems. You will bridge the gap between development and operations by generating insights into sub-optimal processes and optimization opportunities. This role offers an exciting opportunity to integrate Agile, Lean and SAFe practices within monitoring and observability initiatives and to continuously improve delivery cycle times.

Accountabilities:

As a Platform Reliability Engineer, you will be responsible for the evaluation, selection, and deployment of monitoring & observability technologies. You will manage and maintain monitoring infrastructure, ensuring it aligns with industry best practices. You will collaborate with DevOps, CriticalOps and IT leadership teams to understand system requirements and design effective monitoring strategies. You will also develop and implement monitoring solutions for infrastructure, applications, and services.

AstraZeneca is a global, innovation-driven biopharmaceutical business with a primary focus on the discovery, development and commercialization of prescription medicines. Our purpose as a company is to push the boundaries of science to deliver life-changing medicines and greater efficiency & innovation in healthcare. 

As science moves forward, technology needs to keep pace. AstraZeneca has created a world class IT organization by radically reinventing the current IT operating model and organization design; supplier ecosystem optimization and insourcing; establishment of a network of global Technology Centers; significant Infrastructure and Technology transformation; cultural change and risk management.

As an individual contributor within the Commercial IT – SSD, Data, Analytics and AI Platform Success Team, the Platform Reliability Engineer's responsibilities include the following:

  • Ensuring the stability, performance and reliability of Data, Analytics and AI systems by implementing and maintaining robust monitoring and observability solutions
  • Designing, deploying, and managing monitoring tools and practices that provide insights into the health and performance of our data infrastructure and analytics processes, as the primary focus of the role
  • Helping bridge the gap between development and operations by generating insights into sub-optimal processes and optimization opportunities
  • Maintaining working knowledge of platform architecture and business acumen
  • Integrating Agile, Lean and SAFe practices within monitoring and observability initiatives and continuously improving delivery cycle times
  • Exploring and implementing new ways to automate systems: designing and testing automation processes, identifying quality issues, and supporting IT platform teams in eliminating defects and errors in product and platform development
  • Leveraging AIOps capabilities to uplift existing production operations

Technology/Tool Management

  • Responsible for the evaluation, selection, and deployment of monitoring & observability technologies (internal or market available) suitable for the organization’s needs – this includes creation of effective business case(s) to influence investment and innovation
  • Manage and maintain monitoring infrastructure, ensuring it aligns with industry best practices

Monitoring & Observability Practice Management

  • Collaborate with DevOps, CriticalOps and IT leadership teams to understand system requirements and design effective monitoring strategies that align with organizational goals and objectives
  • Establish key metrics and KPIs that provide the insights and analytics needed to drive a data-driven continuous improvement backlog
  • Provide training and support to other teams on using monitoring tools effectively
  • Create and maintain documentation for monitoring and observability practices, including standard operating procedures and best practices
  • Stay abreast of industry trends, emerging technologies, and best practices related to monitoring and observability platforms

Monitoring & Observability Implementation & Operations

  • Develop and implement monitoring solutions for infrastructure, applications, and services
  • Design and configure alerting mechanisms to detect and respond to potential issues proactively
  • Use monitoring tools to identify and troubleshoot issues in real time
  • Collaborate with other teams to resolve incidents promptly and prevent recurrence
  • Analyze monitoring data to identify performance bottlenecks and areas for improvement
  • Work with development and operations teams to optimize system performance based on monitoring insights
  • Implement automation scripts and workflows to streamline monitoring processes
  • Integrate monitoring solutions with existing frameworks for seamless operation
  • Identify and evaluate “self-healing” opportunities based on production issue trend analysis to inform AIOps roadmap

Essential

  • Degree level education in computer science, information technology, or a related field
  • Proven experience as a monitoring and observability engineer or a similar role
  • Proficient in developing monitoring capabilities and configuring integration with tools such as Prometheus, Grafana, Splunk, SumoLogic, DataDog, DynaTrace, etc.
  • Strong scripting skills (e.g., Python) for automation in data environments
  • Familiarity with logging, tracing, and APM (Application Performance Monitoring) solutions

Desirable

  • Customer engagement experience
  • Knowledge of data processing frameworks (e.g. Apache Spark) and data storage solutions (e.g. data lakes, warehouses)
  • Experience with data orchestration tools (e.g. Apache Airflow)
  • Understanding of data lineage and metadata management

When we put unexpected teams in the same room, we unleash bold thinking with the power to inspire life-changing medicines. In-person working gives us the platform we need to connect, work at pace and challenge perceptions. That’s why we work, on average, a minimum of three days per week from the office. But that doesn't mean we’re not flexible. We balance the expectation of being in the office while respecting individual flexibility. Join us in our unique and ambitious world.

Why AstraZeneca?

Join us at a crucial stage of our journey in becoming a digital and data-led enterprise. Make the impossible possible by building partnerships and ecosystems, creating new ways of working and driving scale and speed to deliver exponential growth. Focused and committed, and backed with the investment to succeed, we're driving cross-company change to disrupt the entire industry. Our work unlocks the potential of science. We optimise and revolutionise AstraZeneca by maximising efficiencies and finding new ways to drive productivity, from automation to data simplification.

Ready to make a difference? Apply today and be part of a team that has the backing to innovate, disrupt an industry and change lives.



