Google
Sunnyvale, California, United States
(on-site)
Posted
11 hours ago
Job Type
Full-Time
Senior Software Engineer, Data Center Infrastructure Management Lifecycle
Description
Minimum qualifications:
- Bachelor's degree or equivalent practical experience.
- 5 years of experience in software development.
- 3 years of experience in one or more general-purpose programming languages (e.g., C++, Python, Go).
- 3 years of experience with distributed systems design and development.
Preferred qualifications:
- Master's degree or PhD in Computer Science or a related technical field.
- 3 years of experience with hardware health monitoring, diagnostics, and repair automation systems.
- 3 years of experience working with monitoring and alerting systems (e.g., Monarch, Automon).
- 3 years of experience with large-scale data collection and analysis pipelines.
- Familiarity with data center infrastructure and operations.
About the job
The Data Center Infrastructure Management (DCIM) Lifecycle team operates one of the largest-scale monitoring systems at Google, reading telemetry from millions of devices in every Google data center. Our challenges include managing the rapid growth and diversification of the Google fleet and hardware, supporting new use cases for critical monitoring of third-party facilities, and retiring technical debt. Google is bringing tape libraries back to its data centers to support several critical requirements, including a new cold storage tier, better TCO, and a contingency for HDD/SSD shortages caused by unprecedented AI/ML capacity demand. This role is to design and deliver tape health at Google scale for reliability.
In this role, you will work with your teammates to design, code, and put into production very large-scale distributed monitoring systems, and work with your team and partner teams to enable new use cases for large-scale telemetry gathering. You will also create system monitoring dashboards, define service level objectives (SLOs), and write documentation and playbooks. You will have the opportunity to take onsite trips to one or more of Google's data centers each year to work with new systems and data center technical staff in person.
The US base salary range for this full-time position is $166,000-$244,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.
Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.
Responsibilities
- Design, develop, and maintain software services for collecting and analyzing telemetry data from tape libraries, drives, and robotic components.
- Implement algorithms and rules to detect, diagnose, and predict hardware failures.
- Integrate tape health systems with Google's data center health monitoring infrastructure (e.g., system health, network doctor) and automated repair workflows (e.g., surgeon, silk roads).
- Collaborate with hardware engineers and vendors to understand failure modes and improve diagnostic capabilities.
- Develop dashboards and tools to provide visibility into the health and status of the tape hardware fleet.
- Participate in the full software development lifecycle, including requirements gathering, design, coding, testing, deployment, and operation.
Requisition #: 139966066548712134
Job ID: 82706191
