- Staff Software Engineer, Network Health
Google
Sunnyvale, California, United States (on-site)
Posted: 13 hours ago
Job Type: Full-Time
Description
Minimum qualifications:
- Bachelor's degree or equivalent practical experience.
- 8 years of experience in software development.
- 5 years of experience testing and launching software products, and 3 years of experience with software design and architecture.
- 5 years of experience with one or more of the following: Speech/audio (e.g., technology duplicating and responding to the human voice), reinforcement learning (e.g., sequential decision making), ML infrastructure, or specialization in another ML field.
- 5 years of experience with ML design and ML infrastructure (e.g., model deployment, model evaluation, data processing, debugging, fine-tuning).
- Experience integrating generative AI tools or LLM interfaces into workflows.
Preferred qualifications:
- Master's degree or PhD in Engineering, Computer Science, or a related technical field.
- 8 years of experience with data structures and algorithms.
- 3 years of experience in a technical leadership role leading project teams and setting technical direction.
- Experience with any of the following: SQL Pipelines, Plx Scripts, Generative AI Agents.
- Track record of leading complex infrastructure projects.
- Ability to influence technical direction and improve engineering practices across partner teams (repair infrastructure, network, and machines, which all coexist).
About the job
Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google's needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full-stack as we continue to push technology forward.
Platforms Infrastructure Engineering operates under the Google Cloud umbrella. We provide the AI/ML infrastructure on which Google runs, both internally and externally.
Large-scale ML training requires a huge infrastructure footprint, all of which is connected by an equally large and dense networking infrastructure. Join us to directly enable this next generation of Google's AI infrastructure. Our mission is to find innovative ways to increase availability, reduce risk to production traffic, and operate more efficiently the network that enables large-scale training and serving.
As the lead for this team, you will set the long-term technical roadmap to improve safety, increase observability, and improve automated remediation, ensuring that nearly all of Google's customers run with the maximum availability possible.
The AI and Infrastructure team is redefining what's possible. We empower Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability and velocity. Our customers include Googlers, Google Cloud customers, and billions of Google users worldwide.
We're the driving force behind Google's groundbreaking innovations, empowering the development of our cutting-edge AI models, delivering unparalleled computing power to global services, and providing the essential platforms that enable developers to build the future. From software to hardware, our teams are shaping the future of world-leading hyperscale computing, with key teams working on the development of our TPUs, Vertex AI for Google Cloud, Google Global Networking, Data Center operations, systems research, and much more.
Individual pay is determined by factors including job-related skills, experience, and relevant education or training.
US: $207,000 to $301,000 (USD), plus a 20% bonus target, equity, and benefits.
Learn more about benefits at Google.
Responsibilities
- Define the long-term goal for repair automation of AI/ML infrastructure, and achieve it through multiple parallel programs.
- Lead and participate in the design of agentic diagnostic systems that use generative AI to automate diagnosis for next-generation networks.
- Work with platform teams to integrate new hardware platforms into the automation ecosystem, driving the qualification and repair workflows required for global fleet turn-up.
- Lead critical safety initiatives, such as automated anomaly detection, to protect fleet health and capacity.
- Mentor a team of junior and senior engineers and influence engineering practices across the broader infrastructure organization to drive consistency in automation and safety standards.
Requisition #: 123298905428239046
Job ID: 84931330
