Location: Remote
Reports To: Chief Executive Officer (CEO)
Employment Type: Full-Time
About Ask Sage:
Founded by Nic Chaillan, former Chief Software Officer of the Air Force and Space Force, Ask Sage is the leading Generative AI platform that augments the velocity of government and commercial teams with dozens of use cases, from coding to cybersecurity to acquisition to data analysis and much more. Our FedRAMP High and DoD IL5 accredited cutting-edge technology enables teams to focus on strategic initiatives while we take care of the heavy lifting. We are seeking a highly skilled and experienced Site Reliability Engineer (SRE), with deep Kubernetes experience, to join our team and ensure the reliability, performance, and scalability of our software, websites, and applications.
Position Overview:
The Site Reliability Engineer (SRE) will be responsible for ensuring the reliability, performance, and scalability of Ask Sage's software, websites, and applications. This role requires a combination of software engineering and systems administration skills to monitor, control, and automate systems. The ideal candidate will have a deep understanding of cloud infrastructure, automation tools, and best practices for maintaining high availability and performance. This position plays a critical role in maintaining the overall health and efficiency of our platform.
Key Responsibilities:
System Monitoring and Maintenance:
- Monitor the performance and reliability of Ask Sage's Kubernetes clusters, software, websites, and applications.
- Automate routine maintenance tasks to ensure system stability and performance.
Incident Response and Troubleshooting:
- Respond to and resolve incidents in a timely manner, minimizing downtime and impact on users.
- Conduct root cause analysis to identify and address underlying issues.
- Develop and implement strategies to prevent future incidents and improve system resilience.
Automation and Infrastructure Management:
- Design, build, and maintain automated systems and processes to improve efficiency and reduce manual intervention.
- Manage cloud infrastructure, including provisioning, scaling, and optimizing resources.
- Collaborate with development teams to ensure seamless deployment and integration of new features and updates.
Performance Optimization:
- Analyze system performance and identify areas for improvement.
- Implement performance tuning and optimization techniques to enhance system efficiency.
- Collaborate with cross-functional teams to ensure optimal performance of all components.
Security and Compliance:
- Ensure compliance with security best practices and industry standards.
- Implement and maintain security measures to protect systems and data.
- Conduct regular security audits and vulnerability assessments.
Documentation and Reporting:
- Maintain accurate and up-to-date documentation of systems, processes, and procedures.
- Generate and analyze reports on system performance, incidents, and other key metrics.
- Provide regular updates to management and stakeholders on system health and performance.
Continuous Improvement:
- Identify opportunities for improving system reliability, performance, and scalability.
- Stay up to date with industry trends and best practices in site reliability engineering.
- Participate in training and development opportunities to enhance skills and knowledge.
Qualifications:
- Deep expertise in Kubernetes and containers.
- Strong understanding of cloud infrastructure, automation tools, and best practices for maintaining high availability and performance.
- Experience with monitoring and logging tools such as Loki and Grafana.
- Minimum of 3 years of experience in site reliability engineering, Kubernetes administration, or a related role.
- Excellent problem-solving skills and attention to detail.
- Strong communication and interpersonal skills, with the ability to work effectively with cross-functional teams.
Why Join Ask Sage:
- Opportunity to work with cutting-edge AI technology that is transforming the industry.
- Collaborative and innovative work environment that values creativity and initiative.
- Competitive salary and benefits package, including 100% coverage for health plans, 401(k), flexible PTO, and stock options.
- Chance to make a significant impact on the company's growth and success.
To apply for this position, send your resume and cover letter to jobs@asksage.ai.