Our purpose is to serve the nation with the single most trusted and capable health information network, built to increase patient safety, lower costs and ensure quality care.
What You’re Like
You have a relentless desire to get stuff done. A true champion of the collective objective, you’re energized by applying your expertise with a group of talented people to achieve something important, together. You’re curious and passionate about new technologies and cutting-edge innovation, and you enjoy pulling together complex pieces of a puzzle to deliver a powerful and meaningful end-product. You’re a translator between people and teams that don’t use the same lingo. You don’t settle for short-term gratification—you’re into long-haul, incremental efforts that require endurance, patience and next-level collaboration.
What We’re Like
Surescripts Network Technology and Operations (NT&O) is made up of smart people who love working toward a common goal, often delivering an innovative, industry-leading solution to the healthcare marketplace. We pride ourselves on quality work that's grounded in complete transparency and accountability. When a project goes off the rails (and they do from time to time), we rally around each other to fix it and move on. While our gratification is often the result of a months-long effort, we never tire of delivering a huge result that has an exponentially positive impact on the healthcare system, whether in quality, cost or patient safety.
OK, But Here’s What It’s Really Like
Working at Surescripts NT&O, your thinking cap will always be on. You’ll be challenged to make sure that disparate, cross-functional pieces come together to create a desired result. You might swarm around some work if you think a milestone is about to be missed. You’ll work to quickly understand diverse technologies, and establish and maintain relationships with groups who see and talk about things differently.
Surescripts has an opening for a Senior Site Reliability Engineer to join our team. As part of the SRE team, you will take on responsibility for service availability and performance in production environments.
Serve as a subject matter expert on the capabilities and limits of the multi-data-center production infrastructure.
Define best practices that promote service reliability and fault tolerance. Collaborate with the Software Development teams to ensure these best practices are part of the design.
Develop and automate emergency recovery procedures, deployment schedules, post-maintenance validation and other operational activities.
Design and implement innovations that improve service reliability, infrastructure resiliency and security, and data availability.
Serve as a subject matter expert on all matters related to service operations and as the first level of escalation for any issues. Troubleshoot and provide root cause analysis for issues spanning code, network, database and systems components.
Collaborate with Product and Software Development teams to define Service Level Agreements (SLAs), Objectives (SLOs) and Indicators (SLIs)
Collect SLI metrics and establish monitoring based on SLO thresholds and other product requirements
Develop product-specific reliability requirements to support SLOs.
Define infrastructure requirements and architecture. Ensure the infrastructure meets performance and capacity requirements.
Understand application dependencies and review dependency handling and health checks. Evaluate whether dependency reliability is adequate to meet SLOs.
Ensure service availability during software upgrades as well as infrastructure and database maintenance.
Maintain services in production environments by measuring and monitoring availability, latency, and overall system health
Provide technical leadership and mentoring to other members of the SRE team
Participate in on-call rotation
Bachelor's degree in computer science, information science or a related field, or equivalent experience
Ability to analyze network traces and troubleshoot application performance problems
Ability to conceptualize a distributed service, its dependencies and its transactional flow
Experience with Unix/Linux and Windows operating system administration and networking architecture
Experience providing technical leadership and architectural guidance to Software Development teams.
5+ years of proven development skills in one or more programming languages: Python, Java, Go, Ruby, shell scripting or similar
5+ years of software development, automation or infrastructure as code experience
7+ years of proven development skills in one or more programming languages: Python, Java, Go, Ruby, shell scripting or similar
7+ years of software development, automation or infrastructure as code experience
Cloud infrastructure as code experience, e.g., Terraform, CloudFormation
Experience with configuration management tools such as Ansible, Chef, Puppet or Salt, and with application schedulers such as Kubernetes, Nomad or Docker Swarm.
Experience monitoring and supporting messaging systems such as Kafka and IBM MQ.
Experience querying SQL and NoSQL databases. Familiarity with Oracle, Hadoop or Cassandra database architecture.
Experience building CI/CD pipelines with tools such as Jenkins or TeamCity for a production application in an enterprise environment
Demonstrated ability to triage processing bottlenecks
Experience with monitoring systems: InfluxDB, Splunk, Zenoss, AppDynamics or similar
Experience troubleshooting certificate and public key infrastructure (PKI) issues
Surescripts is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate on the basis of race, color, religion, age, national origin, ancestry, disability, medical condition, marital status, pregnancy, genetic information, gender, sexual orientation, parental status, gender identity, gender expression, veteran status, or any other status protected under federal, state, or local law.