Senior Systems Reliability Engineer
Houghton Mifflin Harcourt
Evanston, IL

About $98,000 - $140,000 a year

HMH Software Engineering – Software Engineering

HMH Software Engineering provides cutting-edge, individualized learning experiences to millions of students across the United States. We are as driven by this mission as we are by continuously improving ourselves and the way we work. Our offices are high-energy, collaborative beehives of activity where work is centered on small, autonomous teams that build great software. We trust each other, hold ourselves and our teammates accountable for results, and improve student outcomes with each release.

At HMH we constantly experiment with new approaches and novel ways of solving problems. We often succeed and sometimes stumble – either way we learn and move forward with more confidence than we had the day before. We are as passionate about new technologies and engineering craftsmanship as we are about transforming the EdTech industry itself.

We’re not just looking for hands on a keyboard to pound out code; we’re looking for talented teammates and colleagues who contribute as much as they receive and thrive working with us.
If this sounds like you, let’s talk.

The Opportunity - Systems Reliability Engineer - Technical Services Team

Who We Are

The Bedrock Technical Services Team operates, develops, maintains, and supports the newest services delivery platform for Houghton Mifflin Harcourt. We focus on the reliability, availability, performance, and scalability of the newest efforts in the education space.

Deploy, upgrade, operate/maintain, and scale our suite of products and services
Closely collaborate with Software Engineers to create highly operable and maintainable products
Manage the underlying infrastructure in collaboration with IT and Engineering
Engage in and improve the whole lifecycle of services - from inception and design, through deployment, operation and refinement
Practice sustainable incident response and blameless postmortems
Provide end-user support to software engineers and stakeholders for products and services
Understand, simplify, and automate processes to improve systems and reduce toil
Help customers get work done by providing support for bugs and administrative tasks
Improve logging and monitoring tools used for our application ecosystem
Coordinate, execute, and improve software development life-cycle processes
Participate in an infrequent on-call rotation for off-hours support of mission-critical issues

3+ years of Site Reliability or DevOps type experience
3+ years of experience with Linux operating systems
Understanding of Puppet, Ansible, or other automation frameworks
Automation skills in shell (Bash), Python, and/or other languages
Experience with source code and version control tools such as Subversion or Git

5+ years of Systems Administration, Site Reliability Engineering, or DevOps experience
3+ years of experience with Python and Python-based development frameworks
Strong understanding of Docker, Vagrant, and Kubernetes, or similar technologies
Strong understanding of virtualization and hypervisor technologies
Practical knowledge and demonstration of industry security practices and procedures
Experience with automatically managing dozens or hundreds of servers
Focus on performance bottlenecks and performance improvement techniques
Experience with workflow and issue management tools such as JIRA
Experience working in, and a strong desire to work in, a lean/agile environment
Strong networking knowledge of TCP/IP
Experience working with AWS, GCP, or Azure environments
Must be comfortable working with mission-critical and sensitive systems, with a sense of urgency appropriate to the responsibilities
Must be available to participate in intermittent scheduled on-call periods and to answer pages and support issues during off hours
Excellent communication skills, with the ability to communicate with stakeholders, peers, management, etc. in both formal and informal situations
2+ years of experience with a programming language (Python, Java, Go)
1+ year of experience with continuous delivery tools (Jenkins, Bamboo, Concourse CI)
1+ year of experience with telemetry tracking, monitoring, and alerting
1+ year of experience supporting customer requests and queue management
Prior on-call experience supporting production-critical software
Proven ability to work autonomously
Proven ability to collaborate with a team
Understanding of software delivery life cycle

Physical Requirements

Might be in a stationary position for a considerable time (sitting and/or standing).

The person in this position needs to move about inside the office to access file cabinets, office machinery, etc.

Constantly operates a computer and other office productivity machinery, such as a calculator, copy machine, and computer printer.

Must be able to collaborate with colleagues face to face, on conference calls, and in online meetings.

Houghton Mifflin Harcourt (NASDAQ:HMHC) is a global learning company dedicated to changing people’s lives by fostering passionate, curious learners. As a leading provider of pre-K–12 education content, services, and cutting-edge technology solutions across a variety of media, HMH enables learning in a changing landscape. HMH is uniquely positioned to create engaging and effective educational content and experiences from early childhood to beyond the classroom. HMH serves more than 50 million students in over 150 countries worldwide, while its award-winning children's books, novels, non-fiction, and reference titles are enjoyed by readers throughout the world.

For more information, visit

Houghton Mifflin Harcourt is an equal employment opportunity employer and participates in E-Verify. All qualified applicants will receive consideration for employment and will not be discriminated against on the basis of gender, race/ethnicity, gender identity, sexual orientation, protected veteran status, disability, or other protected group status.

Nearest Major Market: Chicago
Job Segment: Publishing, Education