- Client: Chamberlain
- Location: Oakbrook, IL
- Duration: long-term contract
- Rate: TBD
Notes from HM:
Perfect fit has 5+ years' experience. Windows Server 2012/2016. Has worked with Azure. Knows a few monitoring tools. Will work an on-call rotation one week at a time every 6 weeks, and serve as the secondary backup rotation person every 6 weeks. Knows some simple load balancer pool management. Has admin skills with AMQP servers. Strong with Office tools and pretty good in Visio. Knows Atlassian JIRA and Confluence. Good communication skills. Fast learner. Open to shifting the work schedule to something like Sun to Thu, or second shift on weekdays, 2pm to 10pm.
- BS/BA in Computer Science or related field
- 5 years’ experience in a system operations, technical operations, or DevOps role for websites and/or mobile applications
- More than 2 years’ experience working with a 24x7x365 service organization
- Experience working to support consumer-facing products and mobile phone applications with a strong engineering and/or IT service level component. Demonstrated success operating a complex system.
- Experience working inside a matrix organization is desired, including production support/analyst roles across call centers, marketing, engineering, product development and IT.
- Previous operations experience working on public cloud infrastructure.
- Familiar with both waterfall and agile SDLC methodologies.
- Communications skills for working closely with Project Managers, Product Owners, Technical Leads, Scrum Masters and individual contributors
- Desire some AWS or Azure project and resource administration experience with PaaS and/or IaaS.
- Willing to take ownership of service reliability and operations. Must be ready and available to respond 24x7x365 in cases of service incidents and disruptions
- Willing to “roll up the sleeves” and work with the team to continuously improve Quality Of Service
- Interaction and collaboration with outside 3rd-party IT, staffing and design partners who are providing solutions (software, staff, etc.) to projects
- Working knowledge of Internet protocols, web server software, and communications, including HTTP, TCP, UDP, WebSockets, Windows Server, and IIS.
- Familiarity with security tools and best practices in defending against “bad actors”.
- Fluency with the Atlassian JIRA and Confluence tools for ticket, task, and knowledge management.
- Operator experience with monitoring tools such as Monitis, Pingdom, Nagios, Dynatrace, AppDynamics. Administrator-level experience is desired.
- Able to write SQL queries to extract content from database tables. A plus if familiar with NoSQL systems. Will occasionally need to create, read, update, and delete (CRUD) data records in copies of production databases.
- Understands load balancers and the operational issues surrounding traffic management. A demonstrated ability to manage load balancer pools is desired.
- Highly organized, self-motivated, able to multi-task, able to get the message and needs of operations heard by the technologists, engineering and marketing/business teams.
- Able to effectively analyze unexpected problems by locating patterns in the data and underlying causes.
- Attention to detail, quality, responsiveness and efficiency.
- Skilled with CI/CD platforms and tools such as Octopus, Jenkins, Puppet, Chef, Docker.
- Can write PowerShell scripts and take over script libraries with the goal of keeping them current and relevant.
- Lead the daily service operations activities with all Operations team members to ensure that the program and its integrated components are healthy, running, and fulfilling end-user requests
- Define, manage, and execute operational tests and processes to confirm system health or status
- Collaborate, consult, and coordinate task-level work with developers or scrum teams on matters of capacity expansion or new feature deployment. Ensure that server and system capacity projections are accurate and prevent any service degradation due to lack of resources
- Responsible for managing existing and developing new monitoring frameworks, monitoring dashboards, and monitoring data history archives in support of Service Level Agreements and Quality of Service reporting. Deliver the “three nines” (99.9%) uptime by forming collaboration teams aligned to the SLA goal.
- Operates, manages, and monitors the cloud-hosted components/solutions which complement the corporate data center hosted solutions
- Participate in, manage, and lead incident response calls, ensuring immediate mitigation and then ultimate resolution of the root causes of all service interruptions. Ensures that the operations team documentation and training on incident responses is kept current. Will be part of a rotational support team with a 24×7 on-call assignment every few weeks.
- Author necessary Root Cause Analysis (RCA) documents after service breaks. Collects RCA documents from other responsible parties for service breaks impacting platforms. Follow up on corrective action items determined by RCA meetings
- Support the QA Team with defect/ticket research for pre-existing bugs or undesired system behavior
- Communicate with stakeholder and partner teams about planned and unplanned outages
- Perform daily/weekly/monthly reporting duties as directed by supervisor. Conduct data extractions in support of QoS investigations, RCAs, business opportunities and user communications
- Responsible for all operational documents and communication materials. Develop new and maintain existing process documents
- Participates in meetings and reviews throughout the project with our client’s staff to discuss the operational dimensions and requirements of new product releases
- Proactively and continuously educates non-technical staff on relevant operational best-practices
- Will direct others in work activities for small task-based projects. Will follow guidance from Development team architects on new functionality/features being deployed; ensures relevant server capacity is available for all scheduled feature launches.
- Protect company reputation by keeping information confidential.
- Maintain professional and technical knowledge by attending educational workshops, reviewing professional publications, establishing personal networks, and participating in professional societies.
- Contribute to the team effort by accomplishing related results and participating on projects as needed.