Dear aspiring speaker

Submitting a proposal to speak at Service Management Conference is a chance to open a sustained dialogue with your peers and expand your network. In this post, itSMF Australia’s National Events Director Aprill Allen shares her tips for those thinking of submitting a proposal.

I’m glad you’re interested in sharing your story with our delegates for the 20th annual national itSMF Conference. I want you to have the best possible chance at joining us in Melbourne this year, so here are some tips to help your submission be successful.

Our members love case studies. Case studies consistently rate highly with our members and it’s easy to understand why. They’re in ready-made story-telling format, which makes them easy to relate to, easy to understand, and easy to remember. Whether a case study demonstrates your success or ultimate failure, it should start with a background setting of who and where, follow up with what your big hairy challenge was, how you approached solving it, what the outcome was, and why it went the way it did — your lessons learned. This is the valuable part that helps each of us grow from your experience.

Make sure your topic title is interesting and consistent with your session description. This tip almost speaks for itself, but it’s not uncommon to be too clever with a topic title and have your audience not make the connection. They may not understand the session’s relevance to them and not attend, or worse still, they may rate you poorly because they expected something different.

Pitch to the right level. We have delegates covering the spectrum from beginner to advanced. Make sure your content is pitched consistently with the audience level you’ve selected.

Consider the theme when you develop your submission. This year, our theme is Service Management 2.0. Our workplaces and consumer expectations are already changing in a multitude of ways. What do we need to do differently to be a step ahead? How will our service management toolbelt evolve? If your expertise is outside the ITSM domain, what are the skills you know our service management practitioners and leaders need to be successful? What are the stories they need to hear, or learn to tell?

The Conference is the place to push boundaries with new material. The selection process tends to reward experienced presenters, which is why we try to give new speakers exposure at our state seminars and ask for a referral. For our experienced presenters, already popular at our state seminars, the national Conference is an opportunity to share a new angle or a new story.

Reviewer feedback will be your first test of the clarity and impact of your submission. I’m no stranger to how it feels when something so clear in my head isn’t coming through easily to whomever I’m sharing it with. It’s beyond frustrating. Work through those awkward misunderstandings, if they come up, because when the light bulb goes on, it’s rewarding for all involved.  

Reviewers will be looking for all these points during the selection process, and how well you address them will influence your chance of selection. Good luck, and I hope to see your presentation on stage!

Find out more and submit a speaker proposal here.

March 3rd, 2017 | ITSM, protips, Service Management 2017

Love DevOps? Wait ’til you meet SRE – with guest blogger Patrick Hill

Site Reliability Engineering may be the most important acronym you’ve never heard of – here’s why.

You may have heard of a little company called Google. They invent cool stuff like driverless cars and elevators into outer space. Oh, and they develop massively successful applications like Gmail, Google Docs, and Google Maps. It’s safe to say they know a thing or two about successful application development, right?

They’re also the pioneers behind a growing movement called Site Reliability Engineering (SRE). SRE effectively ends the age-old battles between Development and Operations. It encourages product reliability, accountability, and innovation – minus the hallway drama you’ve come to expect in what can feel like Software Development High School.

How? Let’s look at the basics.

What in the world is SRE?

Google’s mastermind behind SRE, Ben Treynor, still hasn’t published a single-sentence definition, but describes site reliability as “what happens when a software engineer is tasked with what used to be called operations.”

The underlying problem goes like this: Dev teams want to release awesome new features to the masses, and see them take off in a big way. Ops teams want to make sure those features don’t break things. Historically, that’s caused a big power struggle, with Ops trying to put the brakes on as many releases as possible, and Dev looking for clever new ways to sneak around the processes that hold them back. (Sounds familiar, I’d wager.)

SRE removes the conjecture and debate over what can be launched and when. It introduces a mathematical formula for green- or red-lighting launches, and dedicates a team of people with Ops skills (appropriately called Site Reliability Engineers, or SREs) to continuously oversee the reliability of the product. As Google’s own SRE Andrew Widdowson describes it, “Our work is like being a part of the world’s most intense pit crew. We change the tires of a race car as it’s going 100mph.”

Doesn’t sound revolutionary yet? Much of the magic is in how it works. Here are some of the core principles – which also happen to be some of the biggest departures from traditional IT operations.

First, new launches are green-lighted based on current product performance.

Most applications don’t achieve 100% uptime. So for each service, the SRE team sets a service-level agreement (SLA) that defines how reliable the system needs to be for end users. If the team agrees on a 99.9% SLA, that gives them an error budget of 0.1%. An error budget is exactly what its name suggests: the maximum allowable threshold for errors and outages.

ProTip: You can easily convert SLAs into “minutes of downtime” with this cool uptime cheat sheet.
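To make the arithmetic concrete, here is a minimal Python sketch of that conversion. The function names and the 30-day window are assumptions chosen for illustration; they are not taken from Google’s tooling or from the cheat sheet.

```python
MINUTES_PER_DAY = 24 * 60


def error_budget(sla_percent: float) -> float:
    """Return the error budget as a percentage (e.g. a 99.9% SLA leaves 0.1%)."""
    return 100.0 - sla_percent


def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Convert an SLA into the minutes of downtime it tolerates over a period.

    The 30-day default is an assumption; use whatever window your SLA covers.
    """
    budget_fraction = error_budget(sla_percent) / 100.0
    return budget_fraction * period_days * MINUTES_PER_DAY


for sla in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(sla)
    print(f"{sla}% SLA: about {minutes:.1f} minutes of downtime per 30 days")
```

Under these assumptions, a 99.9% SLA works out to roughly 43 minutes of downtime in a 30-day month.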

Here’s the clincher: The development team can “spend” this error budget in any way they like. If the product is currently running flawlessly, with few or no errors, they can launch whatever they want, whenever they want. Conversely, if they have met or exceeded the error budget and are operating at or below the defined SLA, all launches are frozen until they reduce the number of errors to a level that allows the launch to proceed.
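The launch rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Google’s actual formula: it measures reliability as the share of failed requests, and the function names are hypothetical.

```python
from decimal import Decimal


def remaining_error_budget(sla_percent, total_requests: int, failed_requests: int) -> Decimal:
    """Return how many more failed requests the SLA tolerates in this window.

    Assumption: reliability is measured per request. Decimal keeps the
    comparison exact when the budget is met right on the boundary.
    """
    budget_fraction = (Decimal(100) - Decimal(str(sla_percent))) / Decimal(100)
    allowed_failures = budget_fraction * total_requests
    return allowed_failures - failed_requests


def launches_allowed(sla_percent, total_requests: int, failed_requests: int) -> bool:
    """Launches proceed only while some error budget is left.

    Meeting or exceeding the budget freezes launches, as described above.
    """
    return remaining_error_budget(sla_percent, total_requests, failed_requests) > 0


# Hypothetical month: 1,000,000 requests against a 99.9% SLA (budget: 1,000 failures).
print(launches_allowed(99.9, 1_000_000, 200))    # budget left, launches go ahead
print(launches_allowed(99.9, 1_000_000, 1_000))  # budget met, launches frozen
```

The point of the design is that the freeze is mechanical: nobody has to argue about whether a launch is safe, because the numbers decide.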

The genius? Both the SREs and developers have a strong incentive to work together to minimize the number of errors.

SREs can code, too

In the old model, you throw people at a reliability problem and keep pushing (sometimes for a year or more) until the problem either goes away or blows up in your face.

Not so in SRE. The development and SRE teams share a single staffing pool, so for every SRE that is hired, one fewer developer headcount is available (and vice versa). This ends the never-ending headcount battle between Dev and Ops, and creates a self-policing system where developers get rewarded with more teammates for writing better-performing code (i.e., code that needs less support from fewer SREs).

SRE teams are staffed entirely with rock-star developer/sys-admin hybrids who know not only how to find problems but also how to fix them. They interface easily with the development team and, as code quality improves, are often moved to the development team if fewer SREs are needed on a project.

In fact, one of the core principles mandates that SREs spend no more than 50% of their time on Ops work. As much of their time as possible should be spent writing code and building systems to improve performance and operational efficiency.

Developers get their hands dirty, too

At Google, Ben Treynor had to fight for this clause, and he’s glad he did. In fact, in his keynote on SRE at SREcon14, he emphasizes that getting this commitment from your executives before you launch SRE should be mandatory.

Basically, the development team handles 5% of all operations workload (handling tickets, providing on-call support, etc.). This allows them to stay closely connected to their product, see how it is performing, and make better coding and release decisions.

In addition, any time the operations load exceeds the capacity of the SRE team, the overflow always gets assigned to the developers. When the system is working well, the developers begin to self-regulate here as well, writing strong code and launching carefully to prevent future issues.
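As a rough sketch, the workload split described in this section might be modelled like this in Python. Counting work in tickets, the function name, and the example numbers are all assumptions made for illustration; only the 5% developer share and the overflow rule come from the text.

```python
DEV_BASELINE_PERCENT = 5  # share of operations work developers always handle


def split_ops_workload(total_tickets: int, sre_capacity: int) -> dict:
    """Split operations tickets between SREs and developers.

    Developers take their 5% baseline first; SREs take the rest up to
    their capacity; anything beyond that overflows back to developers.
    """
    dev_baseline = total_tickets * DEV_BASELINE_PERCENT // 100
    remaining = total_tickets - dev_baseline
    sre_load = min(remaining, sre_capacity)
    overflow = remaining - sre_load
    return {"sre": sre_load, "dev": dev_baseline + overflow}


# Hypothetical week: 200 tickets, SRE team can absorb 150 of them.
print(split_ops_workload(200, 150))  # developers pick up their baseline plus the overflow
```

Seen this way, the incentive is easy to spot: every ticket the SRE team cannot absorb lands on the developers who shipped the code.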

SREs are free agents (and can be pulled, if needed)

To make sure teams stay healthy and happy, Treynor recommends allowing SREs to move to other projects as they desire, or even move to a different organization. SRE encourages highly motivated, dedicated, and effective teamwork – so no team member should be held back from pursuing his or her own personal objectives.

If an entire team of SREs and developers simply can’t get along and are creating more trouble than reliable code, there’s a final drastic measure you can take: Pull the entire SRE team off the project, and assign all of the operations workload directly to the development team. Treynor has only done this a couple of times in his entire career, and the threat is usually enough to bring both teams around to a more positive working relationship.

There’s quite a bit more to SRE than I can cover in one article – like how SRE prevents production incidents, how on-call support teams are staffed and the rules they follow on each shift, etc.

Our take

IT is full of buzzwords and trends, to be sure. One minute it’s cloud, the next it’s DevOps or customer experience or gamification. SRE is in a strong position to become much more than that, particularly since it is far more about the people and process than the technology that underlies them. While technology certainly can (and likely will) adapt to the concept as it matures and more teams adopt it, you don’t need new tools to align your development and operations organizations around the principles of Site Reliability Engineering.

In future articles, we’ll look at just that: practical steps for moving towards SRE, and the role technology can play.

This article was originally published on the Atlassian website.


Patrick Hill, Site Reliability Engineer, has been with Atlassian a while now and recently transferred from Sydney to the Austin office. (G’day, y’all!) In his free time, he enjoys taking his beard from “distinguished professor” to “lumberjack” and back again. Find him on Twitter: @topofthehill

Patrick’s colleagues Sam Jebeile and Nick Wright will be discussing SRE in depth at Service Management 2015.