There’s something about methods of measurement that makes them hard to change or evolve. The concept of horsepower has existed relatively unchanged since the late 18th century. So too with miles per hour, used to measure stagecoach speed in the early 18th century. A yard (as a unit of measure) debuted in the 12th century during the reign of King Henry I of England, and the pound (as a unit of mass) can trace its origins as far back as 775 AD. It’s no wonder then that the SLA (service level agreement) has been slow to change and evolve, despite its relative youth. Yet the SLA has tremendous importance as a measuring stick for robust service delivery. That importance, coupled with the seemingly endless list of challenges and issues with the SLA today, suggests that change is obligatory.
Lending an even stronger case for change is the tidal shift from on-prem to cloud-based services. This steep increase in reliance on SaaS providers amplifies the need to rethink the SLA. And yet, the SLA concept still seems to be something of an enigma to many organizations. When should it be used? Why is it needed? How much detail should its definition include? How should it be accurately measured? And, of course, what are the challenges with the SLA, and is it time for a change? Let’s examine the SLA through a UCaaS (unified communications as a service) lens to determine whether it can buck the trend of its long-reigning predecessors.
Service Level Agreement (SLA) Raison D’etre
We need to think of the SLA as an integral part of any SaaS vendor contract. Any contract without an associated SLA is open to deliberate or inadvertent misinterpretation. The SLA identifies which party is responsible for reporting faults or paying fees. It also includes the penalties imposed if providers can’t meet agreed-upon targets. Organizations must address SLAs early in the evaluation process with a UCaaS provider, not only to compare contracts from different vendors but also to ensure that all their communications requirements are met. Businesses must centralize information on all the contracted services and their agreed-upon expected reliability to protect both the customer and the vendor.
What are Service Level Agreements (SLAs)?
Traditional SLAs for UC services promise reliability, availability, and serviceability. Reliability means the technology/service shouldn’t break. Availability means the technology/service should be available for use all the time. Serviceability means any failure of the technology/service should be fixed promptly. Underlying this model is the assumption that uptime is business-critical to the customer. A UC-oriented SLA will also typically have a technical definition in terms of mean time between failures (MTBF), mean time to repair, or mean time to recovery (MTTR). Other parameters should include responsibility for various data rates, throughput, jitter, incident response time, or similar measurable details. In addition to the above, a well-rounded UC-centric SLA will also be easily measurable. It will account for both usual and unusual exceptions and cover the following categories and use cases:
- Privacy and security
- Length of time to provide services
- Implementing moves, adds, changes, and deletions
- Updating 911 routing databases
One critical misconception regarding a UC SLA definition is that it must address and guarantee at least four nines availability. The reality is that providers, unfortunately, aren’t required to address performance and remedies for non-performance in their SLAs. Some vendors do guarantee four or five nines availability, but most only offer best-effort guarantees for uptime and performance.
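To make the "nines" concrete, here is a minimal sketch that converts an availability target into a downtime budget and derives availability from MTBF and MTTR using the standard steady-state formula. The function names and the 30-day period are assumptions chosen for illustration; they do not come from any vendor's SLA.

```python
def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability as a fraction: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def allowed_downtime_minutes(availability_pct: float, period_days: float = 30) -> float:
    """Downtime budget in minutes for a given availability target.

    period_days=30 is an illustrative assumption; real SLAs define their own period.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for label, pct in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
        budget = allowed_downtime_minutes(pct)
        print(f"{label} ({pct}%): {budget:.2f} minutes of downtime per 30 days")
```

Over a 30-day period, three nines allows roughly 43 minutes of downtime, while four nines allows a little over 4 minutes, which is why the difference matters when comparing vendor guarantees.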
What to measure in a Service Level Agreement (SLA)?
Here’s where things begin to get a bit more complicated and trickier. There exists a major misconception that having a UC cloud SLA absolves an organization from any monitoring or management duties. Customers need to face reality on two fronts:
- The vast majority of SLAs are written to put the ‘burden of proof’ on the customer. In other words, the customer must proactively call out when the UCaaS provider did not meet the SLA terms, especially if the customer hopes for compensation.
- Nearly all UC vendors provide some sort of admin console and usage reports that include uptime, performance, and outages. However, these reports typically provide insufficient visibility into the provider’s service delivery. Further, the reports or consoles do not accurately reflect end-user performance when considering extraneous variables like network performance. This makes it difficult to ascertain where fault resides. It also makes it challenging for the customer to identify and resolve performance problems. It makes it even more challenging to prove that the SLA was not met and to request credits.
Luckily for customers, there is a set of third-party specialty tools and services in the UC market space, such as PowerSuite and PowerSuite Cloud Managed Services. These tools and services can provide an expert source of ‘independent’ monitoring and management insights beyond UCaaS provider reports and fill the platform vendors’ gap. According to Nemertes Research, nearly 60% of organizations manage UCaaS using some sort of third-party specialty tool, but that still leaves a large number of organizations that lack the all-important additional oversight.
Challenges of Service Level Agreements (SLAs)
Beyond the monitoring and reporting challenge already called out above, let’s examine the other areas where today’s definition and system of SLAs begin to fall apart.
First and foremost is the human factor issue: employee experience. The vendor dashboard may read as “all green,” but does that mean employees are satisfied with the IT experience? Or are SLA metrics hiding more significant issues? This UX oversight creates a high risk of making a decision that checks the box regarding technology while missing (or, in some cases, worsening) end-user satisfaction needs. It’s also important to point out that UX is not a single issue/need. In any given organization, there can be tens or even hundreds of different user personas, all with a different set of perceptions and requirements. As a result, we must evaluate the user experience in the context of each group’s work activities.
Beyond the UX issue, here’s a listing of the additional top problems with today’s SLA metric:
- The end-to-end solution dictates the user experience, but no UC SaaS providers offer end-to-end service-level guarantees. This means that IT must manage each service separately.
- SaaS vendors lack a standard definition of availability. Some define availability as the user’s ability to access and work in the application, while others simply report system uptime. Ideally, availability should be calculated based on the maximum total unplanned downtime averaged over no longer than a month. Two related issues arise here: (1) any continuous outage shorter than a certain period, such as five minutes or the first 0.05% of downtime, is usually not counted; and (2) availability is often averaged over an excessive period, such as a calendar quarter or a year.
- Document vs. Real World: Often, what sounds good on paper ends up being useless in practice. SLAs can contain ambiguous wording that lets providers off the hook. And even when the provider acknowledges culpability, the loss of revenue and productivity due to service failure far outweighs any minor compensation.
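The effect of the averaging window and of short-outage exclusions is easy to see with a little arithmetic. The sketch below uses made-up outage durations and a hypothetical `availability_pct` helper. It also assumes the rest of the quarter is outage-free, which is the best case for the provider.

```python
MONTH_MINUTES = 30 * 24 * 60    # 30-day month, an illustrative assumption
QUARTER_MINUTES = 90 * 24 * 60  # 90-day quarter, an illustrative assumption


def availability_pct(outage_minutes, window_minutes, ignore_shorter_than=0):
    """Availability (%) over a window, optionally ignoring short outages.

    ignore_shorter_than mimics SLA clauses that exclude brief outages.
    """
    counted = sum(m for m in outage_minutes if m >= ignore_shorter_than)
    return 100 * (1 - counted / window_minutes)


# Hypothetical outages in a single month: one 90-minute outage and two short blips.
outages = [90, 4, 3]

monthly_all = availability_pct(outages, MONTH_MINUTES)
monthly_excl = availability_pct(outages, MONTH_MINUTES, ignore_shorter_than=5)
quarterly_excl = availability_pct(outages, QUARTER_MINUTES, ignore_shorter_than=5)

if __name__ == "__main__":
    print(f"Monthly, all outages counted:    {monthly_all:.3f}%")
    print(f"Monthly, blips under 5 min cut:  {monthly_excl:.3f}%")
    print(f"Quarterly, blips under 5 min cut: {quarterly_excl:.3f}%")
```

With these numbers, the same incident misses a 99.9% target when measured over the month but meets it when averaged over the quarter, which is why the measurement period deserves as much scrutiny as the headline percentage.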
The Microsoft Teams SLA “Case Study”
To explore some of the UCaaS SLA challenges, let’s look at Microsoft Teams as a case study of sorts. Microsoft’s SLA for Teams presents the following hurdles for anyone looking to set their expectations carefully:
- Discovery: Although the SLA is publicly published, it’s quite hard to find. Even a Google search doesn’t easily yield the result. One must know that Teams is classified as part of Microsoft’s “Online Services.” Once that is established, anyone can find and read an easy-to-understand 72-page SLA document.
- Definition: To its credit, Microsoft breaks out three different parts of Teams (to allow for granular definition) and provides each with a separate SLA: core chat, conferencing and calling, and voice quality. All three offer the same 99.9% uptime (note: only three nines). However, in the case of voice quality, Microsoft does NOT include calls placed with a non-certified device, placed over a wireless connection, or placed over a network not managed by Microsoft.
- Phone Detail Coverage: There is no SLA for a generic Phone System, and especially not for scenarios where Direct Routing is deployed.
- Availability: Microsoft’s uptime reporting focuses only on Office 365 and doesn’t break out Teams service uptime. As such, it’s nearly impossible to derive actual values for Teams alone.
The Case for an SLA to XLA Change
All of this leads to the question of whether the UCaaS market can do with SLAs what history could not accomplish with other forms of measurement: change. Is it time to switch from legacy uptime SLAs (which have become an expectation, not a differentiator) to experience level agreements (XLAs)? If Teams is ‘online’ and the user can sign in but is unable to make a call, is the SLA working and reflecting true user sentiment?
The kernel of the XLA idea is to move away from purely operational performance metrics and to measure instead the overall digital experience that end-users encounter as they conduct their work.
Such a change is obviously of paramount importance to organizations for the following reasons:
- Changing User Profiles: An increasing number of Gen-Y and Gen-Z workers in the workplace will require new and expanded IT responsiveness
- Changing Workplace Norms: If the COVID-19 pandemic creates a more permanent distributed work environment, organizations will need to bridge the gap between the digital experience they offer external customers/consumers and the digital experience they offer internal IT users.
- Recruiting: Positive digital workplace experience helps to attract and retain top talent
- Productivity: Helping users become more efficient by eliminating high-friction workplace issues that may slow the completion of tasks or projects
Kickstarting the XLA Shift
Many would argue that this XLA change has not yet occurred because change is hard and scary. Others say that it hasn’t occurred because it would leave the vendors open to uncertain financial penalties.
Ultimately, though, the real reason for the delay is the technology itself. UCaaS vendors deliver a set of digital workplace technologies driven by Murphy’s law and supported by AI-based analytics. Yet most of them (and especially managed service providers) don’t possess the analytics capabilities or tools required to measure the quality of user experience correctly. A robust XLA system will only be achievable through detailed and systematic measurement of a far greater number of touchpoints than traditional SLA-based agreements cover.
In addition to the analytics imperative, the other essential requirement for a robust XLA system is that UCaaS deals include a managed services component. Successful UX-based agreements ensure that the implementation of new technologies is co-developed with the customer in a way that embeds managed services throughout the service life cycle. Such solutions allow for quicker support at a fraction of the cost of keeping full-time resources on-site. With the expected ongoing change in any service, managed services’ life cycle support is essential. In other words, the perfect XLA will employ a solid mix of both high tech (tools-based analytics) and high touch (managed services).
Top 5 XLA Imperatives
We need to accept the case that XLAs are the new frontier. Given that, what are the top imperatives that should be required when a vendor crafts the XLA? Or when the customer decides which vendor’s XLA is best for its business? Here’s the Unify Square top 5 that we’re using as we begin to build our own set of XLAs:
Ensure that there are no ambiguous or fuzzy measurements (e.g., leveraging the analytics behind Teams or Zoom call rating metrics).
Measure from the end-user perspective. For example, instead of a metric based on max I/O throughput, an XLA would highlight metrics based around an effective meeting experience. This would include some mix of the following:
- working (and good quality) audio,
- working (and good quality) video, and,
- useful and timely application sharing
The starting assumption must be that 1) Companies need to put the XLAs through a trial run before enforcing them, and 2) No two companies are the same. As an example, a voice quality XLA should start by first measuring Poor Call Performance (PCP) for some period, and then picking the second-lowest PCP data point as the target. In turn, the second-highest PCP measurement becomes the threshold above which the vendor pays service penalties.
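As a rough illustration of the trial-run approach just described, the sketch below derives a target and a penalty threshold from a set of trial measurements. The function name `xla_targets_from_trial`, the weekly sampling, and the sample values are all hypothetical. PCP is treated as a percentage where lower is better.

```python
def xla_targets_from_trial(pcp_samples):
    """Pick the XLA target and penalty threshold from trial-period PCP data.

    Target    = second-lowest PCP measurement observed in the trial.
    Threshold = second-highest PCP measurement; above it, penalties apply.
    Requiring at least four samples is an assumption, made so the two picks
    are distinct data points.
    """
    if len(pcp_samples) < 4:
        raise ValueError("need at least four trial measurements")
    ordered = sorted(pcp_samples)
    return ordered[1], ordered[-2]


if __name__ == "__main__":
    # Hypothetical weekly PCP percentages collected during a six-week trial.
    trial = [1.8, 1.2, 2.4, 1.5, 3.1, 1.1]
    target, threshold = xla_targets_from_trial(trial)
    print(f"Target PCP: {target}%  Penalty threshold: {threshold}%")
```

Using the second-lowest and second-highest points rather than the extremes discards one outlier at each end, so a single unusually good or bad measurement period does not set the terms of the agreement.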
Continuous service improvement
Instead of service reviews and penalties, the model becomes uninterrupted 24×7 service and constant improvement. The analytics mentioned above show how employees interact with UCaaS services and how they feel when they encounter specific issues that drive ongoing development to boost overall satisfaction. For example, if the initial XLA for voice quality is 1.5%, but the vendor continuously hits 1%, that should become the new target.
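One way to picture this kind of ratcheting target is sketched below. The rule used here (tighten only after the vendor beats the current target for several consecutive periods) and the `min_periods` parameter are assumptions for illustration, not part of any published XLA. The metric is assumed to be one where lower is better, such as a poor-call percentage.

```python
def ratchet_target(current_target, recent_results, min_periods=3):
    """Tighten the target if the vendor has consistently beaten it.

    If each of the last `min_periods` results is better (lower) than the
    current target, the new target becomes the worst of those results.
    Otherwise the target is left unchanged.
    """
    if len(recent_results) < min_periods:
        return current_target
    window = recent_results[-min_periods:]
    if all(result < current_target for result in window):
        return max(window)
    return current_target


if __name__ == "__main__":
    # Vendor consistently hits 1% against a 1.5% target: the target tightens.
    print(ratchet_target(1.5, [1.0, 1.0, 1.0]))
    # One period above target within the window: the target stays put.
    print(ratchet_target(1.5, [1.0, 1.6, 1.0]))
```

Taking the worst result in the window as the new target keeps the tightened goal at a level the vendor has already demonstrated it can sustain.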
The “put your money where your mouth is” approach should become a standard; however, we must share the risk as a partnership. Certainly, penalties (to the UCaaS vendor) should be in place if services do not meet expectations/agreements. However, there should be a built-in reward mechanism for the provider as well. Creating a financial ‘earn back’ incentive for the provider, based on metrics that are meaningful for your organization’s business, will keep the UX transformation initiative on the right track.
The XLA Catalyst for the UCaaS Provider
While it may be clear that a new XLA is good for the goose (here represented by the customer), why would it also be good for, and even suggested by, the gander (the vendor)? It boils down to 5 key reasons:
- Service Desk Costs: User experience visibility helps agents identify issues proactively before an end-user files a ticket. Vendor operations staff can investigate the UX trend, identify top impacts, and remediate them before an end-user notices a decline in performance.
- Root Cause Visibility: By visualizing the analytics that underscore root cause, providers avoid damaging their reputation for problems they didn’t cause. Further, UCaaS providers can pinpoint the source of a problem. They can either resolve it if they are responsible or contact the responsible party to assist in the performance resolution.
- User Segmentation/Personas: The employee experience insights used to create the XLA can help providers understand customer needs even better than customers themselves. Providers can start segmenting to deliver assets and services based on data that shows what users currently have, how much they’re using it, and their experience.
- Price Increase Potential: At least for now, customers are willing to pay more for better service. When the day-to-day user experience of a specific UCaaS vendor shines over and above other similar services, such cutting-edge cloud services can command a premium.
- Customer Loyalty: XLAs offer a high potential to increase customer loyalty and customer retention. A UCaaS provider who assists the customer in the creation of high-performing digital workplaces can easily prove their value by reporting on meaningful metrics that go far beyond the old-school uptime alone.