Attributed to: Christian Belady, General Manager of Data
Center Services & Vijay Gill, Senior Director of
Engineering
When Microsoft delivered its first online service back in 1994,
the cloud was still a relatively undefined, evolving space.
Today, Microsoft's cloud supports more than 1 billion customers
and 20 million businesses in 76 countries, and it's still
growing. To support the increasing array of enterprise-level
online services, Microsoft is investing heavily in a global cloud
infrastructure that delivers security, reliability,
scalability, efficiency, and performance. A key concern
among customers is whether their services will be continuously
available, and whether their cloud provider has the technology,
policies, and processes in place to maximize uptime.
Disruptive events such as natural disasters, power grid outages,
and human error are a fact of life, so it is critical that our
cloud infrastructure is architected for resiliency, our processes
result in rapid problem resolution with minimal customer impact,
and we maintain a culture of service diligence and continuous
innovation to identify and eliminate systemic weaknesses.
Laser-sharp focus on operational excellence helps Microsoft
ensure reliability, availability, and security 24x7x365.
Operational excellence can be complicated, and our customers often
want to know what it encompasses.
An effective cloud infrastructure strategy rests on two
goals: keeping our sites up while driving costs down. We
architect a resilient cloud infrastructure with geo-redundant
backup for recoverability and data integrity. We invest more
than $9 billion annually in research and development on our
cloud, services, and software to ensure a strong innovation
roadmap.
The Microsoft cloud is also founded on robust security policies
and procedures that differentiate data (consumer vs. enterprise),
separating private company information from commercial use. We
maintain a fully developed Service Management framework, the
Microsoft Operations Framework, which guides and formalizes our
compliance with international standards such as ISO 20000 for
Service Management, ISO 27001 for Information Security, and BS
25999 for Business Continuity. In addition, we hold third-party,
process-based attestations and certifications, including SOC (SSAE
16/ISAE 3402), FISMA, Sarbanes-Oxley, and HIPAA/HITECH.

Our continuous growth reflects judicious data center designs that
add capacity and reduce attendant costs, maximizing compute
capacity with the fewest servers while maintaining workload
isolation. We partner with leading hardware manufacturers to
right-size the thousands of servers, routers, and other equipment
in our data centers, balancing physical device fault tolerance
against application fault tolerance.
We find the right balance of hardware and software resiliency to
provide the best availability for the lowest possible cost.
For example, our network, one of the largest and most well-connected
in the world, includes multiple, physically diverse connections into
our data centers, and we maintain sufficient capacity to handle
large-scale network interruptions without degrading
performance.
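The trade-off between hardware and software resiliency described above can be reasoned about with the textbook parallel-redundancy formula. The sketch below is a minimal, hypothetical illustration rather than our actual planning model: it assumes that redundant paths fail independently (physical diversity is what pushes real links closer to that assumption) and that any single surviving path can carry traffic. The function names and the link capacities in the usage note are invented for illustration.

```python
from typing import Sequence


def parallel_availability(path_availability: float, paths: int) -> float:
    """Availability of `paths` redundant paths when any one path suffices.

    Hypothetical simplification: path failures are statistically independent.
    """
    if not 0.0 <= path_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if paths < 1:
        raise ValueError("need at least one path")
    # The service is down only when every path is down at the same time.
    return 1.0 - (1.0 - path_availability) ** paths


def survives_failures(
    link_capacities: Sequence[float], peak_demand: float, failures: int
) -> bool:
    """True if peak demand still fits after losing the `failures` largest links.

    Losing the largest links is the worst case for a given number of failures.
    """
    keep = max(len(link_capacities) - failures, 0)
    remaining = sorted(link_capacities)[:keep]  # drop the largest links
    return sum(remaining) >= peak_demand
```

For example, two paths that are each 99% available give roughly 99.99% combined availability, and four hypothetical 100 Gbps links serving 250 Gbps of peak demand survive one link failure (300 Gbps remains) but not two (200 Gbps remains). This is why adding cheap, physically diverse capacity can buy more availability than hardening any single component.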
Sophisticated service instrumentation and monitoring integrate
at the deepest levels with each component, giving us visibility
into the data center, network backbone, internet exchanges, and
beyond, to help us spot, diagnose, and manage the cause of any
disruption and resolve it quickly.
Technical troubleshooting expertise from our globally
distributed Microsoft Operations Center team provides round-the-clock
staffing, failover capabilities, and the resources needed to
triage, mitigate, and escalate issues as they unfold in real time.
Managing releases and changes through formalized processes
helps remove the potential for human error, leverages
standardization, and protects data vigilantly.
One of the world's largest fiber backbones powers our data
centers, providing more than 3.5 terabits per second of capacity to
more than 1,200 networks with robust availability. It can
reroute around internet failures instantaneously and has the
capacity to withstand significant network interruptions.
Finally, attention to environmental issues and a relentless
focus on energy efficiency round out Operational Excellence.
The Microsoft data center operations team continually
seeks to drive more efficient power use and cooling in our data
centers. Our latest modular data centers use about 50% less energy
than those from three years ago. Our data center designs pioneer
fresh-air cooling and ultra-efficient water use; the latest modular
facilities use only 1% of the water used by traditional data
centers in the industry. They also use recyclable materials in
construction and build in sustainability measures that ensure we
are constantly gaining in efficiency and sustainability. (In fact,
we were honored that the United Nations Environment Programme relied
on our modular data center designs for the IT infrastructure at its
new carbon-neutral headquarters complex.)
We also drive efficiency and sustainability in how we operate
our data centers. Microsoft's data centers are among the most
monitored and measured in the world, which informs more efficient
operations and identifies areas for future research. At the
request of the US Environmental Protection Agency, we presented our
approach at its industry stakeholders meeting on efficiency as an
example of how to do it right. In addition, by eliminating unnecessary
components, using higher-efficiency power supplies and voltage
converters, and limiting the expandability of server platforms, we
achieve significant power savings. We look at specific measures,
such as processor performance per dollar and per watt, to determine
optimum tradeoffs in processor selection. We also operate our
servers across a wider temperature range and use free-air cooling
and water economization to improve efficiency. In San Antonio, we
have used recycled wastewater to cool the facility since 2008.
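The processor-selection measure mentioned above (performance per dollar and per watt) can be made concrete with a small ranking calculation. This is a hypothetical sketch, not Microsoft's selection tool: the SKU names, benchmark scores, prices, and wattages are invented placeholders, and the combined figure of merit (performance divided by cost times power) is one assumed way to fold both ratios into a single number.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ProcessorOption:
    name: str             # hypothetical SKU label
    perf_score: float     # benchmark throughput; higher is better
    unit_cost_usd: float  # purchase price per processor
    power_watts: float    # power draw under the target workload

    @property
    def perf_per_dollar(self) -> float:
        return self.perf_score / self.unit_cost_usd

    @property
    def perf_per_watt(self) -> float:
        return self.perf_score / self.power_watts

    @property
    def perf_per_dollar_per_watt(self) -> float:
        # Assumed combined figure of merit: rewards parts that are cheap AND frugal.
        return self.perf_score / (self.unit_cost_usd * self.power_watts)


def best_option(options: Sequence[ProcessorOption]) -> ProcessorOption:
    """Return the option with the highest performance per dollar per watt."""
    return max(options, key=lambda o: o.perf_per_dollar_per_watt)
```

With two invented options, a "fast" part scoring 200 at $2,000 and 200 W and an "efficient" part scoring 150 at $1,200 and 120 W, `best_option` picks the efficient part even though its raw performance is lower (0.125 vs. 0.1 performance per dollar, 1.25 vs. 1.0 per watt). A fuller analysis might replace the purchase price with total cost of ownership, including the energy the part consumes over its service life.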
With all these opportunities for efficiency, we were not
surprised by the findings of a study on the environmental impact of
cloud computing that Accenture and WSP Environmental conducted using
data we provided: when organizations move their Microsoft business
applications to our cloud, there is a net energy savings per user of
at least 30%. For small businesses, the result can be even more
dramatic, with potential carbon savings of up to 90% per user.
As part of our commitment not only to driving greater
efficiency but also to reducing the long-term impact of our data
centers, later this week we will share some insights
into an exciting alternative-energy research project we have
been working on. While we have purchased, and continue to
purchase, green energy at many of our locations around the world, we
fundamentally believe there is potential to
redefine how we and the industry power our data centers.
A few years ago, we spoke about the concept of looking at data
as a form of energy and the concept of data plants. Last July,
in a blog post called "The Disappearing Data Center," we expanded
the concept further. Through massive integration of
power plants and data centers, we felt that there could be
huge efficiency gains from eliminating the transmission
lines, substations, and transformers (as well as the
associated transmission losses) that we see in today's
power distribution ecosystem. With these data plants, we
distribute data (an energy form) over a network (an optical
grid), providing the next generation of energy distribution. Looking
at it this way, we are essentially taking another step in the
evolution of refining the energy being distributed. The question,
then, is: if we look at data as a form of energy, how does that
change what we are doing today? When we talk about the disappearing
data center, what we really mean is that the data center as we know
it will disappear through integration, driving unprecedented
efficiency gains, and this can only be done at the scale of the
cloud. The figure below shows this evolution taking sustainability
to new levels, with side benefits in reliability and the ability to
use waste gases.

The evolution of data into energy.
We are committed to the long-term sustainability of our industry
and believe this is where the industry should focus, rather
than on short-term manipulation of carbon PPAs. We have
already invested millions in research in this area and have
filed many patents, including one on the Data
Plant. We will provide more details on these
innovations on our team blog later this week.
As we continue to experiment and share our findings, we hope
to accelerate our own, and our industry's, continued move
toward increasing positive computational impact while reducing
operational impact.
We have come a long way since we built our first data center in
1989 (on our Redmond, WA campus) and launched our first cloud
service, MSN, in 1994. We continue to make positive strides
toward even greater reliability, availability, security, and
environmental sustainability every day. Our team is
passionately committed to not only improving our own cloud
infrastructure designs and the operational practices of our
facilities, but also sharing those with other stakeholders in the
data center industry.
As we work to educate customers and partners about our practices
for Operational Excellence in the cloud infrastructure, we're making
additional resources available. We hope you will visit our customer
priorities pages for more information, including interactive videos,
white papers, and strategy briefs that offer more insight into our
best practices.
//cb and vg