Multi-Cloud Storage Management Mistakes

August 22, 2019


With a growing tsunami of data, an increasing need for sophisticated data analytics and the ever-present requirement of maximum system performance, it’s easy to understand why organizations move to a multi-cloud configuration.

However, it's easy to end up paying far more than necessary for the storage component by overbuying and overconsuming storage resources.

If performance lags, IT pros typically turn to more expensive storage solutions. Of course, cloud providers are happy to charge for these, but it’s a perfectly avoidable expense.

First Mistake: Lack of Planning

Failing to plan the system and tune workloads is the biggest mistake IT pros can make when considering multi-cloud storage.

It is essential to know your organization's true storage needs. When planning a roll-out of a cloud environment, it is critical for customers with large data storage needs to identify the tiers of storage they will use and the performance each tier requires. Archival data and third-tier information clearly don't need the same performance as a customer-facing transaction system.
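To make the tiering exercise concrete, here is a minimal sketch, in Python, of how coarse placement rules might be written down before a roll-out. The tier names, access-age thresholds and sample workloads are illustrative assumptions only, not recommendations from any particular cloud provider.

```python
# Hypothetical tiering sketch: tier names, thresholds and workloads below
# are illustrative assumptions, not figures from any specific cloud provider.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    days_since_last_access: int
    customer_facing: bool

def choose_tier(w: Workload) -> str:
    """Pick a storage tier from coarse access patterns."""
    if w.customer_facing:
        return "premium-ssd"    # low-latency tier for customer-facing transactions
    if w.days_since_last_access <= 30:
        return "standard-ssd"   # warm, second-tier data
    if w.days_since_last_access <= 365:
        return "standard-hdd"   # third-tier information
    return "archive"            # cold, archival data

for w in (Workload("order-processing", 0, True),
          Workload("monthly-reports", 20, False),
          Workload("2017-audit-logs", 700, False)):
    print(f"{w.name}: {choose_tier(w)}")
```

Even a rough rule set like this forces the conversation about which data actually needs premium performance before the first invoice arrives.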

Second Mistake: Overbuying

Cloud customers often buy (and then consume) more storage than they need because of unnecessary, inefficient I/O traffic. As more VMs are added to virtual environments in the cloud, shared resources generate unnecessary I/O traffic that results in performance issues, resource conflicts and wasted disk space. To compound the issue, the Windows file system used in most environments breaks data into small, fractured, random I/O that adds at least 30 percent more I/O overhead, stealing resources and robbing performance.

I/O is a big issue when it comes to maximizing throughput. In physical and virtual data centers and in cloud-based environments, performance is regulated by the interaction of the three basic layers of computing: compute (CPU), network and storage. Performance depends on I/O traversing these layers so that application results reach the end user. When these three layers are widely separated, as in a SAN, a cloud or a faraway data center, the problem worsens and I/O completion times stretch out. And if the amount of I/O going through the system is too great, each cycle takes longer to complete, just as it takes a commuter longer to reach work during rush hour.
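One practical way to see how hard the I/O path is working is to sample the operating system's disk counters and watch how operation counts and completion times change under load. The sketch below assumes the cross-platform psutil package is installed; the five-second window and the metrics chosen are arbitrary illustrative choices.

```python
# Minimal I/O sampling sketch (assumes `pip install psutil`).
import time
import psutil

INTERVAL = 5  # seconds; arbitrary sampling window

before = psutil.disk_io_counters()
time.sleep(INTERVAL)
after = psutil.disk_io_counters()

ops = (after.read_count - before.read_count) + (after.write_count - before.write_count)
busy_ms = (after.read_time - before.read_time) + (after.write_time - before.write_time)

print(f"I/O operations per second: {ops / INTERVAL:.0f}")
print(f"Average time per I/O:      {busy_ms / ops:.2f} ms" if ops else "No I/O in window")
```

If the average time per operation climbs while the operation rate stays flat, the layers are spending more of each cycle waiting on one another, which is exactly the rush-hour effect described above.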

A Primary Cause of I/O Issues

A little-known fact is that performance is robbed by small, fractured, random I/O generated by the Windows operating system (any Windows operating system, including Windows 10 or Windows Server 2019). Windows is a remarkable operating system, running on some 80 percent of all systems on the planet.

But as the storage layer is logically separated from the compute layer and more systems are virtualized, Windows handles I/O logically rather than physically, meaning it breaks reads and writes down to the lowest common denominator. This results in tiny, fractured, random I/O that creates a "noisy" environment. Adding virtualized systems into the mix compounds the problem with the I/O blender effect, in which the hypervisor mixes the I/O streams of many VMs into a single, highly random stream.
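The cost of many small writes versus fewer large ones is easy to observe with a rough benchmark that pushes the same amount of data through the operating system in different block sizes. This is an illustrative sketch only; absolute numbers vary widely with the operating system, file system, caching and underlying storage.

```python
# Rough demonstration of small- vs. large-block write overhead.
# Numbers are directional only; results depend heavily on OS, file system and disk.
import os
import time
import tempfile

TOTAL_BYTES = 64 * 1024 * 1024  # write 64 MiB per run

def timed_write(block_size: int) -> float:
    """Write TOTAL_BYTES in block_size chunks straight to the OS and time it."""
    buf = os.urandom(block_size)
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(TOTAL_BYTES // block_size):
            os.write(fd, buf)        # one system call per block
        os.fsync(fd)                 # force the data out to storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return elapsed

for size in (4 * 1024, 64 * 1024, 1024 * 1024):  # 4 KiB, 64 KiB, 1 MiB blocks
    print(f"{size // 1024:>5} KiB blocks: {timed_write(size):.2f} s")
```

Delivering the same 64 MiB as thousands of 4 KiB operations will often take noticeably longer than delivering it as a few dozen 1 MiB ones, which is the overhead a fragmented I/O pattern imposes at scale.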

Solutions

There were probably multiple reasons you chose or are thinking about a multi-cloud strategy. In addition to performance and flexibility, you may be attempting to avoid vendor lock-in, meet data sovereignty requirements or create data availability zones. Even redundancy may be driving your decisions. While these are worthy reasons, lack of planning and overbuying are the two most common multi-cloud management mistakes to avoid.

Good planning is the best way to avoid these mistakes. There are also several excellent cloud orchestration solutions, offered by vendors large and small, that ensure resources are monitored and balanced.

Additionally, storage I/O reduction technologies can reduce the storage I/O workload by 30 to 50 percent or more. These allow the current storage solution to perform better for the applications running in the cloud without the need to consume storage hardware from a more expensive storage tier. This can save cloud users a lot of money while improving application performance.
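As a back-of-the-envelope illustration of that savings argument, the sketch below assumes a 40 percent I/O reduction (within the 30 to 50 percent range cited above) and uses entirely hypothetical tier limits and prices; it shows only the shape of the calculation, not real cloud pricing.

```python
# Hypothetical cost sketch: tier limits and prices are made-up placeholders.
required_iops = 10_000            # peak IOPS the application drives today
reduction = 0.40                  # assumed 40% I/O reduction
effective_iops = required_iops * (1 - reduction)

# (tier name, max sustained IOPS, monthly cost in dollars) -- illustrative only
tiers = [("standard", 7_500, 300), ("premium", 20_000, 1_200)]

def cheapest_tier(iops: float):
    """Return the lowest-cost tier that can sustain the given IOPS."""
    return next(t for t in tiers if t[1] >= iops)

before, after = cheapest_tier(required_iops), cheapest_tier(effective_iops)
print(f"Without I/O reduction: {before[0]} tier at ${before[2]}/month")
print(f"With I/O reduction:    {after[0]} tier at ${after[2]}/month")
```

Under these made-up numbers, trimming the I/O workload lets the same application stay on the cheaper tier, which is the mechanism behind the savings claim.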

James D'Arezzo

CEO, Condusiv Technologies

James D'Arezzo is the CEO of Condusiv Technologies. He has had a long and distinguished career in high technology. He first served on the IBM management team that introduced the IBM Personal Computer in the 1980s, then joined start-up Compaq Computer as an original corporate officer, serving as VP Corporate Marketing and later VP International Marketing. Seeing the technology trend toward networking, James joined Banyan Systems in the early 1990s as VP Marketing and helped that global networking software company grow rapidly and eventually go public on NASDAQ. He then moved on to computer-aided design software maker Autodesk as VP Marketing and as GM of multiple divisions for data management, data publishing and geographic information systems. D'Arezzo later served as President and COO of Radiant Logic, Inc. James holds a BA from Johns Hopkins University and an MBA from Fordham University.