Predictive analytics software puts big squeeze on IT systems

Christine Parizo nailed it in this great assessment of the predictive analytics space:

Predictive analytics projects are no light breeze on IT infrastructures. Building and testing predictive models and then running them against large volumes of data can kick up a processing gale strong enough to overcome systems that lack the required power and capacity to effectively support the predictive analytics software.

It’s a big mistake on the part of IT managers to view predictive analytics initiatives and related data mining programs the same way they look at conventional business intelligence projects, according to analysts such as Rick Sherman, founder of consulting company Athena IT Solutions in Maynard, Mass. But that’s often the case, he said, leaving many analytics professionals with processing environments that aren’t sufficient for running predictive applications effectively.

To avoid that situation, companies should be aware of the possible need to add more hardware. In addition, the amount of data involved and the nature of predictive analytics efforts typically require dedicated storage space for data sets along with the ability to manipulate the information as needed, Sherman said.

An IT team has several options for providing that space, he added — for example, creating standalone data marts, setting up walled-off data sandboxes inside a data warehouse and even letting users download data into Excel spreadsheets. Those approaches remove analytics data from a controlled data warehousing environment and let analytics pros explore and work with the information more freely than they could otherwise.

Using Excel is generally the least appealing option for organizations because it takes data completely outside of the purview of IT managers, Sherman said. Data marts and sandboxes also require IT to give up control of data, he added, but they’re managed setups with security and backup protections. Sandboxes in particular create segregated islands where analytics professionals can play with data with autonomy — and without affecting regular data warehouse operations. “If they need more infrastructure, CPU power or memory, they’re isolated and it has less impact on other [processing jobs],” Sherman said.
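To make the sandbox idea concrete, here is a minimal sketch, not anything Sherman or the article prescribes. It assumes an in-memory SQLite database as a stand-in for the data warehouse; the `sales` table, the `sandbox` schema name and the sample rows are all hypothetical. A slice of warehouse data is copied into a separately attached database, where it can be changed without touching the warehouse tables.

```python
import sqlite3

# Hypothetical stand-in for the data warehouse: an in-memory SQLite database.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (region TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 120.0), ("west", 75.5), ("east", 42.0)],
)
warehouse.commit()

# Attach a second database to act as the walled-off sandbox (assumed name).
warehouse.execute("ATTACH DATABASE ':memory:' AS sandbox")

# Copy only the slice the analysts need; the warehouse table is left untouched.
warehouse.execute(
    "CREATE TABLE sandbox.sales_east AS "
    "SELECT region, amount FROM main.sales WHERE region = 'east'"
)

# Analysts can now modify the sandbox copy freely.
warehouse.execute("UPDATE sandbox.sales_east SET amount = amount * 1.1")
warehouse.commit()

# The sandbox holds its own rows; the warehouse totals are unchanged.
sandbox_rows = warehouse.execute(
    "SELECT COUNT(*) FROM sandbox.sales_east"
).fetchone()[0]
warehouse_total = warehouse.execute(
    "SELECT SUM(amount) FROM main.sales"
).fetchone()[0]
```

This only illustrates the data separation. In a real deployment the sandbox would more likely sit on its own storage and compute, so that heavy modeling jobs do not compete with regular warehouse workloads.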

If analytics applications are being run directly against information in a data warehouse, “the architecture of the warehouse can largely impact the potential satisfaction or dissatisfaction of users,” said John Lucker, head of the advanced analytics and modeling practice at New York-based Deloitte Consulting LLP. As a result, he endorsed the idea of moving data to external data marts for use by data scientists and other data analysts.

No matter how the IT department sets up the infrastructure for predictive analytics, Lucker said, effective data stewardship and governance processes need to be at the forefront of initiatives to help ensure that incoming data is accurate and consistent.

Scott Schlesinger, senior vice president and head of business information management consulting at Capgemini North America in New York, agreed that a well-planned data management strategy is a must for a successful predictive analytics program. That includes assessing data availability and data quality and cleaning up information as needed, he said, adding that organizations must be willing to push through “tough process changes” if doing so is required to get their data in shape for accurate analysis.
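As an illustration of what assessing data quality can look like in practice, the sketch below counts missing required values and exact duplicate records in a small data set. It is an assumption-laden example rather than a method described by Schlesinger: the `profile_quality` helper, the field names and the sample records are all hypothetical.

```python
from collections import Counter


def profile_quality(rows, required_fields):
    """Count missing required values and exact duplicate rows (hypothetical helper)."""
    missing = Counter()
    seen = set()
    duplicates = 0
    for row in rows:
        # A value counts as missing if it is None or blank after trimming.
        for field in required_fields:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1
        # Two rows are duplicates when every field and value matches exactly.
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"rows": len(rows), "missing": dict(missing), "duplicates": duplicates}


# Made-up sample records for illustration only.
records = [
    {"customer_id": "c1", "age": 34},
    {"customer_id": "c2", "age": None},
    {"customer_id": "", "age": 51},
    {"customer_id": "c1", "age": 34},
]

report = profile_quality(records, ["customer_id", "age"])
```

A report like this gives a rough first view of how much cleanup is needed before modeling. Real assessments would typically also check value ranges, formats and consistency across source systems.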

Nice views make analytics data understandable

Lucker thinks the use of data visualization tools also needs to be considered because business managers might better absorb predictive analytics findings if the data is presented in graph or chart form. “Sometimes analytics [professionals] err on the side of being overly quantitative,” he said. “Spreadsheets and tables are OK for some, but those methods might be awful for people who would prefer to see the data in a more visual, graphic way.”

And then, of course, there’s the predictive analytics software itself. A variety of tools are available, including credible open source options, Lucker said. He recommends that in evaluating and selecting predictive software, companies examine not only features, functionality and the long-term viability of vendors but also usability and the level of user training that will be required. “There’s a tremendous leaning at many companies to under-think the amount of investment required to become an expert in these tools,” he said. “People buy things and then tend to wonder why they’re sitting on the bench.”

One approach that has yet to be widely adopted is doing predictive analytics in the cloud, according to Sherman. Most companies are still using on-premises systems for their deployments, he said: “It’s not that they can’t use the cloud, but generally there isn’t enough processing power.”

Read the whole article at