Seven Steps for Executing a Successful Data Science Strategy by David Stodder


17 Oct 2018 | SAS Data Science

1. IDENTIFY YOUR ORGANIZATION’S KEY BUSINESS DRIVERS FOR DATA SCIENCE

Data science may not be for everyone. Before embarking on a data science project, the first question to ask is a simple one: Do we need data science? Users may appear content with spreadsheets, business intelligence (BI) applications, and the selection of structured data available through data warehouses or other IT-managed repositories. Existing reports and dashboards may seem sufficient. From this perspective, investing in data science and technology to expand the reach of analytics into more data types, including semi-structured and unstructured, may appear unjustified. To evaluate whether it is worth engaging in data science, organizations need to look at the value it could bring beyond what they already realize from traditional BI, analytics, and data warehousing. The place to begin such an evaluation is the potential business drivers: What business value could be gained by developing a data science strategy? What are the questions the organization needs to answer to be more competitive, effective, and proactive? How well does the organization understand, and know how to respond to, the interplay of factors that affect customer behavior, the success of its website, or the impact of key trends? Often, such analysis will reveal knowledge gaps the organization has been unable to fill with its current BI and data warehousing systems. For this reason, one of the most important qualities to seek when selecting personnel for a data science effort is knowledge of and curiosity about the business. Data scientists often come into an enterprise possessing exceptional technical and scientific skills. However, it is critical that they also develop the business domain expertise to uncover the questions the organization needs to ask using analytics, and to make the resulting data insights actionable. At this stage, organizations should identify where data science could contribute most to realizing business objectives.
Some classic areas include achieving greater personalization and computational efficiency in marketing and advertising; monitoring social media; modeling attribution to determine what drives purchasing; establishing a dynamic pricing strategy across multiple channels; uncovering fraudulent activity; and automated analysis of important documents or images such as call center logs or checks.

2. CREATE AN EFFECTIVE TEAM FOR ACHIEVING DATA SCIENCE GOALS

“It’s like chasing unicorns” is a phrase often used to describe the difficult task of finding and keeping those rare individuals with the experience and ability to perform all that is required of a data scientist. In this exclusive group, many have a Ph.D. and a good number come from diverse, non-computer-science backgrounds. The pioneers of data science—“half hackers and half scientists,” as one person put it—often took a do-it-yourself approach through hands-on implementation of Hadoop and other open source technologies to store, access, and analyze massive and varied sources of big data. Although firms have benefited from their innovation, the artisan approach has left them vulnerable if and when their data scientists are lured away by competitors. Rather than focus on finding one or a few individuals who seem to be able to do it all, a wiser course is to develop a stable team that brings together the talents of multiple experts. As discussed in the previous step, the team’s members must understand business drivers and not lose sight of the goal of delivering actionable business value. Each member of the team should also have enthusiasm, curiosity, and creative energy for working with business leadership on data and analytics projects. Depending on the project, the team will need personnel with a combination of skills that include expertise in the business domain (for example, customer engagement or marketing), business analytics, statistics, data mining, machine learning, data and information retrieval, programming, prototyping, and visualization. Organizations should assemble a team that includes individuals with communication skills, not just technical acumen. Although it is valuable to look externally for data scientists and leadership such as chief data officers, taking a team approach allows organizations to look internally. Many organizations already have personnel who could join a data science team. 
Indeed, TDWI Research finds that the majority of organizations plan to train internal personnel to handle data science projects. Personnel could include business analysts, statisticians, software developers, data analysts, and other data professionals. In this step, organizations should bring business and IT leadership together to develop a strategy for creating effective and sustainable data science teams. Their plan should include training and incentives to attract internal personnel.

3. EMPHASIZE COMMUNICATION SKILLS TO REALIZE DATA SCIENCE’S VALUE

Organizations that use data science successfully almost universally point to communication as a key ingredient in their success. Insights provided by analytics are of little value unless the data science team articulates what the findings say and why they are significant to business goals. Often this is not easy, especially if the presentation of the findings calls into question executives’ “gut feel” assumptions about business strategy, strays from tightly controlled modes of BI reporting and analysis, or suggests that established processes are ineffective or outdated. Data science often points to the need for change, and change can be difficult. Communication is also vital to improving collaboration in a data science project. Often, along with data scientists, key players (such as statisticians, business analysts, data analysts, and developers) are scattered in silos across the organization, or business and data analysts may work in a different department from the business stakeholders, who should also be part of the data science effort. Important new perspectives can come if data science teams are able to work across divisions or silos to gain a more global view. For example, to identify which actions are most influential to the buying behavior of an important cluster of customers, it is valuable if data science teams can examine data from a number of sources that might be managed in different divisional silos such as e-commerce, brick-and-mortar stores, contact centers, and field service offices. This “big data” has never fit easily into a data warehouse, much less a spreadsheet. The data science team could make a great contribution just by pulling together a global, holistic view of this scattered data. Working across the organization is also important when the goal of data science is to optimize a process by developing algorithms that will automate decisions.
Communication is essential; the team must be aware of how optimization will impact dependent processes, including how data is collected and analyzed. Without good communication, optimization could have unintended consequences. Communication by and among data science teams is essential to building a data-driven analytics culture. In this step, organizations should emphasize the value of communication and make it a priority as they evaluate candidates for data science teams.

4. EXPAND THE IMPACT OF DATA SCIENCE THROUGH VISUALIZATION AND STORYTELLING

Data science fits into a larger objective of creating a data-driven “analytics culture” that is energized by a shared desire to improve decision making at all levels, from executives to frontline personnel. The key goal is to supplant uninformed, emotional decision making based on inaccurate theories with decision processes that are supported by empirical evidence, testing of hypotheses, and impactful data analysis. Although inspiration will always be vital, companies with healthy analytics cultures accept the notion that assumptions should be questioned by looking closely at the data. Data science thrives in an analytics culture. However, not all personnel in an organization are going to be part of data science teams, nor should they be. To bring more users into the analytics culture, organizations should explore technologies that can support the “democratization” of BI, analytics, and data discovery. These products are increasingly able to address users’ self-service demands for data access and interaction without IT hand-holding. The tools go beyond simple spreadsheets and canned reporting to deliver different perspectives on metrics, help users uncover trends, and enable them to personalize dashboards. Data visualization is an essential technology for data science and most self-service BI, analytics, and data discovery use cases. Across organizations, users’ visualization requirements can be diverse; some need simple interfaces that emphasize how to respond to a situation while others demand more varied types of visualizations. Leading tools have libraries of visualization types, and more are available through open source libraries. Organizations should take advantage of maturing data visualization technologies for both advanced data science and data interaction by nontechnical users.
Visualization enables “data storytelling.” This hot trend fuses visualization, data analysis, and usually verbal or written discussion, often in an infographic, to provide interpretation of data science results and why they are significant. Storytelling can be an effective way for data science teams to communicate accurately what they have found rather than just present numbers that could be misinterpreted. Organizations should encourage data storytelling and provide training so data science teams and other users can do it well.

5. GIVE DATA SCIENCE TEAMS ACCESS TO ALL THE DATA

Data is the raw material of data science. Like chefs looking for new taste sensations, data scientists need to work closely with data at every step so they know what they have and can extract fresh insights to deliver business value. Although valuable for reporting and prescribed forms of analysis, most traditional BI and data warehousing systems offer users only selected data samples, subsets, and pre-aggregated reports that have been carefully scrubbed and manicured by data professionals. Instead of raw data, most BI users work with reports or dashboards. What they leave behind are unincorporated structured sources and a vast universe of semi- and unstructured data and content that has never easily fit into BI systems and data warehouses. Structured data can, of course, be voluminous and varied, especially when brought in from diverse applications. However, data science is often more closely associated with the desire to analyze semi- and unstructured data because these sources are growing rapidly and have been analyzed little, if at all. Preparing this breadth of data, assessing its quality, looking for gaps and errors, and performing exploratory analysis to determine relevant extracts are essential data science activities. They can take up the lion’s share of a data science team’s time. Although tools can automate steps, data science teams need to get close to the data to properly move forward with analytics and algorithm development. Computer logs, social media, sensor data, and other new sources can be messy and chaotic; organizations should be realistic about the effort it will take to investigate and prepare the data. Organizations should ensure that data science teams include personnel who are comfortable working with raw data. In most cases, the team will need personnel who are knowledgeable about Hadoop and related technologies and are familiar with data lake and data hub concepts for gathering, storing, and accessing raw data.
Data science teams should always be on the lookout for interesting and potentially relevant data sources. Often, more than one application will be recording diverse (or sometimes the same or similar) data about customers, transactions, or other objects. Data scientists can play a valuable role by uncovering discrepancies and data quality problems.
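
As an illustration of the discrepancy hunting described above, the following is a minimal Python sketch, not a prescribed method. It assumes two applications keep customer records under a shared customer ID; the source names, fields, sample values, and the `find_discrepancies` helper are all hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def normalize(value: str) -> str:
    """Collapse whitespace and lowercase so cosmetic differences are not flagged."""
    return " ".join(value.split()).lower()


def find_discrepancies(
    source_a: Dict[str, Record],
    source_b: Dict[str, Record],
    fields: List[str],
) -> Dict[str, object]:
    """Compare records that two systems hold under the same customer ID."""
    only_in_a = sorted(set(source_a) - set(source_b))
    only_in_b = sorted(set(source_b) - set(source_a))
    conflicts: List[Tuple[str, str, str, str]] = []
    for customer_id in sorted(set(source_a) & set(source_b)):
        for field in fields:
            a_val = source_a[customer_id].get(field, "")
            b_val = source_b[customer_id].get(field, "")
            if normalize(a_val) != normalize(b_val):
                conflicts.append((customer_id, field, a_val, b_val))
    return {"only_in_a": only_in_a, "only_in_b": only_in_b, "conflicts": conflicts}


# Hypothetical sample data: the same customers as seen by two applications.
ecommerce = {
    "C001": {"email": "ana@example.com", "city": "Pune"},
    "C002": {"email": "raj@example.com", "city": "Mumbai"},
    "C003": {"email": "li@example.com", "city": "Delhi"},
}
contact_center = {
    "C001": {"email": "Ana@Example.com ", "city": "Pune"},
    "C002": {"email": "raj@example.com", "city": "Thane"},
    "C004": {"email": "sam@example.com", "city": "Surat"},
}

report = find_discrepancies(ecommerce, contact_center, ["email", "city"])
print(report)
```

The report separates records missing from one system from records that exist in both but disagree, which are usually different kinds of data quality problem and go to different owners.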

6. PREPARE DATA SCIENCE PROCESSES FOR OPERATIONALIZING ANALYTICS

Businesses can execute at a higher level if they can strengthen the connection between analytics and business processes. The first step is to move beyond purely “descriptive” analytics, which only answers what and why questions about historical trends and events, to predictive analytics, which can help discern what is likely to happen next. By streamlining how they develop and deploy predictive models, organizations can expand their use into more operational processes. However, getting business value from this expansion requires more than just producing more analytic models faster. Firms must move to the next stage: to “prescriptive” analytics, which is about producing not just predictive insights but also suggested actions. Prescriptive analytics can be useful both for humans responsible for business processes and for guiding emerging automated decision systems. Potential use cases abound. The most common is to improve customer marketing to offer targeted cross-sell, up-sell, and next-best-action offers at the moment of engagement. Another example occurs in complex, high-volume supply chains. Leading firms today apply predictive modeling to forecast what might happen given the probability of factors that could affect product manufacturing, packaging, and shipping. To get maximum value from their analysis, these firms are moving toward prescriptive analytics to develop recommended options for automated rules and complex event processing systems. This evolution could also be important for organizations seeking to operationalize analytics to fight fraud, assess risks, position mobile assets, and more, in real time. To operationalize analytics, data science teams must focus on reducing the time it takes to develop and deploy analytic models. With cleaner workflows and processes, data science teams can move away from uncoordinated, artisanal model development and toward practices that include quality feedback sessions to correct flaws.
Along with process improvements, organizations can take advantage of new technology practices such as in-database scoring, which can help eliminate time-consuming data movement to specialized data stores, improve the performance of analytic models, and make models available for multiple applications as stored procedures. Teams must continue to improve communication with business stakeholders. Delays in model development and deployment are often due as much to communication difficulties as they are to other factors.
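
To make in-database scoring concrete, here is a small Python sketch using the standard library's `sqlite3` module. It is only an illustration under stated assumptions: SQLite has no stored procedures, so a user-defined SQL function stands in for one; the coefficients, table, thresholds, and action labels are all hypothetical. The scores are computed where the data lives, and a `CASE` expression turns each predictive score into a suggested action in the spirit of prescriptive analytics.

```python
import math
import sqlite3

# Hypothetical logistic-regression coefficients, assumed to come from a model
# trained elsewhere. They are illustrative, not fitted to real data.
INTERCEPT = -2.0
COEF_VISITS = 0.4
COEF_CART_VALUE = 0.01


def purchase_score(visits: float, cart_value: float) -> float:
    """Return the modeled probability that a customer accepts an offer."""
    z = INTERCEPT + COEF_VISITS * visits + COEF_CART_VALUE * cart_value
    return 1.0 / (1.0 + math.exp(-z))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id TEXT PRIMARY KEY, visits REAL, cart_value REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("C001", 1, 20.0), ("C002", 3, 50.0), ("C003", 10, 250.0)],
)

# Register the model with the database engine so any SQL query can call it.
conn.create_function("purchase_score", 2, purchase_score)

# Score every row inside the database and map the score to a suggested action.
rows = conn.execute(
    """
    SELECT id,
           purchase_score(visits, cart_value) AS score,
           CASE
               WHEN purchase_score(visits, cart_value) >= 0.7 THEN 'send up-sell offer'
               WHEN purchase_score(visits, cart_value) >= 0.3 THEN 'send reminder'
               ELSE 'no action'
           END AS suggested_action
    FROM customers
    ORDER BY id
    """
).fetchall()

for customer_id, score, action in rows:
    print(f"{customer_id}: score={score:.2f} -> {action}")
```

Because the function is registered once on the connection, any application that queries through it can reuse the same model without exporting rows to a separate scoring environment.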

7. IMPROVE GOVERNANCE TO AVOID DATA SCIENCE “CREEPINESS”

Data science teams must keep in mind that the outside world contains another set of stakeholders: the general public, including current and prospective consumers of the firm’s products and services. Fear and concern are at a high level with the continued unfolding of news about data thefts, hacking, surveillance, online and geolocation tracking, and marketing retargeting. Leading retail firms have had their reputations sullied by security breaches. Commentators rail about the “creepiness” factor: that is, the extent of knowledge firms are amassing about customers’ purchasing and other observed behavior that through powerful, real-time analytics can be (and often is) turned into highly personalized marketing. “Creepiness” is the label given to what some call the “dark side” of data science. Data science teams, along with business leadership, must be cognizant of the right balance between what they can achieve through advanced analysis of consumer data and what is tolerable, and ethical, from the public’s perspective. Often there is no single standard; companies report that younger “millennial” demographic groups are more tolerant of personalized targeting than are older groups. Some consumers appreciate having the flow of advertising and marketing be more relevant to their buying patterns and shopping interests, while others are surprised and upset by it. Some will voice their concerns through social media, proving the observation that marketing is always a conversation, not one-way communication. Enterprises should ensure that ethics and consumer tolerance are part of data science planning discussions, along with adherence to standard data governance policies. Data science teams must make sure they are not cloistered from the outside world and that they hear about how consumers and the public in general are responding to actions taken based on their data insights.
The teams should consult with business leaders to gain their feedback about how certain programs could affect the conversation between the company and the public—and consider the possible ramifications on the company’s reputation. Governance policies should address how to protect sensitive data during data science processes, particularly personally identifiable information. Anonymizing data may not be sufficient. Organizations should examine how they can protect data used in algorithms so that consumers’ behavior patterns cannot be hacked by those looking to identify specific people.
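
The point that anonymizing data may not be sufficient can be illustrated with a short Python sketch. It is a hedged example, not a complete privacy control. It replaces a direct identifier with a keyed hash (HMAC-SHA256), then checks whether the remaining attributes still single people out, using a simple group-size test in the style of k-anonymity. The secret key, field names, sample records, and the `risky_groups` helper are hypothetical.

```python
import hashlib
import hmac
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical secret; in practice it would be managed outside the analytics
# environment so analysts cannot reverse the mapping.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def risky_groups(
    records: List[Dict[str, str]], quasi_identifiers: List[str], k: int
) -> List[Tuple[str, ...]]:
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return sorted(combo for combo, n in counts.items() if n < k)


# Hypothetical customer records containing a direct identifier (email).
customers = [
    {"email": "ana@example.com", "postcode": "380001", "age_band": "30-39"},
    {"email": "raj@example.com", "postcode": "380001", "age_band": "30-39"},
    {"email": "li@example.com", "postcode": "380015", "age_band": "60-69"},
]

# Drop the email and keep only a pseudonymous key for joining data sets.
protected = [
    {
        "customer_key": pseudonymize(c["email"]),
        "postcode": c["postcode"],
        "age_band": c["age_band"],
    }
    for c in customers
]

# Even without emails, a unique postcode and age band can point to one person.
flagged = risky_groups(protected, ["postcode", "age_band"], k=2)
print(flagged)
```

In this sample, one combination of postcode and age band belongs to a single record, so that customer could still be identified even though the email address has been removed.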
