A Quick Guide on Data Science

Created by: Somya Goswami

Posted on: 18 Nov, 2020, 12:35:40 PM


“Data is a precious thing and will last longer than the systems themselves.”

By: Tim Berners-Lee
 

As the 21st century witnesses an explosion in big data, the term "data science" has become very popular. Harvard Business Review has described the role of the data scientist as the sexiest job of the 21st century.

 

But why have data science and data scientists become so important?

 

Because data is scattered everywhere, and most companies are sitting on large amounts of it. Modern technology has made it easier to create and store ever-increasing amounts of data, so data volumes have exploded.

And much of this valuable data sits in databases, untouched.

 

The wealth of data being collected and stored by these technologies can bring transformative benefits to organizations and societies around the world, but only if we can interpret it. That’s where data science comes into the picture.

 

Therefore, it is important to understand what data science is and how it can be valuable for a business.

This blog covers the following points:

 

  • What is Data Science?
  • Why Data Science?
  • Who is a Data Scientist?
  • What is the role of a Data Scientist?
  • How is Data Science different from Business Intelligence?
  • The Lifecycle of Data Science

 

  • What is Data Science?

Data science is a discipline and practice focused on drawing insights from raw data. Practitioners of data science use a blend of tools, algorithms, and machine learning to discover hidden patterns in the data. It overlaps closely with artificial intelligence (AI) and borrows many of its techniques.

To understand data science better, it helps to be familiar with related terms such as AI, machine learning, and deep learning.

 

  • Artificial Intelligence - Making a computer mimic human behavior.
  • Machine Learning - A subset of AI that includes techniques enabling computers to learn from data and deliver AI applications.
  • Deep Learning - A subset of machine learning, based on multi-layered neural networks, that enables computers to solve more complex problems.

 

  • Why Data Science?

Traditionally, the data we had was mostly structured and manageable, and it could be analyzed easily with simple BI tools. Today, by contrast, most data is unstructured or semi-structured. The image below shows the trend: by 2020, a growing share of data is expected to be unstructured.


 

[Image: structured data vs. unstructured data]

 

 

This data is accumulated from many different sources, such as financial blogs, social media, text files, multimedia, and forums, and simple BI tools cannot handle and process data of this volume and variety. This is why more complex and innovative tools and algorithms are needed to process it, analyze it, and draw meaningful insights from it.
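To make the difference concrete, here is a minimal Python sketch of turning semi-structured and unstructured inputs into structured records that ordinary tools can query. The sample inputs, the log format, and the `to_record` helper are all assumptions made up for illustration; real sources will need their own parsing rules.

```python
import json
import re

# Hypothetical raw inputs: a JSON social-media post and a free-text log line.
RAW_ITEMS = [
    '{"source": "social", "user": "ana", "text": "Loving the new app! #launch"}',
    "2020-11-18 12:35:40 ERROR payment failed for order 1042",
]

# Assumed log layout: date, time, level, then a free-text message.
LOG_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.*)$")


def to_record(raw):
    """Turn one raw item into a flat dictionary."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None
    if isinstance(data, dict):
        # Semi-structured input: keep known fields and extract hashtags.
        text = data.get("text", "")
        return {
            "kind": "post",
            "source": data.get("source", "unknown"),
            "text": text,
            "hashtags": re.findall(r"#(\w+)", text),
        }
    match = LOG_PATTERN.match(raw)
    if match:
        # Unstructured text that happens to follow the assumed log layout.
        date, _time, level, message = match.groups()
        return {"kind": "log", "source": "logfile", "text": message,
                "date": date, "level": level}
    # Anything else is kept as-is so no data is silently dropped.
    return {"kind": "unknown", "source": "unknown", "text": raw}


records = [to_record(item) for item in RAW_ITEMS]
for record in records:
    print(record)
```

Once every item has the same flat shape, it can be loaded into a table and analyzed alongside traditional structured data.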

 

This is not the only reason why data science has become so popular.

Let’s look at the infographic below to see where data science is gaining a strong foothold.

 

[Image: data science use cases]

Source: Data Flair

 

  • Who is a Data Scientist?

As a discipline, data science is relatively new. It merges statistical analysis with data mining. By 2008, the title "data scientist" had emerged, and the field quickly took off.

 

In simple words, a data scientist is someone whose job includes developing strategies for analyzing data; preparing data for analysis; exploring, analyzing, and visualizing data; building models with data using programming languages such as Python and R; and deploying models into applications.

 

  • What is the role of a Data Scientist?
  1. Data scientists crack complex data problems using strong expertise in scientific disciplines such as mathematics, statistics, and computer science. They use the latest technologies and tools to find solutions and reach conclusions that are crucial for an organization’s growth and development.
  2. Data scientists are experts at extracting meaningful information from the raw unstructured or semi-structured data available to them and presenting it in a usable form.
  3. Data scientists don’t work solo. The most effective data science is done in teams. Apart from data scientists, such a team might include a business analyst, a data engineer, an IT architect, and an application developer.

 

Moving on, let’s discuss business intelligence (BI), which is often confused with data science. The contrast below should make the difference between the two clear.

 

  • Business Intelligence vs. Data Science
  1. Business intelligence provides patterns and insights that describe business trends. It enables us to extract data from various external and internal sources, prepare it, analyze it, run queries, and create dashboards that answer questions about monthly revenue or specific business problems. BI can also help evaluate the impact of certain events in the near future.
  2. Data science is a more forward-looking and exploratory approach. It analyzes past and current data and predicts future outcomes so that informed decisions can be made. It answers open-ended questions about "what" events occur and "how" they occur.
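The contrast can be sketched in a few lines of Python: a BI-style summary describes what has already happened, while a data-science-style step fits a simple model and predicts what comes next. The revenue figures and the helper names (`describe`, `fit_trend`, `forecast`) are hypothetical, and a straight-line trend is only one of many possible models, chosen here because it is easy to follow.

```python
# Hypothetical monthly revenue figures (in thousands).
monthly_revenue = {
    "Jan": 100.0, "Feb": 110.0, "Mar": 125.0,
    "Apr": 130.0, "May": 145.0, "Jun": 150.0,
}


def describe(revenue):
    """BI-style view: summarize what has already happened."""
    values = list(revenue.values())
    return {
        "total": sum(values),
        "average": sum(values) / len(values),
        "best_month": max(revenue, key=revenue.get),
    }


def fit_trend(values):
    """Least-squares straight line through (0, v0), (1, v1), ..."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(values, steps_ahead=1):
    """Data-science-style view: extrapolate the fitted trend."""
    slope, intercept = fit_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


summary = describe(monthly_revenue)
next_month = forecast(list(monthly_revenue.values()))
print("Summary of the past:", summary)
print("Forecast for next month:", round(next_month, 2))
```

The summary would feed a dashboard, while the forecast is the kind of output that supports a decision about the future.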

 

Let’s look at some contrasting features:

Features       Business Intelligence                               Data Science
Data sources   Structured (usually SQL, often a data warehouse)    Both structured and unstructured (logs, cloud data, SQL, NoSQL, text)
Approach       Statistics and visualization                        Statistics, machine learning, graph analysis, natural language processing
Focus          Past and present                                    Present and future
Tools          Pentaho, Microsoft BI, QlikView, R                  RapidMiner, BigML, Weka, R


 

That covers what data science is and how it differs from BI. Now let’s look at the life cycle of a data science project.

 

  • The Lifecycle of Data Science

Phase 1 - Discovery

Before beginning a project, it’s important to understand its requirements, priorities, and constraints. You must be able to ask the right questions. In this phase, you frame the business problem and formulate an initial hypothesis.

 

Phase 2 - Data Preparation

In this phase, you need an analytical sandbox in which you can perform analytics for the duration of the project. You need to explore, preprocess, and condition the data before modeling. You also need to perform ETLT (extract, transform, load, and transform) to get data into the sandbox.
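As a rough illustration of preprocessing and conditioning, the sketch below cleans a handful of made-up rows: it tidies names, parses numbers, drops rows that cannot be used, and fills missing numeric values with the column median. The rows, the column names, and the choice of median imputation are assumptions for the example, not a prescribed method.

```python
from statistics import median

# Hypothetical raw rows as they might arrive from an extract step.
raw_rows = [
    {"customer": " Alice ", "age": "34", "spend": "120.50"},
    {"customer": "bob", "age": "", "spend": "80"},
    {"customer": "Carol", "age": "41", "spend": "n/a"},
    {"customer": "", "age": "29", "spend": "60.25"},
]


def to_number(value):
    """Return a float, or None when the value is missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def prepare(rows):
    """Clean names, parse numbers, drop unusable rows, impute gaps."""
    parsed = []
    for row in rows:
        name = row["customer"].strip().title()
        if not name:
            continue  # a row without a customer cannot be used
        parsed.append({
            "customer": name,
            "age": to_number(row["age"]),
            "spend": to_number(row["spend"]),
        })
    # Fill each missing numeric value with the median of the known values.
    # (Assumes every column has at least one known value.)
    for column in ("age", "spend"):
        known = [r[column] for r in parsed if r[column] is not None]
        fill = median(known)
        for r in parsed:
            if r[column] is None:
                r[column] = fill
    return parsed


clean_rows = prepare(raw_rows)
for row in clean_rows:
    print(row)
```

In a real project this step is usually done with a data-frame library and repeated as new data quality issues are discovered during exploration.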

 

Phase 3 - Model Planning

In this phase, you determine the methods and techniques for identifying the relationships between variables. These relationships form the basis for the algorithms you will implement in the next phase to build a model.
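One common way to explore relationships between variables is to compute correlations between each candidate input and the target. The sketch below does this with the Pearson correlation coefficient on invented numbers; the variable names and values are hypothetical, and correlation is only one of several techniques that could be used at this stage.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical observations: three candidate inputs and one target.
candidates = {
    "ad_spend": [10, 20, 30, 40, 50],
    "visits": [110, 205, 290, 410, 500],
    "returns": [5, 3, 6, 4, 5],
}
sales = [12, 24, 33, 41, 55]

# Correlation of each candidate with the target.
correlations = {name: pearson(values, sales)
                for name, values in candidates.items()}

# Rank candidates by strength of relationship, strongest first.
ranked = sorted(correlations, key=lambda name: abs(correlations[name]),
                reverse=True)
for name in ranked:
    print(f"{name}: {correlations[name]:.3f}")
```

Variables with a strong relationship to the target are natural candidates for the model built in the next phase, though a strong correlation alone does not establish cause and effect.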

 

Phase 4 - Model Building

In this phase, you develop data sets for training and testing. You need to consider whether your current tools are sufficient for running the models or whether you need a more robust environment (for example, one with fast, parallel processing). You will also evaluate learning techniques such as classification, association, and clustering to build the model.
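Here is a minimal sketch of the train-and-test idea using only the Python standard library. It generates synthetic two-dimensional points, splits them into training and test sets, and fits a nearest-centroid classifier, which is a deliberately simple classification technique picked for illustration. The data, the labels, and all function names are assumptions for this example.

```python
import random
from math import dist


def make_data(seed=0):
    """Synthetic 2-D points drawn around two well-separated centers."""
    rng = random.Random(seed)
    data = []
    for _ in range(50):
        data.append(((rng.gauss(1.0, 0.5), rng.gauss(1.0, 0.5)), "low"))
        data.append(((rng.gauss(4.0, 0.5), rng.gauss(4.0, 0.5)), "high"))
    return data


def train_test_split(data, test_ratio=0.2, seed=0):
    """Shuffle a copy of the data and hold out a fraction for testing."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_ratio)
    return shuffled[:cut], shuffled[cut:]


def fit_centroids(train):
    """Training step: compute the mean point of each class."""
    sums = {}
    for (x, y), label in train:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(centroids, point):
    """Assign the label of the closest class centroid."""
    return min(centroids, key=lambda label: dist(point, centroids[label]))


def accuracy(centroids, test):
    """Share of held-out points that are classified correctly."""
    hits = sum(predict(centroids, point) == label for point, label in test)
    return hits / len(test)


data = make_data()
train, test = train_test_split(data)
centroids = fit_centroids(train)
print("Training rows:", len(train), "Test rows:", len(test))
print("Test accuracy:", accuracy(centroids, test))
```

Keeping the test set separate from training is what makes the accuracy figure a fair estimate of how the model behaves on data it has not seen.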

 

Phase 5 - Operationalize

In this phase, you deliver final reports, briefings, code, and technical documents. Sometimes a pilot project is also implemented in a real-time production environment. This gives a clear picture of performance and other constraints on a small scale before full deployment.

 

Phase 6 - Communicate Results

In this last phase, you identify the key findings, communicate them to the stakeholders, and determine whether the project succeeded or failed based on the criteria set in Phase 1.

 

We hope this quick guide gave you a useful overview of data science. For information on data science certification courses, you can check our blog section.