Tuesday, 8 December 2020

Data Science

What is Data Science?

Data Science is an interdisciplinary field that uses processes, algorithms, and systems to extract knowledge and discover hidden patterns in large volumes of structured, unstructured, or raw data. It is related to data analysis, data mining, machine learning, domain knowledge, and big data. It is used for recognition (image, voice, text, audio, video, facial, etc.), segmentation (demographic-based marketing, etc.), automation and decision making (credit card approval, background checks, etc.), pattern detection (financial market patterns, weather patterns, etc.), actionable insights (dashboards, visualizations, reports, etc.), optimization (risk management, etc.), recommendation (Amazon and Netflix recommendations), anomaly detection (crime detection, fraud detection, etc.), scoring and ranking (FICO score, etc.), forecasting (sales, revenue, etc.), and classification (for example, an email server that labels messages as spam or not spam).

Techniques used in data science include logistic regression, linear regression, clustering (used to group similar data together), machine learning (used to perform tasks by inferring patterns from data), support vector machines (SVM), and dimensionality reduction (used to reduce the complexity of data computation). The programming languages most often used to carry out these tasks are Python, R, and Julia. Common frameworks and tools are TensorFlow (for creating machine learning models), PyTorch (also for machine learning), the Apache Hadoop software framework (for processing data over large distributed systems), and Jupyter Notebook (an interactive web interface for Python).
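To make one of these techniques concrete, here is a minimal sketch of simple linear regression in plain Python, with no framework required. The function names (fit_line, predict) and the monthly sales figures are made up for illustration; in practice a library would usually do this work.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Return the fitted line's value at x."""
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical data: sales for months 1 to 5.
    months = [1, 2, 3, 4, 5]
    sales = [12.0, 14.0, 16.0, 18.0, 20.0]

    slope, intercept = fit_line(months, sales)
    print("slope:", slope, "intercept:", intercept)
    print("forecast for month 6:", predict(6, slope, intercept))
```

The same idea, fitting a model to past data and then using it on new input, is what forecasting use cases such as sales or revenue prediction rely on, although real data is rarely this clean.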

According to a 2013 survey, 90% of the world's data had been created within the preceding two years. In those two years alone, we collected and processed nine times as much information as in the previous 92,000 years of humankind combined. By then, 2.7 zettabytes of data had already been created, and that number was projected to grow to 44 zettabytes by 2020.

Data Science Software Platforms

Commonly used data science software platforms include:

  • Anaconda
  • RapidMiner
  • MATLAB
  • Dataiku

Fundamental Areas of Data Science

  • Business Administration
  • Domain Knowledge
  • Mathematics
  • Computer Science
  • Communication
  • Data Product Engineering
  • Machine Learning

Data Science Life Cycle

The data science life cycle generally has nine stages:

  • Discover
  • Capture
  • Prepare
  • Maintain
  • Plan
  • Process
  • Build
  • Communicate
  • Analyze

Application Areas of Data Science

  • Finance
  • Self-Driving Cars
  • Logistics
  • Healthcare
  • Cyber Security
  • Entertainment
