1) What is data science?
Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured, semi-structured, and unstructured data.
2) What is data?
Data is a representation of facts, concepts, or instructions in a formalized manner that is suitable for communication, interpretation, or processing by humans or electronic machines.
3) What is information?
Information is processed data on which decisions and actions are based.
It is data that has been processed into a form that is meaningful to the recipient and is of real or perceived value in
the current
4) Steps of the data processing cycle
*Input: in this step, the input data is prepared in some convenient form for processing. The form will depend on the processing machine.
*Process: in this step, the input data is changed to produce data in a more useful form.
*Output: at this stage, the result of the preceding processing step is collected. The particular form of the output data depends on the use of the data.
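The three steps above can be sketched as three small functions chained together. This is only an illustration: the function names, the sample readings, and the report format are assumptions made up for this example, not part of any standard.

```python
# Illustrative input-process-output cycle; all names here are hypothetical.

def prepare_input(raw_lines):
    """Input: convert raw text lines into numbers the processor can use."""
    return [float(line.strip()) for line in raw_lines if line.strip()]

def process(values):
    """Process: change the input data into a more useful form (a summary)."""
    return {
        "count": len(values),
        "total": sum(values),
        "mean": sum(values) / len(values),
    }

def produce_output(summary):
    """Output: collect the result in a form that suits its use (a report line)."""
    return (
        f"{summary['count']} readings, "
        f"total {summary['total']:.1f}, mean {summary['mean']:.1f}"
    )

raw = ["12.5\n", "7.5\n", "\n", "10.0\n"]  # made-up sample data; blank line is skipped
report = produce_output(process(prepare_input(raw)))
print(report)  # 3 readings, total 30.0, mean 10.0
```

A different use of the data (for example, feeding another program) would change only the output step, which matches the point that the output form depends on how the data will be used.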
5) List the common data types
· Integers (int): used to store whole numbers, mathematically known as integers
· Booleans (bool): used to represent values restricted to one of two options: true or false
· Characters (char): used to store a single character
· Floating-point numbers (float): used to store real numbers
· Alphanumeric strings (string): used to store a combination of characters and numbers
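As a rough illustration, here is how these types might look in Python. The variable names and values are made up for this example. Note that Python has no separate char type, so a one-character string stands in for it here.

```python
# Python equivalents of the common data types (values are made up for illustration).
age = 25                 # integer (int): a whole number
is_enrolled = True       # Boolean (bool): either True or False
grade = "A"              # character: Python has no char type, so a length-1 str stands in
gpa = 3.75               # floating-point number (float): a real number
student_id = "ETS0123"   # alphanumeric string (str): letters and digits combined

for value in (age, is_enrolled, grade, gpa, student_id):
    # Print the type name next to each value.
    print(type(value).__name__, value)
```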
6) What is structured data?
Structured data is data that adheres to a pre-defined data model and is therefore straightforward to analyze. It conforms to a tabular format with a relationship between the different rows and columns.
7) What is semi-structured data?
Semi-structured data is a form of structured data that does not conform with the formal structure of data models associated with relational databases.
8) What is unstructured data?
Unstructured data is information that either does not have a predefined data model or is not organized in a pre-defined manner.
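The difference between the three kinds of data may be easier to see side by side. The sketch below uses CSV text for structured data, JSON for semi-structured data, and a plain sentence for unstructured data. These formats and all sample values are assumptions chosen for illustration.

```python
import csv
import io
import json

# Structured: tabular text where every row follows the same columns.
structured = "id,name,age\n1,Abebe,30\n2,Sara,25\n"
rows = list(csv.DictReader(io.StringIO(structured)))
ages = [int(row["age"]) for row in rows]

# Semi-structured: self-describing keys, but records need not share the same fields.
semi_structured = (
    '[{"name": "Abebe", "phones": ["0911"]},'
    ' {"name": "Sara", "email": "sara@example.com"}]'
)
records = json.loads(semi_structured)
fields_per_record = [sorted(record) for record in records]

# Unstructured: free text with no predefined model; only generic operations apply directly.
unstructured = "Sara called Abebe yesterday about the quarterly report."
word_count = len(unstructured.split())

print(ages)
print(fields_per_record)
print(word_count)
```

The structured rows can be queried by column name straight away, the JSON records have to be inspected one by one because their fields differ, and the sentence offers nothing to query without further processing.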
9) What is metadata?
Metadata is data about data. It provides additional information about a specific set of data.
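File system attributes are one everyday example of metadata: the file contents are the data, while the size and modification time describe that data. The sketch below writes a small temporary file and reads its attributes with the standard `os.stat` call. The dictionary keys are names chosen for this example.

```python
import os
import tempfile

# Write a small file so there is some data to describe.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("hello data")
    path = handle.name

info = os.stat(path)
metadata = {
    "file_name": os.path.basename(path),
    "size_bytes": info.st_size,        # how much data there is
    "modified_epoch": info.st_mtime,   # when the data was last changed
}
print(metadata)

os.remove(path)  # clean up the temporary file
```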
10) The Big Data Value Chain identifies the following key high-level activities:
· Data acquisition: the process of gathering, filtering, and cleaning data before it is put in a data warehouse.
· Data analysis: concerned with making the raw data acquired amenable to use in decision-making as well as domain-specific usage.
· Data curation: the active management of data over its life cycle to ensure it meets the necessary data quality requirements for its effective usage.
· Data storage: the persistence and management of data in a scalable way that satisfies the needs of applications that require fast access to the data.
· Data usage: covers the data-driven business activities that need access to data, its analysis, and the tools needed to integrate the data.
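The activities above can be pictured as a tiny pipeline. This is a toy sketch, not a real big data system: the sensor readings, the temperature range, the threshold, and the in-memory dictionary standing in for storage are all assumptions, and the order in which the functions are called is a choice made for this example.

```python
# Hypothetical sensor readings; None marks a failed reading.
raw_readings = [21.5, None, 22.0, 250.0, 21.0]

def acquire(readings):
    """Acquisition: gather the data and filter out unusable entries."""
    return [r for r in readings if r is not None]

def curate(readings, low=-50.0, high=60.0):
    """Curation: enforce a quality rule (a plausible temperature range)."""
    return [r for r in readings if low <= r <= high]

def analyze(readings):
    """Analysis: turn raw values into something usable for decisions."""
    return {"mean": sum(readings) / len(readings), "max": max(readings)}

def use(summary, threshold=25.0):
    """Usage: a data-driven decision based on the analysis."""
    return "cooling on" if summary["mean"] > threshold else "cooling off"

store = {}  # Storage: a dict stands in for a scalable data store.
store["temperatures"] = curate(acquire(raw_readings))

summary = analyze(store["temperatures"])
decision = use(summary)
print(store["temperatures"], summary, decision)
```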
11) What is big data?
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
12) Big data is characterized by the 3 Vs and more:
· Volume: large amounts of data (zettabytes, massive datasets)
· Velocity: data is live-streaming or in motion
· Variety: data comes in many different forms from diverse sources
· Veracity: can we trust the data, how accurate is it, and so on
13) What is clustered computing?
Big data clustering software combines the resources of many smaller machines, seeking to provide a number of benefits:
· Resource pooling: combining the available storage space to hold data is a clear benefit, but CPU and memory pooling are also extremely important.
14) What is Hadoop and its ecosystem?
Hadoop is an open-source framework intended to make interaction with big data easier. It allows for the distributed processing of large datasets across clusters of computers using simple programming models.
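MapReduce is the best-known of these simple programming models. The sketch below imitates its map, shuffle, and reduce phases in plain Python on a single machine. It does not use any Hadoop API, and the sample chunks and function names are assumptions made for illustration only.

```python
from collections import defaultdict

# Each string stands in for a block of a large file held on a different machine.
chunks = ["big data big", "data cluster", "big cluster cluster"]

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk."""
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key (the word)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine the values for each key into a final count."""
    return {word: sum(counts) for word, counts in grouped.items()}

# In a real cluster each chunk would be mapped on a different node in parallel.
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

Because each chunk is mapped independently, the map step can be spread across many machines, which is what makes the model a good fit for clustered computing.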