Saturday, February 4, 2023

EMERGING TECHNOLOGY CHAPTER 2 QUESTIONS AND ANSWERS

 


1)     What is Data Science?

Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured, semi-structured, and unstructured data.

2)     What is data?

Data is a representation of facts, concepts, or instructions in a formalized manner, suitable for communication, interpretation, or processing by humans or electronic machines.

3)     What is information?

Information is processed data on which decisions and actions are based.

Information is data that has been processed into a form that is meaningful to the recipient and is of real or perceived value in current or prospective decisions or actions.

4)     Steps of the Data Processing Cycle

*Input: in this step, the input data is prepared in some convenient form for processing. The form depends on the processing machine.

 

*Process: in this step, the input data is changed to produce data in a more useful form.

 

*Output: at this stage, the result of the preceding processing step is collected.

The particular form of the output data depends on the use of the data.
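The three steps above can be sketched as a tiny pipeline (a hypothetical example; the variable names are illustrative):

```python
# A minimal sketch of the input -> process -> output cycle.
raw = ["12", "7", "25"]            # Input: data prepared in a convenient form (strings)
numbers = [int(x) for x in raw]    # Process: changed into a more useful form (integers)
total = sum(numbers)               # Process: derive a result
print("Total:", total)             # Output: collect and present the result
```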

 

5)     List the Common Data Types

·       Integers (int): used to store whole numbers, mathematically known as integers

·       Booleans (bool): used to represent a value restricted to one of two values: true or false

·       Characters (char): used to store a single character

·       Floating-point numbers (float): used to store real numbers

·       Alphanumeric strings (string): used to store a combination of characters and numbers
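These data types can be illustrated in Python (note that Python has no separate char type; a single character is just a string of length one):

```python
# Examples of the common data types.
count = 42                 # int: a whole number
is_valid = True            # bool: one of two values, True or False
letter = "A"               # char: represented in Python as a 1-character string
price = 3.14               # float: a real number
label = "Item42"           # string: a combination of characters and digits
print(type(count).__name__, type(is_valid).__name__,
      type(price).__name__, type(label).__name__)
```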

6)     What is Structured Data?

Structured data is data that adheres to a pre-defined data model and is therefore straightforward to analyze.

It conforms to a tabular format, with relationships between the different rows and columns.
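A small sketch of what "tabular with a fixed schema" means in practice (the table contents are hypothetical):

```python
# Structured data: every row shares the same columns (a fixed schema),
# so analysis is straightforward.
rows = [
    {"id": 1, "name": "Abebe", "age": 21},
    {"id": 2, "name": "Sara",  "age": 23},
]
average_age = sum(r["age"] for r in rows) / len(rows)
print("average age:", average_age)
```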

7)     What is Semi-structured Data?

Semi-structured data is a form of structured data that does not conform to the formal structure of data models associated with relational databases.
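JSON is a common example of semi-structured data: records are self-describing, but they need not share the same fields (the records below are hypothetical):

```python
import json

# Semi-structured data: tagged and self-describing, but with no fixed schema.
records = json.loads("""
[
  {"name": "Abebe", "email": "abebe@example.com"},
  {"name": "Sara", "phones": ["0911-000000", "0912-000000"]}
]
""")
# One record has "email", the other has "phones" - no relational table fits both.
print(sorted(records[0].keys()), sorted(records[1].keys()))
```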

8)     What is Unstructured Data?

Unstructured data is information that either does not have a pre-defined data model or is not organized in a pre-defined manner.

9)     What is Metadata?

Metadata is data about data. It provides additional information about a specific set of data.
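For example, a file's size and timestamps are metadata: they describe the file without being part of its contents (a small sketch using a temporary file):

```python
import os
import tempfile

# Metadata example: information about a file, not the file's contents.
with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt") as f:
    f.write("hello")               # the data itself
    path = f.name
info = os.stat(path)               # the metadata: size, timestamps, permissions
print("size in bytes:", info.st_size)
os.remove(path)
```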

10) The Big Data Value Chain identifies the following key high-level activities:

·       Data Acquisition: the process of gathering, filtering, and cleaning data before it is put in a data warehouse.

·       Data Analysis: concerned with making the acquired raw data amenable to use in decision-making as well as domain-specific usage.

·       Data Curation: the active management of data over its life cycle to ensure it meets the necessary data quality requirements for its effective usage.

·       Data Storage: the persistence and management of data in a scalable way that satisfies the needs of applications that require fast access to the data.

·       Data Usage: covers the data-driven business activities that need access to data, its analysis, and the tools needed to integrate the data.
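A toy sketch of how these stages fit together (the function names and data are purely illustrative, not part of any real framework):

```python
# A miniature walk through the Big Data Value Chain stages.
def acquire():                     # Data Acquisition: gather, filter, and clean raw data
    raw = [" 5 ", "bad", "7", ""]
    return [x.strip() for x in raw if x.strip().isdigit()]

def curate(values):                # Data Curation: enforce quality requirements (typed values)
    return [int(v) for v in values]

store = {}                         # Data Storage: persist for fast access
store["readings"] = curate(acquire())

def analyze(data):                 # Data Analysis: make the data usable for decisions
    return sum(data) / len(data)

print(analyze(store["readings"])) # Data Usage: a data-driven activity consuming the result
```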

11) What Is Big Data?

Big data is the term for a collection of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications.

12) Big data is characterized by the 3 Vs (and more):

·       Volume: large amounts of data (zettabytes / massive datasets)

·       Velocity: data is live-streaming or in motion

·       Variety: data comes in many different forms from diverse sources

·       Veracity: can we trust the data? How accurate is it?

13) What is Clustered Computing?

Big data clustering software combines the resources of many smaller machines, seeking to provide a number of benefits:

14) Resource Pooling: combining the available storage space to hold data is a clear benefit, but CPU and memory pooling are also extremely important.
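The idea of pooling compute resources can be sketched in miniature with a worker pool, which spreads work across several workers much as a cluster pools the CPUs of many machines (a simplified single-machine analogy, not actual cluster software):

```python
from concurrent.futures import ThreadPoolExecutor

# Resource pooling in miniature: a pool of workers shares one workload.
def square(n):
    return n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(square, range(8)))
print(results)
```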

15) What is Hadoop and its Ecosystem?

Hadoop is an open-source framework intended to make interaction with big data easier.

It is a framework that allows for the distributed processing of large datasets across clusters of computers using simple programming models.
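The classic Hadoop programming model is MapReduce; its standard example, word count, can be simulated in plain Python (Hadoop itself would run the same map and reduce logic distributed across a cluster):

```python
from collections import Counter
from itertools import chain

# A toy MapReduce-style word count, simulated on one machine.
docs = ["big data is big", "data is everywhere"]

def map_phase(doc):                 # map: emit (word, 1) pairs for each document
    return [(word, 1) for word in doc.split()]

def reduce_phase(pairs):            # reduce: sum the counts per word
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

word_counts = reduce_phase(chain.from_iterable(map_phase(d) for d in docs))
print(word_counts)
```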

 

 

 
