NPTEL Big Data Computing Assignment 1 Answers 2023:- In this article, we have provided the answers to NPTEL Big Data Computing Assignment 1. Please use them only as a reference and submit your assignment based on your own knowledge.
NPTEL Big Data Computing Week 1 Assignment Answers
1. What are the three key characteristics of Big Data, often referred to as the 3V’s, according to IBM?
- Viscosity, Velocity, Veracity
- Volume, Value, Variety
- Volume, Velocity, Variety
- Volumetric, Visceral, Vortex
Answer :- For Answer Click Here
2. What is the primary purpose of the MapReduce programming model in processing and generating large data sets?
- To directly process and analyze data without any intermediate steps.
- To convert unstructured data into structured data.
- To specify a map function for generating intermediate key/value pairs and a reduce function for merging values associated with the same key.
- To create visualizations and graphs for large data sets.
Answer :- For Answer Click Here
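As a hedged illustration of the map/reduce pattern described in question 2 (not part of the official answer), here is a minimal plain-Python sketch; the function names `map_fn`, `reduce_fn`, and `word_count` are our own:

```python
from collections import defaultdict

def map_fn(document):
    # Map: emit an intermediate (key, value) pair for every word.
    for word in document.split():
        yield (word, 1)

def reduce_fn(key, values):
    # Reduce: merge all values associated with the same key.
    return (key, sum(values))

def word_count(documents):
    # Shuffle: group intermediate values by key, then reduce each group.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

print(word_count(["big data", "big compute"]))  # {'big': 2, 'data': 1, 'compute': 1}
```

In a real MapReduce framework the shuffle and grouping step is handled by the runtime across many machines; this sketch only shows the programming model itself.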
3. _____ is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
- Flume
- Apache Sqoop
- Pig
- Mahout
Answer :- For Answer Click Here
4. What is the primary role of YARN (Yet Another Resource Negotiator) in the Apache Hadoop ecosystem?
- YARN is a data storage layer for managing and storing large datasets in Hadoop clusters.
- YARN is a programming model for processing and analyzing data in Hadoop clusters.
- YARN is responsible for allocating system resources and scheduling tasks for applications in a Hadoop cluster.
- YARN is a visualization tool for creating graphs and charts based on Hadoop data.
Answer :- For Answer Click Here
5. Which of the following statements accurately describes the characteristics and functionality of HDFS (Hadoop Distributed File System)?
- HDFS is a centralized file system designed for storing small files and achieving high-speed data processing.
- HDFS is a programming language used for writing MapReduce applications within the Hadoop ecosystem.
- HDFS is a distributed, scalable, and portable file system designed for storing large files across multiple machines, achieving reliability through replication.
- HDFS is a visualization tool that generates graphs and charts based on data stored in the Hadoop ecosystem.
Answer :- For Answer Click Here
6. Which statement accurately describes the role and design of HBase in the Hadoop stack?
- HBase is a programming language used for writing complex data processing algorithms in the Hadoop ecosystem.
- HBase is a data warehousing solution designed for batch processing of large datasets in Hadoop clusters.
- HBase is a key-value store that provides fast random access to substantial datasets, making it suitable for applications requiring such access patterns.
- HBase is a visualization tool that generates charts and graphs based on data stored in Hadoop clusters.
Answer :- For Answer Click Here
7. ______ brings scalable parallel database technology to Hadoop and allows users to submit low-latency queries to data stored within HDFS or HBase without requiring extensive data movement and manipulation.
- Apache Sqoop
- Mahout
- Flume
- Impala
Answer :- For Answer Click Here
8. What is the primary purpose of ZooKeeper in a distributed system?
- ZooKeeper is a data warehousing solution for storing and managing large datasets in a distributed cluster.
- ZooKeeper is a programming language for developing distributed applications in a cloud environment.
- ZooKeeper is a highly reliable distributed coordination kernel used for tasks such as distributed locking, configuration management, leadership election, and work queues.
- ZooKeeper is a visualization tool for creating graphs and charts based on data stored in distributed systems.
Answer :- For Answer Click Here
9. ____ is a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the entire cluster.
- Hadoop Common
- Hadoop Distributed File System (HDFS)
- Hadoop YARN
- Hadoop MapReduce
Answer :- For Answer Click Here
10. Which statement accurately describes Spark MLlib?
- Spark MLlib is a visualization tool for creating charts and graphs based on data processed in Spark clusters.
- Spark MLlib is a programming language used for writing Spark applications in a distributed environment.
- Spark MLlib is a distributed machine learning framework built on top of Spark Core, providing scalable machine learning algorithms and utilities for tasks such as classification, regression, clustering, and collaborative filtering.
- Spark MLlib is a data warehousing solution for storing and querying large datasets in a Spark cluster.
Answer :- For Answer Click Here
| Course Name | Big Data Computing |
| --- | --- |
| Category | NPTEL Assignment Answer |
| Home | Click Here |
| Join Us on Telegram | Click Here |
About Big Data Computing
In today’s fast-paced digital world, an incredible amount of data is generated every minute, from sensors gathering climate information, posts on social media sites, digital pictures and videos, and purchase transaction records to GPS signals from cell phones, to name a few. This large volume of data, arriving at different velocities and in different varieties, is called big data. Big data analytics enables professionals to convert extensive data, through statistical and quantitative analysis, into powerful insights that can drive efficient decisions.

This course provides an in-depth understanding of the terminologies and core concepts behind big data problems, applications, and systems, and the techniques that underlie today’s big data computing technologies. It introduces some of the most common frameworks such as Apache Spark, Hadoop, and MapReduce; large-scale data storage technologies such as in-memory key/value storage systems, NoSQL distributed databases, Apache Cassandra, and HBase; and big data streaming platforms such as Apache Spark Streaming and Apache Kafka Streams, which have made big data analysis easier and more accessible. While discussing these concepts and techniques, we will also look at various applications of big data analytics using machine learning, deep learning, graph processing, and many others. The course is suitable for all UG/PG students and practising engineers/scientists from diverse fields interested in learning about these novel cutting-edge techniques and applications of big data computing.
CRITERIA TO GET A CERTIFICATE
Average assignment score = 25% of the average of the best 6 assignments out of the total 8 assignments given in the course.
Exam score = 75% of the proctored certification exam score out of 100
Final score = Average assignment score + Exam score
YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF THE AVERAGE ASSIGNMENT SCORE >= 10/25 AND THE EXAM SCORE >= 30/75. If either of the two criteria is not met, you will not get the certificate even if the final score >= 40/100.
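As a quick worked example of the scoring and eligibility rules above (the helper function and the sample scores are our own, purely illustrative):

```python
def final_score(best6_assignment_avg, exam_score):
    # Assignment component: 25% of the average of the best 6 assignments (each out of 100).
    assignment_component = 0.25 * best6_assignment_avg
    # Exam component: 75% of the proctored exam score (out of 100).
    exam_component = 0.75 * exam_score
    # Eligibility: assignment component >= 10/25 AND exam component >= 30/75.
    eligible = assignment_component >= 10 and exam_component >= 30
    return assignment_component + exam_component, eligible

print(final_score(80, 50))  # (57.5, True)  -> certificate awarded
print(final_score(80, 30))  # (42.5, False) -> no certificate despite total >= 40
```

The second call shows the rule in the paragraph above: even with a final score above 40/100, failing one criterion (here the exam component, 22.5 < 30) means no certificate.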
You can find Big Data Computing Assignment 1 Answers below.
NPTEL Big Data Computing Assignment 1 Answers 2022
1. True or False?
Big Data is a collection of data that is huge in volume, yet growing exponentially with time. It is data of such large size and complexity that no traditional data management tool can store or process it efficiently.
a. True
b. False
Answer:- a
2. What does “Velocity” in Big Data mean?
a. Speed of input data generation
b. Speed of individual machine processors
c. Speed of storing and processing data
d. Speed of ONLY storing data
Answer:- c
3. _______________ refers to the accuracy and correctness of the data relative to a particular use.
a. Value
b. Veracity
c. Velocity
d. Validity
Answer:- d
4. Consider the following statements:
Statement 1: Viscosity refers to the connectedness of big data.
Statement 2: Volatility refers to the rate of data loss and stable lifetime of data.
a. Only statement 1 is true
b. Only statement 2 is true
c. Both statements are true
d. Both statements are false
Answer:- b
5. ______________ is a programming model and an associated implementation for processing and generating large data sets.
a. HDFS
b. YARN
c. Map Reduce
d. PIG
Answer:- c
_______________ is an open source software framework for big data. It has two basic parts: HDFS and Map Reduce.
a. Spark
b. HBASE
c. HIVE
d. Apache Hadoop
Answer:- d
7. The fundamental idea of __________________ is to split up the functionalities of resource management and job scheduling/monitoring into separate daemons. The idea is to have a global Resource Manager (RM) and per-application Application Master (AM). An application is either a single job or a DAG of jobs.
a. Hadoop Common
b. Hadoop Distributed File System (HDFS)
c. Hadoop YARN
d. Hadoop MapReduce
Answer:- c
8. ____________ is a highly reliable distributed coordination kernel, which can be used for distributed locking, configuration management, leadership election, work queues, etc.
a. Apache Sqoop
b. Mahout
c. Flume
d. ZooKeeper
Answer:- For Answer Click Here
9. ______________ is an open source stream processing software platform developed by the Apache Software Foundation written in Scala and Java.
a. Hive
b. Cassandra
c. Apache Kafka
d. RDDs
Answer:- c
10. True or False?
NoSQL databases are non-tabular databases and store data differently than relational tables. NoSQL databases come in a variety of types based on their data model. The main types are document, key-value, wide-column, and graph. They provide flexible schemas and scale easily with large amounts of data and high user loads.
a. True
b. False
Answer:- Answers will be uploaded shortly and notified on Telegram.
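Two of the four NoSQL data-model types named in question 10 can be sketched with minimal Python stand-ins (illustrative toys only, not real database APIs; all names here are our own):

```python
# Key-value model: opaque values addressed by a single key.
kv_store = {}
kv_store["user:42"] = b"raw bytes or a JSON blob"

# Document model: schemaless nested records, queryable by field.
doc_store = [
    {"_id": 1, "name": "Ada", "tags": ["admin"]},
    {"_id": 2, "name": "Alan"},  # no "tags" field: flexible schema in action
]
admins = [d["name"] for d in doc_store if "admin" in d.get("tags", [])]
print(admins)  # ['Ada']
```

The flexible schema the question mentions is visible in the document example: records in the same collection need not share the same fields.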
NPTEL Big Data Computing Assignment 1 Answers 2021
Q1.________________ is responsible for allocating system resources to the various applications running in a Hadoop cluster and scheduling tasks to be executed on different cluster nodes.
- (A) Hadoop Common
- (B) Hadoop Distributed File System (HDFS)
- (C) Hadoop YARN
- (D) Hadoop Map Reduce
Ans:- (C) Hadoop YARN
Q2. Which of the following tools is designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases?
- (A) Pig
- (B) Mahout
- (C) Apache Sqoop
- (D) Flume
Ans:- (C) Apache Sqoop
Q3. _________________ is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
- (A) Flume
- (B) Apache Sqoop
- (C) Pig
- (D) Mahout
Ans:- (A) Flume
Q4. _______________ refers to the connectedness of big data.
- (A) Value
- (B) Veracity
- (C) Velocity
- (D) Valence
Ans:- (D) Valence
Q5. Consider the following statements:
Statement 1: Volatility refers to the data velocity relative to timescale of event being studied
Statement 2: Viscosity refers to the rate of data loss and stable lifetime of data
- (A) Only statement 1 is true
- (B) Only statement 2 is true
- (C) Both statements are true
- (D) Both statements are false
Ans:- (D) Both statements are false
Q6. ________________ refers to the biases, noise, and abnormality in data; the trustworthiness of data.
- (A) Value
- (B) Veracity
- (C) Velocity
- (D) Volume
Ans:- (B) Veracity
Q7. _____________ brings scalable parallel database technology to Hadoop and allows users to submit low-latency queries to data stored within HDFS or HBase without requiring extensive data movement and manipulation.
- (A) Apache Sqoop
- (B) Mahout
- (C) Flume
- (D) Impala
Ans:- For Answer Click Here
Q8. True or False?
NoSQL databases store unstructured data with no particular schema.
- (A) True
- (B) False
Ans:- (B) False
Q9. ____________ is a highly reliable distributed coordination kernel, which can be used for distributed locking, configuration management, leadership election, work queues, etc.
- (A) Apache Sqoop
- (B) Mahout
- (C) ZooKeeper
- (D) Flume
Ans:- For Answer Click Here
Q10. True or False?
MapReduce is a programming model and an associated implementation for processing and generating large data sets.
- (A) True
- (B) False
Ans:- (A) True
Big Data Computing Assignment 1 Answers 2021:- We do not claim 100% accuracy for these answers; they are based on our own knowledge. By posting them we are only trying to help students, so we urge you to do your assignment on your own.