NPTEL Big Data Computing Assignment 3 Answers 2022

NPTEL Big Data Computing Assignment 3 Answers 2022:- In this article, we have provided the answers to NPTEL Big Data Computing Assignment 3. You should still rely on your own knowledge when submitting your assignment.

About Big Data Computing

In today’s fast-paced digital world, an incredible amount of data is generated every minute, from sensors used to gather climate information, posts on social media sites, digital pictures and videos, and purchase transaction records to GPS signals from cell phones, to name a few. Such large volumes of data, arriving at different velocities and in many varieties, are called big data. Big data analytics enables professionals to convert this extensive data, through statistical and quantitative analysis, into powerful insights that can drive efficient decisions. This course provides an in-depth understanding of the terminology and core concepts behind big data problems, applications, and systems, and the techniques that underlie today’s big data computing technologies. It introduces some of the most common frameworks, such as Apache Spark, Hadoop, and MapReduce; large-scale data storage technologies such as in-memory key/value storage systems, NoSQL distributed databases, Apache Cassandra, and HBase; and big data streaming platforms such as Apache Spark Streaming and Apache Kafka Streams, which have made big data analysis easier and more accessible. While discussing the concepts and techniques, we will also look at various applications of big data analytics using machine learning, deep learning, graph processing, and many others. The course is suitable for all UG/PG students and practising engineers/scientists from diverse fields interested in learning about the novel cutting-edge techniques and applications of Big Data Computing.

CRITERIA TO GET A CERTIFICATE

Average assignment score = 25% of the average of the best 6 assignments out of the total 8 assignments given in the course.
Exam score = 75% of the proctored certification exam score out of 100

Final score = Average assignment score + Exam score

YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF THE AVERAGE ASSIGNMENT SCORE >=10/25 AND THE EXAM SCORE >= 30/75. If one of the 2 criteria is not met, you will not get the certificate even if the Final score >= 40/100.
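
For example, if your best 6 assignment scores average 80/100, the assignment component is 25% of 80 = 20/25; an exam score of 60/100 contributes 75% of 60 = 45/75, giving a final score of 20 + 45 = 65/100, which satisfies both criteria.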

You can find Big Data Computing Assignment 3 Answers 2022 below.

NPTEL Big Data Computing Assignment 3 Answers 2022

1. Consider the following statements in the context of Spark:

Statement 1:
  Spark improves efficiency through in-memory computing primitives and general computation graphs.

Statement 2: 
 Spark improves usability through high-level APIs in Java, Scala, Python and also provides an interactive shell.

a. Only statement 1 is true 
b. Only statement 2 is true 
c. Both statements are true 
d. Both statements are false

Answer:- c

2. True or False?

Resilient Distributed Datasets (RDDs) are fault-tolerant and immutable. 

a. True 
b. False

Answer:- a


3. In Spark, a ______________________is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost.

a. Spark Streaming 
b. FlatMap 
c. Resilient Distributed Dataset (RDD) 
d. Driver

Answer:- c
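
Questions 2 and 3 both hinge on RDDs being immutable, partitioned, and rebuildable from lineage. A minimal spark-shell sketch (assuming an existing SparkContext sc, as in the join snippet later in this post) illustrates this:

// Minimal spark-shell sketch; assumes a SparkContext `sc` is already available.
// Create an RDD partitioned across 4 slices of the cluster.
val nums = sc.parallelize(1 to 10, 4)

// Transformations never modify `nums`; each returns a new RDD, and Spark
// only records the lineage (parallelize -> map -> filter) at this point.
val doubled = nums.map(_ * 2)
val big = doubled.filter(_ > 10)

// An action triggers execution. If a partition is lost, Spark recomputes
// just that partition from the recorded lineage instead of replicating data.
println(big.collect().mkString(", "))   // 12, 14, 16, 18, 20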

4. Given the following definition about the join transformation in Apache Spark:

def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]

Here, the join operation joins two datasets. When it is called on datasets of type (K, V) and (K, W), it returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key.

Output the result of joinrdd, when the following code is run.

val rdd1 = sc.parallelize(Seq(("m",55),("m",56),("e",57),("e",58),("s",59),("s",54)))
val rdd2 = sc.parallelize(Seq(("m",60),("m",65),("s",61),("s",62),("h",63),("h",64)))
val joinrdd = rdd1.join(rdd2)
joinrdd.collect

Answer:- b (the result drops keys "e" and "h", since join is an inner join that keeps only keys present in both RDDs; the full question and expected output appear as Q2 in the 2021 section below)

5. True or False ?

Apache Spark can potentially run batch-processing programs up to 100 times faster than Hadoop MapReduce in memory, or 10 times faster on disk.

a. True
b. False 

Answer:- a

6. ______________ leverages Spark Core fast scheduling capability to perform streaming analytics.

a. MLlib 
b. GraphX 
c. RDDs 
d. Spark Streaming

Answer:- d
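
As a hedged illustration of why the answer is Spark Streaming: the classic DStream word count below (a sketch; it assumes a local Spark setup and a socket source on localhost:9999, which is hypothetical) cuts the live stream into one-second micro-batches, and Spark Core's fast scheduler runs each micro-batch as a small Spark job.

// Sketch of Spark Streaming's DStream word count (assumes Spark on the classpath
// and a text source listening on localhost:9999, e.g. started with `nc -lk 9999`).
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingSketch")

// Spark Streaming chops the live stream into 1-second micro-batches and hands
// each batch to Spark Core's scheduler as an ordinary (small) Spark job.
val ssc = new StreamingContext(conf, Seconds(1))

val lines = ssc.socketTextStream("localhost", 9999)
val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.print()

ssc.start()             // start receiving and processing micro-batches
ssc.awaitTermination()  // block until the streaming job is stopped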


7. ____________________ is a distributed graph processing framework on top of Spark.

a. GraphX 
b. MLlib 
c. Spark streaming 
d. All of the mentioned

Answer:- a
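
A short hedged GraphX sketch (assuming a spark-shell session where sc already exists): GraphX stores a property graph as vertex and edge RDDs and runs graph algorithms such as PageRank as ordinary Spark jobs.

// Hedged GraphX sketch; assumes a spark-shell session (SparkContext `sc` exists).
import org.apache.spark.graphx.{Edge, Graph}

// A property graph is just two RDDs: (vertexId, property) and Edge(src, dst, property).
val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
val edges = sc.parallelize(Seq(
  Edge(1L, 2L, "follows"),
  Edge(2L, 3L, "follows"),
  Edge(3L, 1L, "follows")))

val graph = Graph(vertices, edges)

// Graph algorithms run as Spark jobs over these RDDs; here, PageRank to convergence.
val ranks = graph.pageRank(0.001).vertices
ranks.collect().foreach(println)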

8. Which of the following are the simplest NoSQL databases ?

a. Wide-column 
b. Key-value 
c. Document 
d. All of the mentioned

Answer:- b
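
Key-value stores count as the simplest NoSQL model because the whole interface reduces to put/get/delete on opaque values. A toy in-memory sketch in plain Scala (illustrative only, not any real database's client API) makes the point:

// Toy key-value store; illustrative only, not a real NoSQL client API.
import scala.collection.mutable

class ToyKeyValueStore {
  private val data = mutable.Map.empty[String, Array[Byte]]

  // The entire data model: an opaque value addressed by a key.
  def put(key: String, value: Array[Byte]): Unit = data(key) = value
  def get(key: String): Option[Array[Byte]] = data.get(key)
  def delete(key: String): Unit = data.remove(key)
}

val store = new ToyKeyValueStore
store.put("user:42", "alice".getBytes("UTF-8"))
println(store.get("user:42").map(bytes => new String(bytes, "UTF-8")))   // Some(alice)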

9. Consider the following statements:

Statement 1: Scale out means grow your cluster capacity by replacing with more powerful machines.

Statement 2: Scale up means incrementally grow your cluster capacity by adding more COTS machines (Components Off the Shelf).

a. Only statement 1 is true 
b. Only statement 2 is true 
c. Both statements are false 
d. Both statements are true

Answer:- c (both statements are false: scale out means growing the cluster by adding more commodity/COTS machines, while scale up means replacing machines with more powerful ones, so the two definitions above are swapped)

10. Point out the incorrect statement in the context of Cassandra:

a. It is originally designed at Facebook 
b. It is designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure 
c. It is a centralized key-value store 
d. It uses a ring-based DHT (Distributed Hash Table) but without finger tables or routing

Answer:- c
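
Option d above refers to Cassandra's ring-based DHT. The simplified consistent-hashing sketch below (plain Scala, illustrative only; Cassandra's actual partitioner, virtual nodes, and replication are more involved) shows the basic idea: each node sits at a token on a ring, and a key is routed to the first node at or after the key's token, wrapping around.

// Simplified ring-based DHT via consistent hashing (illustrative only).
import scala.collection.immutable.TreeMap

// For the sketch, a node's token is just the hash of its name.
def token(s: String): Int = s.hashCode

val nodes = Seq("node-A", "node-B", "node-C")
val ring: TreeMap[Int, String] = TreeMap(nodes.map(n => token(n) -> n): _*)

// Route a key to the first node whose token is >= the key's token,
// wrapping around to the start of the ring if none is found.
def nodeFor(key: String): String = {
  val it = ring.iteratorFrom(token(key))
  if (it.hasNext) it.next()._2 else ring.head._2
}

println(nodeFor("user:42"))
println(nodeFor("order:7"))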


NPTEL Big Data Computing Assignment 3 Answers 2021

Q1. In Spark, a ______________________is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost.

(A) Spark Streaming 

(B) FlatMap 

(C) Driver 

(D) Resilient Distributed Dataset (RDD)

Ans:- (D) Resilient Distributed Dataset (RDD)

Q2. Given the following definition about the join transformation in Apache Spark:

                   def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]

Here, the join operation joins two datasets. When it is called on datasets of type (K, V) and (K, W), it returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key.

Output the result of joinrdd, when the following code is run.

val rdd1 = sc.parallelize(Seq(("m",55),("m",56),("e",57),("e",58),("s",59),("s",54)))

val rdd2 = sc.parallelize(Seq(("m",60),("m",65),("s",61),("s",62),("h",63),("h",64)))

val joinrdd = rdd1.join(rdd2)

joinrdd.collect

(A) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (h,(63,64)), (s,(54,61)), (s,(54,62)))

(B) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (e,(57,58)),  (s,(54,61)), (s,(54,62)))

(C) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))

(D) None of the mentioned

Ans:- (C) Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62))). Keys "e" and "h" appear in only one RDD, so the inner join drops them.


Q3. Consider the following statements in the context of Spark:

Statement 1:  Spark improves efficiency through in-memory computing primitives and general computation graphs.

Statement 2:  Spark improves usability through high-level APIs in Java, Scala, Python and also provides an interactive shell.

(A) Only statement 1 is true 

(B) Only statement 2 is true 

(C) Both statements are true 

(D) Both statements are false

Ans:- (C) Both statements are true 

Q4. True or False?

Resilient Distributed Datasets (RDDs) are fault-tolerant and immutable. 

(A) True 

(B) False

Ans:- (A) True 


Q5. Which of the following is not a NoSQL database ? 

(A) HBase 

(B) Cassandra 

(C) SQL Server 

(D) None of the mentioned

Ans:- (C) SQL Server 

Q6. True or False ?

Apache Spark can potentially run batch-processing programs up to 100 times faster than Hadoop MapReduce in memory, or 10 times faster on disk.

(A) True 

(B) False

Ans:- (A) True 

Q7. ______________ leverages Spark Core fast scheduling capability to perform streaming analytics. 

(A) MLlib 

(B) Spark Streaming 

(C) GraphX 

(D) RDDs

Ans:- (B) Spark Streaming

Q8. ____________________ is a distributed graph processing framework on top of Spark. 

(A) MLlib 

(B) Spark streaming 

(C) GraphX 

(D) All of the mentioned

Ans:- (C) GraphX 


Q9. Point out the incorrect statement in the context of Cassandra: 

(A) It is a centralized key-value store 

(B) It is originally designed at Facebook 

(C) It is designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure 

(D) It uses a ring-based DHT (Distributed Hash Table) but without finger tables or routing

Ans:- (A) It is a centralized key-value store 

Q10. Consider the following statements:


Statement 1: Scale out means grow your cluster capacity by replacing with more powerful machines.

Statement 2: Scale up means incrementally grow your cluster capacity by adding more COTS machines (Components Off the Shelf). 

(A) Only statement 1 is true 

(B) Only statement 2 is true 

(C) Both statements are false 

(D) Both statements are true

Ans:- (C) Both statements are false (the definitions are swapped: scale out means adding more COTS machines, scale up means moving to more powerful machines)

Big Data Computing Assignment 3 Answers 2021:- We do not claim 100% surety of these answers; they are based on our sole knowledge, and by posting them we are just trying to help students, so we urge you to do your assignment on your own.
