# NPTEL An Introduction to Artificial Intelligence Assignment 11 Answers

The answers to NPTEL An Introduction to Artificial Intelligence Assignment 11 (2022) are provided here as a reference to help students. You must submit your assignment based on your own knowledge.

## What is An Introduction to Artificial Intelligence?

An Introduction to Artificial Intelligence, a course offered by IIT Delhi, introduces a variety of concepts in the field of artificial intelligence. It discusses the philosophy of AI and how to model a new problem as an AI problem. It describes a variety of models, such as search, logic, Bayes nets, and MDPs, which can be used to model a new problem. It also teaches many introductory algorithms for solving each formulation. The course prepares a student to take a variety of focused, advanced courses in various subfields of AI.

## CRITERIA TO GET A CERTIFICATE

Average assignment score = 25% of the average of best 8 assignments out of the total 12 assignments given in the course.
Exam score = 75% of the proctored certification exam score out of 100

Final score = Average assignment score + Exam score

You will be eligible for a certificate only if the average assignment score is at least 10/25 and the exam score is at least 30/75. If either of these two criteria is not met, you will not get the certificate even if the final score is at least 40/100.
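The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the function name `certificate_result` is hypothetical, and it assumes each assignment is scored out of 100, which the criteria do not state explicitly.

```python
def certificate_result(assignment_scores, exam_score):
    """Apply the certificate criteria described above.

    assignment_scores: scores for the 12 assignments (ASSUMED to be out of 100).
    exam_score: proctored exam score out of 100.
    """
    # Keep only the best 8 assignments.
    best8 = sorted(assignment_scores, reverse=True)[:8]
    # 25% of the average of the best 8 assignments (out of 25).
    assignment_component = 0.25 * sum(best8) / len(best8)
    # 75% of the exam score (out of 75).
    exam_component = 0.75 * exam_score
    final = assignment_component + exam_component
    # Both thresholds must be met independently.
    eligible = assignment_component >= 10 and exam_component >= 30
    return assignment_component, exam_component, final, eligible


# A learner can exceed 40/100 overall and still miss the certificate.
print(certificate_result([30] * 12, 60))
```

In the example call, the assignment component is below 10/25, so the learner is not eligible even though the final score is above 40.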

## NPTEL An Introduction to Artificial Intelligence Assignment 11 Answers 2022

Q1. What is the key point about reinforcement learning in a strong simulator setting?

a. We don’t learn the transition or reward model, but directly learn what to do when.
b. The agent cannot teleport to any state and is restricted.
c. The agent can jump to any state and start simulating from there.
d. The agent learns both the optimal policy and the state values.

Q2. Suppose you are doing Passive Learning on the following state space with the given policy of actions (as mentioned by arrows in the cell). A4 and C4 are absorbing states. Reward for each of the 4 actions (up, down, left, right) is -1. Discount factor is 1.

Q3. Which of the following statements are correct about Boltzmann Exploration?

• All actions have almost equal probability of being executed initially
• Near the end stages of the method, only the actions with the highest expected reward are executed
• The temperature starts off with a very high value and is gradually decreased to a constant c where c < 0
• The temperature is kept fixed throughout
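As background for this question, Boltzmann (softmax) exploration is commonly defined as picking action a with probability proportional to exp(Q(a)/T), where T is the temperature. The following is a minimal sketch of that definition; the function name and the Q-values are hypothetical and are not taken from the assignment.

```python
import math


def boltzmann_probs(q_values, temperature):
    """Return action probabilities proportional to exp(Q / temperature).

    temperature must be positive.
    """
    # Subtract the maximum Q-value for numerical stability;
    # this does not change the resulting probabilities.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


q = [1.0, 2.0, 3.0]                 # hypothetical Q-values for three actions
print(boltzmann_probs(q, 1000.0))   # high temperature: close to uniform
print(boltzmann_probs(q, 0.01))     # low temperature: close to greedy
```

Comparing the two calls shows how lowering the temperature moves the action distribution from nearly uniform toward nearly greedy.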

Q4. If we implement Q-Learning with α = 0.9, then what will be the value of Q(c, RIGHT)? All Q(s, a) pairs are initialized to zero. Assume the discount factor to be 1.

Round off the answer to two decimal places.
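The episode data for this question is not reproduced here, so the sketch below only illustrates the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α (r + γ max over a' of Q(s', a') − Q(s, a)). The states, actions, and reward in the example are hypothetical placeholders, not the ones from the assignment.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.9, gamma=1.0, terminal=False):
    """Apply one tabular Q-learning update in place and return the new Q(s, a)."""
    # A terminal next state contributes no future value.
    best_next = 0.0 if terminal else max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


Q = defaultdict(float)              # all Q(s, a) start at zero
actions = ["LEFT", "RIGHT"]         # hypothetical action set
# Hypothetical transition: from "s0", action "RIGHT", reward -1, next state "s1".
print(round(q_update(Q, "s0", "RIGHT", -1, "s1", actions), 2))
```

With all values initialized to zero, the first update moves Q(s, a) a fraction α of the way toward the observed reward.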

Q5. If we have an epsilon-greedy policy with epsilon = 0.2, then what is the probability of the agent taking action RIGHT in state C after the first episode is over?
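One common way to compute action probabilities under an epsilon-greedy policy is sketched below. It assumes that, with probability epsilon, the agent picks uniformly among all actions (including the greedy one) and that ties between greedy actions are broken uniformly; the course may use a different convention. The Q-values and action names in the example are hypothetical.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Map each action to its selection probability under epsilon-greedy.

    q_values: dict from action name to current Q-value for one state.
    ASSUMPTION: exploration samples uniformly over all actions.
    """
    n = len(q_values)
    best = max(q_values.values())
    greedy = [a for a, q in q_values.items() if q == best]
    probs = {}
    for a in q_values:
        p = epsilon / n                       # exploration share
        if a in greedy:
            p += (1 - epsilon) / len(greedy)  # exploitation share
        probs[a] = p
    return probs


# Hypothetical Q-values for a state with three actions.
print(epsilon_greedy_probs({"a1": 0.0, "a2": -1.0, "a3": 0.5}, 0.2))
```

Under these assumptions the greedy action receives 1 − ε + ε/n and every other action receives ε/n, where n is the number of actions available in the state.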

Q6. Which of the following algorithms can we use to compute optimal policies assuming we do not know the parameters of the MDP, but a simulator for it is available?

Q7. Let us say that we wish to do feature-based Q-learning to find the optimal policy for an MDP. Assume n feature functions, f1(s, a), f2(s, a), …, fn(s, a), with weights w1, w2, …, wn that are all initialized to 0. Assume the discount factor and the learning rate are both equal to 1.
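The MDP and feature definitions for this question are not reproduced here. As a generic illustration, feature-based (linear) Q-learning is usually written as Q(s, a) = w1·f1(s, a) + … + wn·fn(s, a), with each weight updated by wi ← wi + α · δ · fi(s, a), where δ = r + γ max over a' of Q(s', a') − Q(s, a). The sketch below implements that update; the function names, feature vectors, and rewards are hypothetical.

```python
def q_value(weights, features):
    """Linear Q-value: dot product of weights and feature values."""
    return sum(w * f for w, f in zip(weights, features))


def feature_q_update(weights, feats_sa, reward, next_feats_per_action,
                     alpha=1.0, gamma=1.0):
    """Return updated weights after one feature-based Q-learning step.

    feats_sa: feature values f_i(s, a) for the state-action pair taken.
    next_feats_per_action: one feature vector per action available in s';
        pass an empty list when s' is terminal.
    """
    q_sa = q_value(weights, feats_sa)
    best_next = max((q_value(weights, f) for f in next_feats_per_action),
                    default=0.0)
    delta = reward + gamma * best_next - q_sa
    return [w + alpha * delta * f for w, f in zip(weights, feats_sa)]


w = [0.0, 0.0]                                   # weights start at zero
w = feature_q_update(w, [1.0, 2.0], 3.0, [])     # hypothetical terminal step
print(w)
```

Because the weights are shared across state-action pairs, one update changes the Q-values of every pair whose features overlap with those of the pair just visited.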

Q8. Q(C, go)

Q9. Q(B, go)


Q10. Q(A, go)