
Artificial Intelligence Full Course | Artificial Intelligence Tutorial for Beginners | Edureka - Ep29

Time: 2025-07-11 11:41:36 Source: Codora.ai Author: ai Reading: 547 times
Each edge has a number linked with it; this number denotes the cost to traverse that edge. We need to choose a policy to travel from A to D in such a way that our cost is minimized. In this problem, the set of states is denoted by the nodes A, B, C, and D, and an action is to traverse from one node to another: going from A to C is an action, C to B is an action, B to D is another action. The reward is the cost represented by each edge, and a policy is the path taken to reach the destination. We need to make sure we choose a policy in such a way that our cost is minimized.

So you start off at node A and take baby steps toward your destination. Initially, only the next possible nodes are visible to you: from A you can either go to B or go to C. If you follow the greedy approach, you take the most optimal step, which is choosing A to C instead of A to B. Now you're at node C and you want to traverse to node D, and again you must choose your path very wisely: if you traverse from A to C, C to B, and B to D, your cost is the least, but if you traverse from A to C to D, your cost will actually increase. So you need to choose a policy that minimizes your cost here.

Let's say the agent chose A to C to D: it came to node C and then directly chose D. The policy followed by our agent in this problem is of the exploitation type, because we didn't explore the other nodes; we just selected three nodes and traversed through them, and the policy we followed is not actually an optimal policy. We must always explore more to find the optimal policy. Even if the other nodes are not giving us any more reward, or are actually increasing our cost, we still have to explore and find out whether those paths, and that policy, are actually better. The method we implemented here is known as policy-based learning, and the aim is to find the best policy among all the possible policies.
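The transcript does not give the actual edge costs, so the numbers below are hypothetical, chosen so that A → C → B → D is the cheapest route and A → C → D is more expensive, as described above. This sketch contrasts exploration (enumerating every path) with the greedy, exploitation-style shortcut:

```python
# Hypothetical edge costs (not given in the lesson), chosen so that
# a -> c -> b -> d is cheapest, matching the description above.
graph = {
    "a": {"b": 3, "c": 1},
    "b": {"c": 1, "d": 1},
    "c": {"b": 1, "d": 5},
    "d": {},
}

def all_paths(node, goal, path=None, cost=0):
    """Exhaustively enumerate every cycle-free path (full exploration)."""
    path = (path or []) + [node]
    if node == goal:
        yield path, cost
        return
    for nxt, edge_cost in graph[node].items():
        if nxt not in path:                  # avoid revisiting a node
            yield from all_paths(nxt, goal, path, cost + edge_cost)

paths = sorted(all_paths("a", "d"), key=lambda pc: pc[1])
for p, c in paths:
    print(" -> ".join(p), "cost:", c)
# a -> c -> b -> d cost: 3   (optimal)
# a -> b -> d      cost: 4
# a -> c -> d      cost: 6   (the greedy/exploitation shortcut)
# a -> b -> c -> d cost: 9
```

Note how the greedy agent that jumps straight from C to D pays 6, while exploring reveals the cheaper 3-cost route through B.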
Apart from policy-based, we also have the value-based approach and the action-based approach. Value-based emphasizes maximizing the rewards, and in action-based we emphasize each action taken by the agent. A point to note is that all of these learning approaches have a similar end goal: to effectively guide the agent through the environment and acquire the maximum number of rewards. So far we've covered the Markov decision process, the exploitation-exploration trade-off, and the different reinforcement learning definitions. I hope all of this was understandable. Now let's move on and understand an algorithm known as the Q-learning algorithm.

Q-learning is one of the most important algorithms in reinforcement learning, and we'll discuss it with the help of a small example. We'll study the example and then implement the same example in Python to see how it works. The problem statement is this: an agent is placed in any one of the rooms numbered 0, 1, 2, 3, and 4, and the goal is for the agent to reach outside the building, which is room number 5. Rooms 0 through 4 represent the building, and 5 represents the space outside it. All these rooms are connected by doors; the gaps you see between the rooms in the diagram are doors, and each room is numbered from 0 to 4. The outside of the building can be thought of as one big room, room number 5. If you look at the diagram, the doors of room number 1 and room number 4 lead directly to room number 5: from 1 you can go directly to 5, and from 4 you can also go directly to 5. But if you want to reach 5 from room number 2, you'll first have to go to room number 3, then room number 1, and then room number 5. Those are indirect links; the direct links are from room number 1 and room number 4. I hope all of you are clear with the problem statement. You're basically going to have a reinforcement
learning agent, and that agent has to traverse through the rooms in such a way that it reaches room number 5. To solve this problem, we'll first represent the rooms on a graph: each room is denoted as a node, and the links connecting the nodes are the doors. So we have nodes 0 to 5, and the links between the nodes represent the doors. For example, in this graph you can see there is a direct connection from 1 to 5, meaning you can go directly from room number 1 to your goal, room number 5. If you want to go from room number 3 to 5, you can either go to room number 1 and then to 5, or go from room number 3 to 4 and then to 5. So, guys, remember: the end goal is to reach room number 5.

To set room number 5 as the goal state, we'll associate a reward value with each door. The doors that lead immediately to the goal have an instant reward of 100: both 1 to 5 and 4 to 5 have a reward of 100. The other doors, which are not directly connected to the target room, have a zero reward, because they do not directly lead us to the goal. Let's say you place the agent in room number 3: going from room 3 to 1 gives the agent a reward of 0, and going from 1 to 5 gives a reward of 100. Because the doors are two-way, two arrows are assigned between each pair of connected rooms, one going towards a room and one coming from it, and each arrow carries an instant reward, as shown in the figure.
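The lesson says this example will be implemented in Python, so here is a minimal sketch of that implementation under a few stated assumptions: the door layout follows the description above (0–4, 1–3, 1–5, 2–3, 3–4, 4–5), missing doors are marked -1 (that convention is this sketch's choice; the transcript only specifies the 0 and 100 rewards), and the discount factor 0.8 is a typical value, not one given in the text:

```python
import numpy as np

np.random.seed(0)

# Reward matrix R for the six rooms. Row = current room, column = next
# room; -1 marks "no door", 0 a door with no instant reward, and 100 a
# door leading straight to the goal (room 5).
R = np.array([
    [-1, -1, -1, -1,  0,  -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1,  -1],
    [-1,  0,  0, -1,  0,  -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
])

GOAL, GAMMA = 5, 0.8                 # 0.8 is an assumed discount factor
Q = np.zeros_like(R, dtype=float)

# Tabular Q-learning over random episodes: start in a random room and
# wander through doors until the goal is reached, updating Q as we go.
for _ in range(1000):
    state = np.random.randint(0, 6)
    while state != GOAL:
        actions = np.where(R[state] >= 0)[0]   # rooms reachable by a door
        action = np.random.choice(actions)     # pure exploration
        Q[state, action] = R[state, action] + GAMMA * Q[action].max()
        state = action

# Read off the greedy policy starting from room 2.
state, path = 2, [2]
while state != GOAL:
    state = int(Q[state].argmax())
    path.append(state)
print(path)   # once Q has converged: [2, 3, 1, 5]
```

The learned path 2 → 3 → 1 → 5 is exactly the indirect route described in the problem statement (room 3's ties between doors 1 and 4 are broken by `argmax` taking the first maximum).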

