Reward Function Case Study

3.5 Definition of the Reward Function
The term “reward” generally refers to the measurable merit of an activated action. The purpose of the reward function is to measure how effectively a classifier stabilizes the bicycle, i.e., brings it back to the upright position from a near-fall position. In our case, however, the reward of an activated action cannot be calculated immediately, because the calculation requires knowledge of the system’s response, which occurs with a delay. Consider, for example, the case with as input. If the roll angle and its derivative are both positive, then, viewing the bicycle body from behind (Fig. 3), its center of mass lies on the right side of the negative z axis, …
Note that since no controller is used in this case, the control signal in Case 1 is zero. Moreover, the gravity-induced torque, which acts on the system as an external torque, is virtually constant because its arm (the distance from the pivot point to the point where the force acts) changes insignificantly during the time interval. This justifies the constant-acceleration assumption.
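The constant-acceleration assumption for the uncontrolled case can be illustrated with a minimal sketch. The function name, argument names, and the idea that the angular acceleration is supplied directly are illustrative assumptions, not the authors' implementation; the formula itself is the standard kinematic relation for constant acceleration.

```python
def predict_uncontrolled_roll(theta, theta_dot, theta_ddot, dt):
    """Roll angle after dt, assuming the angular acceleration stays constant.

    theta      -- current roll angle (rad)
    theta_dot  -- current roll rate (rad/s)
    theta_ddot -- angular acceleration, taken as constant over dt (rad/s^2)
    dt         -- length of the time interval (s)
    All names are hypothetical; the source does not give its own symbols.
    """
    return theta + theta_dot * dt + 0.5 * theta_ddot * dt ** 2
```

With a zero control signal, the value returned here plays the role of the uncontrolled roll angle against which the controlled response is later compared.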
Case 2) The bicycle is controlled by the control signal proposed by CCSDR: In this case, the control signal should be applied to the unmanned bicycle in a real-world environment, and the resulting roll angle should be measured at the end of the time interval. Solving the governing equations (3-6) by the fourth-order Runge-Kutta method gives the value of the roll angle at the end of the time interval. The aim of the controller is to stabilize the bicycle in the upright position. Given this roll angle, the reward is calculated using the following equation.
According to the above equation, if the roll angle of the bicycle is smaller in the controlled mode than in the uncontrolled mode (i.e., the bike is closer to the upright position), the action is deemed effective and a reward is assigned to it; otherwise, no reward is …
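The comparison between the controlled and uncontrolled roll angles can be sketched as follows. This is only one reading of the textual description: the original equation is not reproduced here, so the use of absolute values (upright taken as a roll angle of zero), the binary form of the reward, and the `payoff` value of 1.0 are all assumptions.

```python
def reward(theta_controlled, theta_uncontrolled, payoff=1.0):
    """Reward an action only if it leaves the bicycle closer to upright.

    theta_controlled   -- roll angle at the end of the interval with control (rad)
    theta_uncontrolled -- roll angle at the end of the interval without control (rad)
    payoff             -- hypothetical reward magnitude (assumption, not from the source)
    """
    # Upright is assumed to correspond to a roll angle of zero,
    # so "closer to upright" means a smaller magnitude.
    if abs(theta_controlled) < abs(theta_uncontrolled):
        return payoff
    return 0.0
```

For instance, `reward(0.05, 0.12)` returns the payoff because the controlled angle is smaller in magnitude, while `reward(0.20, 0.12)` returns zero.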
As usual, applying the GA consists of three phases: selection, crossover, and mutation. In the selection phase, two classifiers (called parents) are chosen probabilistically by roulette-wheel selection in a ‘survival of the fittest’ manner, where classifiers with a higher fitness value are more likely to be selected than those with a lower one. The crossover operator is applied to the two selected parents at a predefined rate. Then, at another predefined rate, each bound (lower or upper) of the generated offspring may be mutated. The resulting offspring are inserted into the population and, in order to keep the population size constant, two other classifiers are deleted. The removed classifiers are low-fitness ones that have participated in a threshold number of experiments, that is, have had sufficient time for their parameters to be accurately …
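The GA phases described above can be sketched as below. This is a hedged illustration, not the authors' code: the `Classifier` fields, the uniform interval-swapping crossover, the additive uniform mutation, the rates and thresholds, and the fallback used when fewer than two classifiers are experienced enough are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass(eq=False)  # identity-based equality, so list.remove targets one object
class Classifier:
    bounds: list          # [[lower, upper], ...], one interval per input
    action: float
    fitness: float = 0.1
    experience: int = 0   # number of experiments the classifier took part in


def roulette_select(population, rng):
    """Pick one classifier with probability proportional to its fitness."""
    total = sum(c.fitness for c in population)
    pick = rng.uniform(0.0, total)
    running = 0.0
    for c in population:
        running += c.fitness
        if running >= pick:
            return c
    return population[-1]  # guard against floating-point round-off


def make_offspring(p1, p2, rng, crossover_rate, mutation_rate, step):
    """Copy both parents, then apply uniform crossover and bound mutation."""
    kids = [Classifier([b[:] for b in p.bounds], p.action, p.fitness)
            for p in (p1, p2)]
    if rng.random() < crossover_rate:
        for i in range(len(kids[0].bounds)):
            if rng.random() < 0.5:  # swap this interval between the offspring
                kids[0].bounds[i], kids[1].bounds[i] = (
                    kids[1].bounds[i], kids[0].bounds[i])
    for kid in kids:
        for interval in kid.bounds:
            for j in (0, 1):  # lower bound, then upper bound
                if rng.random() < mutation_rate:
                    interval[j] += rng.uniform(-step, step)
            interval.sort()  # keep lower <= upper after mutation
    return kids


def delete_two(population, exp_threshold):
    """Remove the two lowest-fitness classifiers that are experienced enough."""
    eligible = [c for c in population if c.experience >= exp_threshold]
    # Fallback to the whole population is an assumption of this sketch.
    pool = eligible if len(eligible) >= 2 else population
    for victim in sorted(pool, key=lambda c: c.fitness)[:2]:
        population.remove(victim)


def ga_step(population, rng, crossover_rate=0.8, mutation_rate=0.04,
            step=0.1, exp_threshold=20):
    """One GA invocation: select, cross over, mutate, insert, delete."""
    p1 = roulette_select(population, rng)
    p2 = roulette_select(population, rng)
    population.extend(
        make_offspring(p1, p2, rng, crossover_rate, mutation_rate, step))
    delete_two(population, exp_threshold)
```

Calling `ga_step(population, random.Random(0))` on a list of `Classifier` objects adds two offspring and removes two experienced low-fitness classifiers, so the population size is unchanged. Offspring start with zero experience, which shields them from deletion until they have been evaluated often enough.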
