Continuous reinforcement

Related: CRF

WordNet

of a function or curve; extending without break or irregularity
continuing in time or space without interruption; "a continuous rearrangement of electrons in the solar atoms results in the emission of light" - James Jeans; "a continuous bout of illness lasting six months"; "lived in continuous fear"; "a continuous row of warehouses"; "a continuous line has no gaps or breaks in it"; "moving midweek holidays to the nearest Monday or Friday allows uninterrupted work weeks" (synonym) uninterrupted
information that makes more forcible or convincing; "his gestures provided eloquent reinforcement for his complaints" (synonym) reenforcement

PrepTutorEJDIC

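(in time or space) continuing without a break; successive, uninterrupted
(U) strengthening, reinforcement / (C) reinforcing material / (in the plural) reinforcements, relief troops
(in 17th- and 18th-century music) basso continuo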

Wikipedia preview

Source (authority): Wikipedia, the free encyclopedia, as of 2013/02/26 13:39:41 (JST)

wiki en


"Reinforce" redirects here. For the Magical Girl Lyrical Nanoha character, see Reinforce (Nanoha).

This article is about the term used in operant conditioning. For the construction materials reinforcement, see Rebar. For reinforcement learning in computer science, see Reinforcement learning. For beam stiffening, see Stiffening.

Reinforcement is a term in operant conditioning and behavior analysis for a process of strengthening a directly measurable dimension of behavior, such as rate (e.g., pulling a lever more frequently), duration (e.g., pulling a lever for longer periods of time), magnitude (e.g., pulling a lever with greater force), or latency (e.g., pulling a lever more quickly following the onset of an environmental event), as a function of the delivery of a stimulus (e.g., money from a slot machine) immediately or shortly after the occurrence of the behavior. Giving a monkey a banana for performing a trick is an example of positive reinforcement.

Reinforcement is only said to have occurred if the delivery of the stimulus is directly caused by the response made. Although in many cases in human behavior a reinforcing stimulus is something which is "valued" by the individual or which the individual "likes" (e.g., money received from a slot machine, the good taste of an apple, the positive effects of a drug), this is not a requirement for reinforcing effects. Indeed, reinforcement does not even require an individual to consciously perceive an effect elicited by the stimulus.

Furthermore, stimuli that are "rewarding" or "liked" are not always reinforcing: if an individual eats at McDonald's (response) and likes the taste of the food (stimulus) but believes it is bad for their health, they may not eat it again, in which case the food was not reinforcing in that condition.

Animals and humans repeat behaviours that produce positive results and avoid performing behaviours that produce negative results.^[1]

A reinforcer is a temporally contiguous environmental event, or an effect directly produced by a response (e.g., a musician playing a melody), that functions to strengthen or maintain the response that preceded the event. A reinforcer is demonstrated only if the strengthening or maintenance effect occurs.

Response strength is assessed by measuring the frequency, duration, latency, accuracy, and/or persistence of the response after reinforcement stops. Early experimental behavior analysts measured the rate of responses as a primary demonstration of learning and performance in non-humans (e.g., the number of times a pigeon pecks a key in a 10-minute session).

1 Types
- 1.1 Positive and negative
- 1.2 Primary reinforcers
- 1.3 Secondary reinforcers
- 1.4 Other reinforcement terms
2 Natural and artificial
3 Intermittent reinforcements
4 Schedules
- 4.1 Simple schedules
  - 4.1.1 Effects of different types of simple schedules
- 4.2 Compound schedules
- 4.3 Superimposed schedules
- 4.4 Concurrent schedules
5 Shaping
6 Chaining
7 Persuasive communication & the reinforcement theory
8 Mathematical models
9 Criticisms
- 9.1 History of the terms
10 See also
11 References
12 Further reading
13 External links

Types

B.F. Skinner, the researcher who articulated the major theoretical constructs of reinforcement and behaviorism, defined reinforcers according to the change in response strength rather than to more subjective criteria, such as what is pleasurable or valuable to someone. Accordingly, activities, foods, or items considered pleasant or enjoyable may not necessarily be reinforcing (because they produce no increase in the response preceding them). Stimuli, settings, and activities fit the definition of reinforcers only if the behavior that immediately precedes the potential reinforcer increases in similar situations in the future. For example, consider a child who receives a cookie when he or she asks for one. If the frequency of "cookie-requesting behavior" increases, the cookie can be seen as reinforcing "cookie-requesting behavior". If, however, "cookie-requesting behavior" does not increase, the cookie cannot be considered reinforcing.

Reinforcement theory is one of the motivation theories; it states that reinforced behavior will be repeated, and behavior that is not reinforced is less likely to be repeated.^[2]

The sole criterion that determines whether an item, activity, or food is reinforcing is the change in probability of a behavior after administration of that potential reinforcer. Other theories may focus on additional factors, such as whether the person expected the strategy to work at some point, but in behavioral theory, reinforcement describes an increased probability of a response.
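The functional criterion above can be sketched as a comparison of response rates, the measure early behavior analysts relied on. This is a minimal illustration under stated assumptions: the helper names `rate` and `is_reinforcing` and the counts are hypothetical, the test checks only for an increase (the "maintains" case described earlier is not modeled), and a real analysis would compare repeated sessions rather than a single pair of numbers.

```python
def rate(count, minutes):
    """Responses per minute, e.g. key pecks counted over a 10-minute session."""
    if minutes <= 0:
        raise ValueError("session length must be positive")
    return count / minutes


def is_reinforcing(baseline_rate, rate_with_consequence):
    """Functional test: a consequence counts as reinforcing only if the behavior
    it follows becomes more frequent than at baseline. Whether the consequence
    is 'liked' is deliberately not an input."""
    return rate_with_consequence > baseline_rate


# Hypothetical counts of cookie requests per 60-minute observation period.
baseline = rate(3, 60)  # before requests were followed by a cookie
print(is_reinforcing(baseline, rate(9, 60)))  # True: requesting increased
print(is_reinforcing(baseline, rate(2, 60)))  # False: requesting did not increase
```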

The study of reinforcement has produced an enormous body of reproducible experimental results. Reinforcement is the central concept and procedure in special education, applied behavior analysis, and the experimental analysis of behavior.

Positive and negative

Skinner argued that positive reinforcement is superior to punishment in altering behavior. He maintained that punishment was not simply the opposite of positive reinforcement: positive reinforcement results in lasting behavioral modification, whereas punishment changes behavior only temporarily and has many detrimental side effects.^[3]

The accepted model of reinforcement began shifting in 1966 when Azrin and Holz contributed a chapter^[4] to Honig's volume on operant conditioning. Skinner defined reinforcement as creating situations that a person likes or removing a situation he doesn't like, and punishment as removing a situation a person likes or setting up one he doesn't like.^[3] Thus the distinction was based on the appetitive or aversive nature of the stimulus. Azrin and Holz defined punishment "as 'a reduction of the future probability of a specific response as a result of the immediate delivery of a stimulus for that response'."^[5] This new definition of punishment encroached on Skinner's definition of reinforcement, but most textbooks now only present examples of the 1966 model summarized below:

Helpful definitions:

Appetitive stimulus: a pleasant outcome
Aversive stimulus: an unpleasant outcome

A positive reinforcer is a consequence that increases the frequency of a behavior or maintains the frequency. What is reinforcing is defined by what happens to the frequency of the behavior. It has nothing to do with whether the organism finds the reinforcer "pleasant" or not. For example, if a child gets slapped for saying a "naughty" word but the frequency of naughty words increases, the slap is a positive reinforcer.

A "pleasant" consequence is not necessarily a positive reinforcer.^[6] Getting a birthday gift, for example, is not a positive reinforcer, because there is no behavior whose frequency it increases (or maintains). When deciding whether or not something is a reinforcer, the basic criterion is the frequency of occurrence of a behavior.

Consequences are not universally reinforcing. For example, happy face stickers may be effective reinforcers for some children. Other children may find them silly.^{[citation needed]}

A negative reinforcer is not punishment. These terms are often confused. A negative reinforcer increases or maintains the frequency of the behavior that terminates the negative reinforcer. In this case the negative reinforcer is present before the behavior. The organism performs a behavior that terminates the negative reinforcer. The behavior that terminates the negative reinforcer is likely to increase or be maintained in frequency. Suppose someone has a headache (negative reinforcer). The person takes two aspirin but nothing happens. Then the person takes two Tylenol tablets and the headache goes away. The next time the person has a headache it is likely the person will take Tylenol. That is the behavior that has been reinforced.

Forms of operant conditioning:

Positive reinforcement: the addition of an appetitive stimulus to increase a certain behavior or response.
Example: A father gives candy to his daughter when she picks up her toys. If the frequency of picking up the toys increases or stays the same, the candy is a positive reinforcer.
Positive punishment: the addition of an aversive stimulus to decrease a certain behavior or response.
Example: A mother yells at her child when the child runs into the street. If the child stops running into the street, the yelling is positive punishment.
Negative reinforcement: the removal of an aversive stimulus to increase a certain behavior or response.
Example: Putting ointment on a bug bite soothes an itch. If the use of ointment on bug bites increases, the removal of the itch is a negative reinforcer.
Negative punishment (omission training): the removal of an appetitive stimulus to decrease a certain behavior.
Example: A teenager comes home an hour after curfew and the parents take away the teen's cell phone for two days. If the frequency of coming home after curfew decreases, the removal of the phone is negative punishment.

The following table illustrates that punishment and reinforcement are a function of the presentation or removal of a stimulus and the valence of the stimulus.

Stimulus change	Appetitive (pleasant) stimulus	Aversive (unpleasant) stimulus
Presented	positive reinforcement	positive punishment
Taken away	negative punishment	negative reinforcement
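The table amounts to a lookup keyed by two facts: whether the stimulus is presented or taken away, and its valence. The Python sketch below encodes exactly those four cells; the enum and function names are illustrative, not standard terminology. Under the functional definition used earlier, the label is still provisional until the observed change in frequency confirms it.

```python
from enum import Enum


class Operation(Enum):
    PRESENTED = "presented"
    TAKEN_AWAY = "taken away"


class Valence(Enum):
    APPETITIVE = "appetitive"  # pleasant outcome
    AVERSIVE = "aversive"      # unpleasant outcome


# The four cells of the presentation/removal x valence table.
CONTINGENCY = {
    (Operation.PRESENTED, Valence.APPETITIVE): "positive reinforcement",
    (Operation.PRESENTED, Valence.AVERSIVE): "positive punishment",
    (Operation.TAKEN_AWAY, Valence.APPETITIVE): "negative punishment",
    (Operation.TAKEN_AWAY, Valence.AVERSIVE): "negative reinforcement",
}


def classify(operation: Operation, valence: Valence) -> str:
    """Name the contingency for a stimulus change (illustrative helper)."""
    return CONTINGENCY[(operation, valence)]


# Ointment removes an aversive itch:
print(classify(Operation.TAKEN_AWAY, Valence.AVERSIVE))  # negative reinforcement
```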

Distinguishing "positive" from "negative" can be difficult, especially when a behavior has several consequences, and whether the distinction is necessary at all is often debated.^[7] For example, in a very warm room, a current of external air could be described as positive reinforcement because it is pleasantly cool, or as negative reinforcement because it removes uncomfortably hot air.^[8] Some reinforcement can be simultaneously positive and negative, such as a drug addict taking drugs both for the added euphoria and to eliminate withdrawal symptoms. Many behavioral psychologists simply refer to reinforcement or punishment, without polarity, to cover all consequent environmental changes. Others would disagree with the above examples because there is no behavior that is increasing or decreasing in frequency.

Primary reinforcers

A primary reinforcer, sometimes called an unconditioned reinforcer, is a stimulus that does not require pairing to function as a reinforcer and most likely has obtained this function through evolution and its role in the survival of the species.^[9] Examples of primary reinforcers include sleep, food, air, water, and sex. Some primary reinforcers, such as certain drugs, may mimic the effects of other primary reinforcers. While these primary reinforcers are fairly stable through life and across individuals, the reinforcing value of different primary reinforcers varies due to multiple factors (e.g., genetics, experience). Thus, one person may prefer one type of food while another abhors it. Or one person may eat lots of food while another eats very little. So even though food is a primary reinforcer for both individuals, the value of food as a reinforcer differs between them.

Secondary reinforcers

A secondary reinforcer, sometimes called a conditioned reinforcer, is a stimulus or situation that has acquired its function as a reinforcer after pairing with a stimulus that functions as a reinforcer. This stimulus may be a primary reinforcer or another conditioned reinforcer (such as money). An example of a secondary reinforcer would be the sound from a clicker, as used in clicker training. The sound of the clicker has been associated with praise or treats, and subsequently, the sound of the clicker may function as a reinforcer. As with primary reinforcers, an organism can experience satiation and deprivation with secondary reinforcers.

Other reinforcement terms

A generalized reinforcer is a conditioned reinforcer that has obtained the reinforcing function by pairing with many other reinforcers (such as money, a secondary generalized reinforcer).
In reinforcer sampling, a potentially reinforcing but unfamiliar stimulus is presented to an organism without regard to any prior behavior.
Socially mediated reinforcement (direct reinforcement) involves the delivery of reinforcement that requires the behavior of another organism.
The Premack principle is a special case of reinforcement elaborated by David Premack, which states that a highly preferred activity can be used effectively as a reinforcer for a less preferred activity.
Reinforcement hierarchy is a list of actions, rank-ordering the most desirable to least desirable consequences that may serve as a reinforcer. A reinforcement hierarchy can be used to determine the relative frequency and desirability of different activities, and is often employed when applying the Premack principle.^{[citation needed]}
Contingent outcomes are more likely to reinforce behavior than non-contingent outcomes. Contingent outcomes are those directly linked to a causal behavior, such as a light turning on being contingent on flipping a switch. Note that contingent outcomes are not necessary to demonstrate reinforcement, but perceived contingency may increase learning.
Contiguous stimuli are stimuli closely associated by time and space with specific behaviors. They reduce the amount of time needed to learn a behavior while increasing its resistance to extinction. Giving a dog a piece of food immediately after sitting is more contiguous with (and therefore more likely to reinforce) the behavior than a several minute delay in food delivery following the behavior.
Noncontingent reinforcement refers to response-independent delivery of stimuli identified as reinforcers for some behaviors of that organism. However, this typically entails time-based delivery of stimuli identified as maintaining aberrant behavior, which decreases the rate of the target behavior.^[10] As no measured behavior is identified as being strengthened, there is controversy surrounding the use of the term noncontingent "reinforcement".^[11]
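A reinforcement hierarchy and the Premack principle can be combined in a few lines. This sketch assumes that preference is measured as time spent on each activity during free-access observation; the activity names and minutes are hypothetical, and the function names are made up for illustration. Under the Premack principle, any activity ranked above the target is a candidate reinforcer for it.

```python
def reinforcement_hierarchy(baseline_minutes):
    """Rank activities from most to least preferred, using minutes spent on
    each during free-access observation as the (assumed) preference measure."""
    return sorted(baseline_minutes, key=baseline_minutes.get, reverse=True)


def premack_candidates(hierarchy, target):
    """Activities ranked above `target`: candidate reinforcers for it."""
    return hierarchy[: hierarchy.index(target)]


# Hypothetical baseline: minutes per day a child spent on each activity.
baseline = {"homework": 10, "video games": 90, "drawing": 30, "reading": 20}
ranked = reinforcement_hierarchy(baseline)
print(ranked)                                 # ['video games', 'drawing', 'reading', 'homework']
print(premack_candidates(ranked, "reading"))  # ['video games', 'drawing']
```

Ranking by observed time rather than by stated preference keeps the sketch in line with the document's emphasis on measurable behavior; a practitioner might rank by other measures.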

Natural and artificial

In his 1967 paper, Arbitrary and Natural Reinforcement, Charles Ferster proposed classifying reinforcement into events that increase the frequency of an operant as a natural consequence of the behavior itself, and events that are presumed to affect frequency by their requirement of human mediation, such as in a token economy where subjects are "rewarded" for certain behavior with an arbitrary token of a negotiable value. In 1970, Baer and Wolf coined the term "behavior traps" for the use of natural reinforcers.^[12] A behavior trap requires only a simple response to enter, yet once it has been entered, the trap cannot be resisted and creates general behavior change. A behavior trap increases a person's repertoire by exposing them to the naturally occurring reinforcement of that behavior. Behavior traps have four characteristics:

They are "baited" with virtually irresistible reinforcers that "lure" the student to the trap
Only a low-effort response already in the repertoire is necessary to enter the trap
Interrelated contingencies of reinforcement inside the trap motivate the person to acquire, extend, and maintain targeted academic/social skills^[13]
They can remain effective for long periods of time because the person shows few, if any, satiation effects

As can be seen from the above, artificial reinforcement is in fact used to build or develop skills, and for those skills to generalize, it is important that a behavior trap be introduced to "capture" the skill and use naturally occurring reinforcement to maintain or increase it. This behavior trap may simply be a social situation that will generally result from a specific behavior once it has met a certain criterion (e.g., if you use edible reinforcers to train a person to say hello and smile at people when they meet them, then after that skill has been built up, the natural reinforcers of other people smiling and of having more friendly interactions will reinforce the skill, and the edibles can be faded).^{[citation needed]}

Intermittent reinforcements

In one scientific study, pigeons were more responsive to intermittent reinforcement than to continuous reinforcement.^[14] In other words, the pigeons were more prone to act when they could only sometimes get what they wanted. R.B. Sparkman, a journalist who specializes in what motivates human behaviour, claims this is also true for humans.^[15]

Schedules

When an animal's surroundings are controlled, its behavior patterns after reinforcement become predictable, even for very complex behavior patterns. A schedule of reinforcement is a rule or program that determines how and when the occurrence of a response will be followed by the delivery of the reinforcer. Schedules of reinforcement influence how an instrumental response is learned and how it is maintained by reinforcement. At one extreme is continuous reinforcement, in which every response is reinforced; at the other is extinction, in which no response is reinforced. Between these extremes is intermittent or partial reinforcement, where only some responses are reinforced.

Specific variations of intermittent reinforcement reliably induce specific patterns of response, irrespective of the species being investigated (including humans in some conditions). The orderliness and predictability of behavior under schedules of reinforcement was evidence for B.F. Skinner's claim that by using operant conditioning he could obtain "control over behavior", in a way that rendered the theoretical disputes of contemporary comparative psychology obsolete. The reliability of schedule control supported the idea that a radical behaviorist experimental analysis of behavior could be the foundation for a psychology that did not refer to mental or cognitive processes. The reliability of schedules also led to the development of applied behavior analysis as a means of controlling or altering behavior.

Many of the simpler possibilities, and some of the more complex ones, were investigated at great length by Skinner using pigeons, but new schedules continue to be defined and investigated.

Simple schedules

[Figure: chart of the different response rates produced by the four simple schedules of reinforcement; each hatch mark designates a reinforcer being given.]

Ratio schedule – the reinforcement depends only on the number of responses the organism has performed.
Continuous reinforcement (CRF) – a schedule of reinforcement in which every occurrence of the instrumental response (desired response) is followed by the reinforcer.
- Lab example: each time a rat presses a bar it gets a pellet of food.
- Real world example: each time a dog defecates outside, its owner gives it a treat; each time a person puts $1 in a candy machine and presses the buttons, they receive a candy bar.

Simple schedules have a single rule to determine when a single type of reinforcer is delivered for a specific response.

Fixed ratio (FR) – schedules deliver reinforcement after every nth response.
- Example: FR2 = every second desired response the subject makes is reinforced.
- Lab example: FR5 = rat reinforced with food after every 5 bar-presses in a Skinner box.
- Real-world example: FR10 = a used-car dealer gets a $1000 bonus for every 10 cars sold on the lot.
Variable ratio schedule (VR) – a reinforcement schedule in which the number of responses necessary to produce reinforcement varies from trial to trial. On a VR10 schedule, for example, the requirement changes from one reinforcer to the next, but on average every tenth desired response is reinforced.
- Lab example: VR4 = first pellet delivered on 2 bar presses, second pellet on 6 bar presses, third pellet on 4 bar presses (2 + 6 + 4 = 12; 12 / 3 = 4 bar presses per pellet on average).
- Real-world example: slot machines (because, though the probability of hitting the jackpot is constant, the number of lever presses needed to hit the jackpot is variable).
Fixed interval (FI) – the first response after a fixed amount of time (n) has elapsed since the last reinforcement is reinforced.
- Example: FI1" = reinforcement provided for the first response after 1 second.
- Lab example: FI15" = rat is reinforced for the first bar press after 15 seconds passes since the last reinforcement.
- Real world example: washing machine cycle.
Variable interval (VI) – the first response after a variable amount of time has elapsed is reinforced; the interval changes from one reinforcer to the next, and n is its average.
- Example: VI4 min = first pellet delivered after 2 minutes, second after 6 minutes, third after 4 minutes (2 + 6 + 4 = 12; 12 / 3 = 4). Reinforcement is delivered on average every 4 minutes.
- Lab example: VI10" = a rat is reinforced for the first bar press after an average of 10 seconds passes since the last reinforcement.
- Real world example: checking your e-mail, or pop quizzes. Going fishing is another: you might catch a fish after 10 minutes, then have to wait an hour, then have to wait 18 minutes.
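The four basic schedules differ only in the rule that decides whether a given response is reinforced. The Python sketch below models each schedule as an object whose `respond(t)` method receives the time of a response and returns whether it is reinforced. The class names are hypothetical, and the way variable requirements are drawn (uniformly around the stated mean) is an assumption made for illustration; real experiments may generate their series of values differently.

```python
import random


class FixedRatio:
    """FR n: every nth response is reinforced. FixedRatio(1) is CRF."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self, t):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR n: the response requirement is redrawn after every reinforcer.
    Assumption: requirements are uniform on 1..2n-1, so they average n."""

    def __init__(self, n, rng):
        self.n, self.rng = n, rng
        self.count = 0
        self.required = rng.randint(1, 2 * n - 1)

    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self.rng.randint(1, 2 * self.n - 1)
            return True
        return False


class FixedInterval:
    """FI s: the first response at least s seconds after the last reinforcer
    (or the session start) is reinforced; earlier responses have no effect."""

    def __init__(self, seconds):
        self.seconds = seconds
        self.start = 0.0

    def respond(self, t):
        if t - self.start >= self.seconds:
            self.start = t
            return True
        return False


class VariableInterval:
    """VI s: like FI, but the interval is redrawn after every reinforcer.
    Assumption: intervals are uniform on [0, 2s], so they average s."""

    def __init__(self, seconds, rng):
        self.mean, self.rng = seconds, rng
        self.start = 0.0
        self.interval = rng.uniform(0, 2 * seconds)

    def respond(self, t):
        if t - self.start >= self.interval:
            self.start = t
            self.interval = self.rng.uniform(0, 2 * self.mean)
            return True
        return False


def run(schedule, response_times):
    """Return the times of the responses that were reinforced."""
    return [t for t in response_times if schedule.respond(t)]


presses = [float(s) for s in range(1, 61)]  # one bar press per second for 60 s
print(len(run(FixedRatio(5), presses)))     # 12
print(run(FixedInterval(15), presses))      # [15.0, 30.0, 45.0, 60.0]
rng = random.Random(0)
print(len(run(VariableRatio(5, rng), presses)))  # count depends on the random draws
```

Note that responding faster raises the reinforcement rate on the ratio schedules but not on the interval schedules, which is one way to see why ratio schedules support higher response rates.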

Other simple schedules include:

Differential reinforcement of incompatible behavior – Used to reduce a frequent behavior without punishing it by reinforcing an incompatible response. An example would be reinforcing clapping to reduce nose picking.
Differential reinforcement of other behavior (DRO) – Also known as omission training procedures, an instrumental conditioning procedure in which a positive reinforcer is periodically delivered only if the participant does something other than the target response. An example would be reinforcing any hand action other than nose picking.
Differential reinforcement of low response rate (DRL) – Used to encourage low rates of responding. It is like an interval schedule, except that premature responses reset the time required between responses.
- Lab example: DRL10" = a rat is reinforced for the first response after 10 seconds, but if the rat responds earlier than 10 seconds there is no reinforcement and the rat has to wait 10 seconds from that premature response without another response before bar pressing will lead to reinforcement.
- Real world example: "If you ask me for a potato chip no more than once every 10 minutes, I will give it to you. If you ask more often, I will give you none."
Differential reinforcement of high rate (DRH) – Used to encourage high rates of responding. It is like an interval schedule, except that a minimum number of responses is required within the interval in order to receive reinforcement.
- Lab example: DRH10"/15 responses = a rat must press a bar 15 times within a 10 second increment to get reinforced.
- Real world example: "If Lance Armstrong is going to win the Tour de France he has to pedal x number of times during the y-hour race."
Fixed time (FT) – Provides reinforcement at a fixed time since the last reinforcement, irrespective of whether the subject has responded or not. In other words, it is a non-contingent schedule.
- Lab example: FT5" = rat gets food every 5" regardless of the behavior.
- Real world example: a person gets an annuity check every month regardless of behavior between checks.
Variable time (VT) – Provides reinforcement at an average variable time since last reinforcement, regardless of whether the subject has responded or not.
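The time-based rules in this list can be sketched in the same style, with each response time passed to `respond(t)`. The DRL rule follows the lab example above (every response restarts the clock). For DRH the text leaves the window undefined, so the sketch assumes a sliding window that is emptied after each reinforcer. FT needs no responses at all, so it reduces to a list of delivery times. All names are hypothetical.

```python
from collections import deque


class DRL:
    """DRL s: a response is reinforced only if at least s seconds have passed
    since the previous response (or the session start). Every response,
    reinforced or not, restarts the clock."""

    def __init__(self, seconds):
        self.seconds = seconds
        self.last = 0.0

    def respond(self, t):
        reinforced = t - self.last >= self.seconds
        self.last = t
        return reinforced


class DRH:
    """DRH k/s: reinforced once k responses fall within the last s seconds.
    Assumption: a sliding window that is emptied after each reinforcer."""

    def __init__(self, k, seconds):
        self.k, self.seconds = k, seconds
        self.recent = deque()

    def respond(self, t):
        self.recent.append(t)
        while t - self.recent[0] > self.seconds:
            self.recent.popleft()  # drop responses older than the window
        if len(self.recent) >= self.k:
            self.recent.clear()
            return True
        return False


def fixed_time(period, session_length):
    """FT: delivery times depend only on the clock, never on behavior."""
    return [period * i for i in range(1, int(session_length // period) + 1)]


def run(schedule, response_times):
    """Return the times of the responses that were reinforced."""
    return [t for t in response_times if schedule.respond(t)]


print(run(DRL(10), [4, 15, 20, 31]))  # [15, 31]: the presses at 4 and 20 came too soon
print(fixed_time(5, 20))              # [5, 10, 15, 20]
```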

Effects of different types of simple schedules

Fixed ratio: activity slows after reinforcer and then picks up.
Variable ratio: the highest rate of responding of all schedules; responding is high and stable.
Fixed interval: activity increases as deadline nears, can cause fast extinction.
Variable interval: steady activity results, good resistance to extinction.

Ratio schedules produce higher rates of responding than interval schedules, when the rates of reinforcement are otherwise similar.
Variable schedules produce higher rates and greater resistance to extinction than most fixed schedules. This is also known as the Partial Reinforcement Extinction Effect (PREE).
The variable ratio schedule produces both the highest rate of responding and the greatest resistance to extinction (for example, the behavior of gamblers at slot machines).
Fixed schedules produce "post-reinforcement pauses" (PRP), where responses will briefly cease immediately following reinforcement, though the pause is a function of the upcoming response requirement rather than the prior reinforcement.^[16]
- The PRP of a fixed interval schedule is frequently followed by a "scallop-shaped" accelerating rate of response, while fixed ratio schedules produce a more "angular" response.
  - Fixed interval scallop: the pattern of responding that develops under a fixed interval reinforcement schedule; performance on a fixed interval reflects the subject's accuracy in telling time.
Organisms whose schedules of reinforcement are "thinned" (that is, requiring more responses or a greater wait before reinforcement) may experience "ratio strain" if thinned too quickly. This produces behavior similar to that seen during extinction.
- Ratio strain: the disruption of responding that occurs when a fixed ratio response requirement is increased too rapidly.
- Ratio run: the high and steady rate of responding that completes each ratio requirement. Higher ratio requirements usually produce longer post-reinforcement pauses.
Partial reinforcement schedules are more resistant to extinction than continuous reinforcement schedules.
- Ratio schedules are more resistant than interval schedules and variable schedules more resistant than fixed ones.
- Momentary changes in reinforcement value lead to dynamic changes in behavior.^[17]

Compound schedules

Compound schedules combine two or more different simple schedules in some way using the same reinforcer for the same behavior. There are many possibilities; among those most often used are:

Alternative schedules – A type of compound schedule where two or more simple schedules are in effect and whichever schedule is completed first results in reinforcement.^[18]
Conjunctive schedules – A complex schedule of reinforcement where two or more simple schedules are in effect independently of each other, and requirements on all of the simple schedules must be met for reinforcement.
Multiple schedules – Two or more schedules alternate over time, with a stimulus indicating which is in force. Reinforcement is delivered if the response requirement is met while a schedule is in effect.
- Example: FR4 when a whistle sounds and FI6 when a bell rings.
Mixed schedules – Either of two, or more, schedules may occur with no stimulus indicating which is in force. Reinforcement is delivered if the response requirement is met while a schedule is in effect.
- Example: FI6 and then VR3 without any stimulus warning of the change in schedule.
Concurrent schedules – A complex reinforcement procedure in which the participant can choose any one of two or more simple reinforcement schedules that are available simultaneously. Organisms are free to change back and forth between the response alternatives at any time.
- Real world example: changing channels on a television.
Concurrent-chain schedule of reinforcement – A complex reinforcement procedure in which the participant is permitted to choose during the first link which of several simple reinforcement schedules will be in effect in the second link. Once a choice has been made, the rejected alternatives become unavailable until the start of the next trial.
Interlocking schedules – A single schedule with two components where progress in one component affects progress in the other component. In an interlocking FR60–FI120 schedule, for example, each response subtracts time from the interval component, such that each response is "equal" to removing two seconds from the FI.
Chained schedules – Reinforcement occurs after two or more successive schedules have been completed, with a stimulus indicating when one schedule has been completed and the next has started.
- Example: FR10 in the presence of a green light; when it is completed, a yellow light signals FR3; when that is completed, a red light signals VI6, and so on. At the end of the chain, a reinforcer is given.
Tandem schedules – Reinforcement occurs when two or more successive schedule requirements have been completed, with no stimulus indicating when a schedule has been completed and the next has started.
- Example: VR10; after it is completed, the schedule changes without warning to FR10, and after that it changes without warning to FR16, and so on. At the end of the series of schedules, a reinforcer is finally given.
Higher-order schedules – completion of one schedule is reinforced according to a second schedule; e.g. in FR2 (FI10 secs), two successive fixed interval schedules require completion before a response is reinforced.
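Compound schedules combine requirements rather than inventing new ones, so they can be sketched by composing simple components. In this hypothetical Python sketch, `FR` and `FI` components only track whether their requirement has been met, and the compound decides when to reinforce: any component for an alternative schedule, all of them for a conjunctive schedule, each link in turn for a tandem schedule. The interlocking rule encodes the FR60–FI120 arithmetic from the list (120 s / 60 responses = 2 s per response). A chained schedule would add a signalling stimulus per link, which is not modeled. Class names and response times are assumptions.

```python
class FR:
    """Ratio component: met once n responses have occurred since the last reset."""

    def __init__(self, n):
        self.n, self.count = n, 0

    def record(self, t):
        self.count += 1

    def met(self, t):
        return self.count >= self.n

    def reset(self, t):
        self.count = 0


class FI:
    """Interval component: met by a response s seconds or more after the last reset."""

    def __init__(self, seconds):
        self.seconds, self.start = seconds, 0.0

    def record(self, t):
        pass  # only elapsed time matters

    def met(self, t):
        return t - self.start >= self.seconds

    def reset(self, t):
        self.start = t


class Alternative:
    """Reinforce as soon as ANY component requirement is met."""

    combine = staticmethod(any)

    def __init__(self, *parts):
        self.parts = parts

    def respond(self, t):
        for part in self.parts:
            part.record(t)
        if self.combine(part.met(t) for part in self.parts):
            for part in self.parts:
                part.reset(t)
            return True
        return False


class Conjunctive(Alternative):
    """Reinforce only when ALL component requirements are met."""

    combine = staticmethod(all)


class Tandem:
    """Links are completed in order; only completing the last one is reinforced."""

    def __init__(self, *links):
        self.links, self.current = links, 0

    def respond(self, t):
        link = self.links[self.current]
        link.record(t)
        if not link.met(t):
            return False
        self.current = (self.current + 1) % len(self.links)
        self.links[self.current].reset(t)  # the next link starts timing now
        return self.current == 0


class Interlocking:
    """Interlocking FR n / FI s: each response removes s/n seconds from the interval."""

    def __init__(self, n, seconds):
        self.n, self.seconds = n, seconds
        self.start, self.count = 0.0, 0

    def respond(self, t):
        self.count += 1
        if (t - self.start) + self.count * self.seconds / self.n >= self.seconds:
            self.start, self.count = t, 0
            return True
        return False


def run(schedule, response_times):
    """Return the times of the responses that were reinforced."""
    return [t for t in response_times if schedule.respond(t)]


every_5s = [5.0 * i for i in range(1, 13)]  # responses at 5, 10, ..., 60 s
print(run(Alternative(FR(10), FI(30)), every_5s))  # [30.0, 60.0]
print(run(Conjunctive(FR(10), FI(30)), every_5s))  # [50.0]
print(run(Interlocking(60, 120), [float(s) for s in range(1, 61)]))  # [40.0]
```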

Superimposed schedules

In psychology, the term superimposed schedules of reinforcement refers to a structure of rewards in which two or more simple schedules of reinforcement operate simultaneously. Reinforcers can be positive, negative, or both. An example is a person who comes home after a long day at work. The behavior of opening the front door is rewarded by a big kiss on the lips by the person's spouse and a rip in the pants from the family dog jumping enthusiastically. Another example of superimposed schedules of reinforcement is a pigeon in an experimental cage pecking at a button. The pecks deliver a hopper of grain every 20th peck, and access to water after every 200 pecks.
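Because every response counts toward every schedule, a superimposed schedule can be sketched by feeding a single response stream into several schedules at once. This minimal sketch reproduces the pigeon example, with grain on FR20 and water on FR200; the class and function names are hypothetical.

```python
class FixedRatio:
    """FR n: every nth response is reinforced."""

    def __init__(self, n):
        self.n, self.count = n, 0

    def respond(self):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False


def superimposed(schedules, n_responses):
    """Feed every response to every schedule, so one response can produce
    several consequences at once. Returns reinforcers earned per schedule."""
    earned = {name: 0 for name in schedules}
    for _ in range(n_responses):
        for name, schedule in schedules.items():
            if schedule.respond():
                earned[name] += 1
    return earned


# The pigeon example: grain on FR20 and water on FR200, both driven by the same key.
print(superimposed({"grain": FixedRatio(20), "water": FixedRatio(200)}, 400))
# {'grain': 20, 'water': 2}
```

On pecks 200 and 400 both consequences follow the same response, which is what distinguishes this structure from a choice between schedules.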

Superimposed schedules of reinforcement are a type of compound schedule that evolved from the initial work on simple schedules of reinforcement by B.F. Skinner and his colleagues (Skinner and Ferster, 1957). They demonstrated that reinforcers could be delivered on schedules, and further that organisms behaved differently under different schedules. Rather than a reinforcer, such as food or water, being delivered every time as a consequence of some behavior, a reinforcer could be delivered after more than one instance of the behavior. For example, a pigeon may be required to peck a button switch ten times before food appears. This is a "ratio schedule". Also, a reinforcer could be delivered after an interval of time passed following a target behavior. An example is a rat that is given a food pellet immediately following the first response that occurs after two minutes have elapsed since the last lever press. This is called an "interval schedule".

In addition, ratio schedules can deliver reinforcement following fixed or variable number of behaviors by the individual organism. Likewise, interval schedules can deliver reinforcement following fixed or variable intervals of time following a single response by the organism. Individual behaviors tend to generate response rates that differ based upon how the reinforcement schedule is created. Much subsequent research in many labs examined the effects on behaviors of scheduling reinforcers.

If an organism is offered the opportunity to choose between or among two or more simple schedules of reinforcement at the same time, the reinforcement structure is called a "concurrent schedule of reinforcement". Brechner (1974, 1977) introduced the concept of superimposed schedules of reinforcement in an attempt to create a laboratory analogy of social traps, such as when humans overharvest their fisheries or tear down their rainforests. Brechner created a situation where simple reinforcement schedules were superimposed upon each other. In other words, a single response or group of responses by an organism led to multiple consequences. Concurrent schedules of reinforcement can be thought of as "or" schedules, and superimposed schedules of reinforcement can be thought of as "and" schedules. Brechner and Linder (1981) and Brechner (1987) expanded the concept to describe how superimposed schedules and the social trap analogy could be used to analyze the way energy flows through systems.

Superimposed schedules of reinforcement have many real-world applications in addition to generating social traps. Many different individual and social situations can be created by superimposing simple reinforcement schedules. For example, a person could have simultaneous tobacco and alcohol addictions. Even more complex situations can be created or simulated by superimposing two or more concurrent schedules. For example, a high school senior could have a choice between going to Stanford University or UCLA, and at the same time have the choice of going into the Army or the Air Force, and simultaneously the choice of taking a job with an internet company or a job with a software company. That is a reinforcement structure of three superimposed concurrent schedules of reinforcement.

Superimposed schedules of reinforcement can create the three classic conflict situations (approach–approach conflict, approach–avoidance conflict, and avoidance–avoidance conflict) described by Kurt Lewin (1935) and can operationalize other Lewinian situations analyzed by his force field analysis. Other examples of the use of superimposed schedules of reinforcement as an analytical tool are their application to the contingencies of rent control (Brechner, 2003) and to the problem of toxic waste dumping in the Los Angeles County storm drain system (Brechner, 2010).

Concurrent schedules

In operant conditioning, concurrent schedules of reinforcement are schedules of reinforcement that are simultaneously available to an animal subject or human participant, so that the subject or participant can respond on either schedule. For example, in a two-alternative forced choice task, a pigeon in a Skinner box is faced with two pecking keys; pecking responses can be made on either, and food reinforcement might follow a peck on either. The schedules of reinforcement arranged for pecks on the two keys can be different. They may be independent, or they may be linked so that behavior on one key affects the likelihood of reinforcement on the other.

It is not necessary for responses on the two schedules to be physically distinct. In an alternate way of arranging concurrent schedules, introduced by Findley in 1958, both schedules are arranged on a single key or other response device, and the subject can respond on a second key to change between the schedules. In such a "Findley concurrent" procedure, a stimulus (e.g., the color of the main key) signals which schedule is in effect.
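A Findley arrangement can be sketched as a single main key whose active schedule is toggled by a changeover key, with the main key's color signalling which schedule is in effect. The class below is a hypothetical illustration: fixed-ratio schedules stand in for whatever schedules are in use, the colors are arbitrary labels, and the assumption that an inactive schedule keeps its progress until the subject returns to it is a simplification made here.

```python
class FixedRatio:
    """FR n: every n-th response on this schedule is reinforced."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False


class FindleyConcurrent:
    """Two or more schedules share one main key; a changeover key cycles
    the active schedule, and the main key's color signals which is active."""

    def __init__(self, schedules, colors):
        self.schedules = schedules      # e.g. {"A": FixedRatio(2), ...}
        self.colors = colors            # stimulus signalling each schedule
        self.order = list(schedules)
        self.active = self.order[0]

    def key_color(self):
        return self.colors[self.active]

    def press_changeover(self):
        # Switch to the next schedule and return the new key color.
        i = self.order.index(self.active)
        self.active = self.order[(i + 1) % len(self.order)]
        return self.key_color()

    def press_main(self):
        # Only the currently signalled schedule receives the response.
        return self.schedules[self.active].respond()


if __name__ == "__main__":
    box = FindleyConcurrent({"A": FixedRatio(2), "B": FixedRatio(1)},
                            {"A": "red", "B": "green"})
    print(box.key_color(), box.press_main())
    print(box.press_changeover(), box.press_main())
```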

Concurrent schedules often induce rapid alternation between the keys. To prevent this, a "changeover delay" is commonly introduced: each schedule is inactivated for a brief period after the subject switches to it.
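A changeover delay can be sketched as a gate that records which key was last responded on and when the subject last switched. This is a hypothetical helper, not a standard laboratory API; it assumes the first response of a session does not count as a changeover, and it only decides eligibility, leaving aside how a reinforcer set up during the delay is handled.

```python
class ChangeoverDelay:
    """Changeover delay (COD) for concurrently available keys.

    After the subject switches to a key, responses on that key are not
    eligible for reinforcement until `cod` seconds have passed.
    Assumption: the first response of a session is not a changeover.
    """

    def __init__(self, cod):
        self.cod = cod
        self.current = None               # key most recently responded on
        self.switch_time = float("-inf")  # time of the last changeover

    def eligible(self, key, t):
        """Record a response on `key` at time t and return True if a
        reinforcer arranged by that key's schedule may be delivered now."""
        if self.current is None:
            self.current = key            # first response: no changeover
        elif key != self.current:
            self.current = key
            self.switch_time = t          # a changeover restarts the delay
        return t - self.switch_time >= self.cod


if __name__ == "__main__":
    gate = ChangeoverDelay(2.0)
    for key, t in [("left", 0.0), ("right", 1.0), ("right", 2.0),
                   ("right", 3.5), ("left", 4.0)]:
        print(key, t, gate.eligible(key, t))
```

Because every switch restarts the delay, rapid back-and-forth alternation never produces reinforcement, which is the behavior the changeover delay is meant to discourage.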

When both concurrent schedules are variable-interval schedules, a quantitative relationship known as the matching law is found between the relative response rates on the two schedules and the relative reinforcement rates they deliver; this was first observed by R.J. Herrnstein in 1961. The matching law is a rule for instrumental behavior which states that the relative rate of responding on a particular response alternative equals the relative rate of reinforcement for that response (for two alternatives, B1 / (B1 + B2) = R1 / (R1 + R2), where B is the rate of responding and R the rate of reinforcement on each alternative). Animals and humans tend to prefer having a choice among schedules.^[19]
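The matching relation B1 / (B1 + B2) = R1 / (R1 + R2) can be turned into a prediction of how responses are allocated between two keys. The helper below is a minimal sketch: the function names, the assumed overall response rate, and the example obtained reinforcement rates (120 and 40 reinforcers per hour) are hypothetical, and it implements only strict matching as stated above, not later generalized forms of the law.

```python
def matching_proportion(r1, r2):
    """Predicted share of responses on alternative 1 under strict matching:
    B1 / (B1 + B2) = R1 / (R1 + R2)."""
    if r1 < 0 or r2 < 0 or r1 + r2 == 0:
        raise ValueError("reinforcement rates must be non-negative, not both zero")
    return r1 / (r1 + r2)


def predicted_response_rates(total_rate, r1, r2):
    """Split an assumed total response rate between the two alternatives."""
    p = matching_proportion(r1, r2)
    return total_rate * p, total_rate * (1 - p)


if __name__ == "__main__":
    # Hypothetical obtained rates: 120 vs. 40 reinforcers per hour.
    print(matching_proportion(120, 40))               # 0.75
    print(predicted_response_rates(60.0, 120, 40))    # (45.0, 15.0)
```

If one key yields three times as many reinforcers as the other, strict matching predicts three quarters of all responses on that key, whatever the overall response rate happens to be.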

Shaping

Main article: Shaping (psychology)

Shaping is the reinforcement of successive approximations to a desired instrumental response. In training a rat to press a lever, for example, simply turning toward the lever is reinforced at first. Then, only turning and stepping toward it is reinforced. The outcomes of one set of behaviors start the shaping process for the next set of behaviors, and the outcomes of that set prepare the shaping process for the next set, and so on. As training progresses, the response reinforced becomes progressively more like the desired behavior; each subsequent behavior becomes a closer approximation of the final behavior.^[20]
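The logic of successive approximations can be illustrated with a toy simulation. This is a deliberately simplified model, not an established account of shaping: the response is reduced to a single number (for example, how far the rat moves toward the lever), responses vary randomly around a "typical" level, and the learning rule (a reinforced response pulls the typical level part of the way toward itself) plus all parameter values are assumptions made for illustration.

```python
import random


def shape(target, start=0.0, step=0.1, spread=0.15, learning_rate=0.5,
          max_trials=10_000, seed=0):
    """Toy model of shaping by successive approximations.

    Each trial the subject emits typical + Gaussian noise. A response that
    meets the current criterion is reinforced, which moves the typical
    response toward it; the criterion is then raised by `step` until it
    reaches `target`. Returns (trials used, list of reinforced criteria),
    or (None, criteria) if the target is not reached within max_trials.
    """
    rng = random.Random(seed)
    typical = start
    criterion = start + step
    history = []
    for trial in range(max_trials):
        response = typical + rng.gauss(0, spread)
        if response >= criterion:
            # Reinforcement strengthens responses near the one just made.
            typical += learning_rate * (response - typical)
            history.append(criterion)
            if criterion >= target:
                return trial + 1, history
            # Only closer approximations will be reinforced from now on.
            criterion = min(criterion + step, target)
    return None, history


if __name__ == "__main__":
    trials, criteria = shape(target=1.0)
    print(trials, [round(c, 2) for c in criteria])
```

The key design point is that the criterion only ever rises after a success, so each reinforced response makes the next, slightly stricter approximation likely enough to occur.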

Chaining

Main article: Chaining

Chaining involves linking discrete behaviors together in a series, such that the result of each behavior is both the reinforcement (or consequence) for the previous behavior and the stimulus (or antecedent) for the next behavior. There are many ways to teach chaining, such as forward chaining (starting from the first behavior in the chain), backwards chaining (starting from the last behavior) and total task chaining (in which the entire behavior is taught from beginning to end, rather than as a series of steps). An example is opening a locked door: first the key is inserted, then turned, then the door is opened.

Forward chaining would teach the subject first to insert the key. Once that task is mastered, they are told to insert the key and taught to turn it. Once that task is mastered, they are told to perform the first two steps, then taught to open the door. Backwards chaining would involve the teacher first inserting and turning the key while the subject is taught to open the door. Once that is learned, the teacher inserts the key, and the subject is taught to turn it, then opens the door as the next step. Finally, the subject is taught to insert the key, and then turns it and opens the door. Once the first step is mastered, the entire task has been taught. Total task chaining would involve teaching the entire task as a single series, prompting through all steps. Prompts are faded (reduced) at each step as they are mastered.
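The teaching order for forward and backwards chaining can be generated mechanically from the list of steps, using the door example above. In this sketch the `Stage` structure and function names are hypothetical; for forward chaining the text does not say who completes the remaining steps, so they are simply left out of each stage, and total task chaining is not modelled.

```python
from typing import List, NamedTuple


class Stage(NamedTuple):
    teacher_does: List[str]       # steps the teacher performs first
    taught: str                   # the step being trained at this stage
    learner_also_does: List[str]  # already-mastered steps the learner performs


def forward_chaining(steps):
    """Teach steps first-to-last; earlier mastered steps are performed first."""
    return [Stage([], steps[k], steps[:k]) for k in range(len(steps))]


def backward_chaining(steps):
    """Teach steps last-to-first; the teacher performs the earlier steps and
    the learner finishes the chain with the steps already mastered."""
    return [Stage(steps[:k], steps[k], steps[k + 1:])
            for k in reversed(range(len(steps)))]


if __name__ == "__main__":
    door = ["insert the key", "turn the key", "open the door"]
    for stage in backward_chaining(door):
        print(stage)
```

In backwards chaining every stage ends with the door opening, so the learner always contacts the natural consequence of the whole chain; in forward chaining the step being taught is always the newest one at the end of what the learner performs.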

Persuasive communication and reinforcement theory

Persuasive communication: Persuasion influences the way a person thinks, acts and feels. Persuasive skill concerns understanding the concerns, positions and needs of other people. Persuasion can be classified into informal persuasion and formal persuasion.
Informal persuasion: This describes the way a person interacts with colleagues and customers. Informal persuasion can be used in teams, memos and e-mails.
Formal persuasion: This type of persuasion is used in writing customer letters and proposals, and in formal presentations to customers or colleagues.
Process of persuasion: Persuasion concerns how you influence people with your skills, experience, knowledge, leadership qualities and team capabilities. Persuasion is an interactive process for getting work done through others. Examples of situations in which persuasion skills can be used in practice include interviews, where you can demonstrate your best talents, skills and expertise; clients, whom you can guide toward achieving goals or targets; and memos, in which you express your ideas and views to coworkers to improve operations. Identifying resistance and maintaining a positive attitude play vital roles in persuasion.

Persuasion is a form of human interaction. It takes place when one individual expects some particular response from one or more other individuals and deliberately sets out to secure the response through the use of communication. The communicator must realize that different groups have different values.^[21]

In instrumental learning situations, which involve operant behavior, the persuasive communicator will present his message and then wait for the receiver to make a correct response. As soon as the receiver makes the response, the communicator will attempt to fix the response by some appropriate reward or reinforcement.^[22]

In conditional learning situations, where there is respondent behavior, the communicator presents his message so as to elicit the response he wants from the receiver, and the stimulus that originally served to elicit the response then becomes the reinforcing or rewarding element in conditioning.^[23]

Mathematical models

Considerable work has been done on building a mathematical model of reinforcement. One such model is known as MPR, short for Mathematical Principles of Reinforcement. Killeen and Sitomer are among the key researchers in this field.

Criticisms

The standard definition of behavioral reinforcement has been criticized as circular, since it appears to argue that response strength is increased by reinforcement, and defines reinforcement as something that increases response strength (i.e., response strength is increased by things that increase response strength). However, the correct usage^[24] of the term is that something is a reinforcer because of its effect on behavior, and not the other way around. The definition becomes circular only if one says that a particular stimulus strengthens behavior because it is a reinforcer, without explaining why the stimulus produces that effect on behavior. Other definitions have been proposed, such as F.D. Sheffield's "consummatory behavior contingent on a response", but these are not broadly used in psychology.^[25]

History of the terms

In the 1920s, Russian physiologist Ivan Pavlov may have been the first to use the word reinforcement with respect to behavior, but (according to Dinsmoor) he used its approximate Russian cognate sparingly, and even then it referred to strengthening an already-learned but weakening response. He did not use it, as it is used today, for selecting and strengthening new behaviors. Pavlov's introduction of the word extinction (in Russian) approximates today's psychological use.

In popular use, positive reinforcement is often used as a synonym for reward, with people (not behavior) thus being "reinforced", but this is contrary to the term's consistent technical usage, in which it is a dimension of behavior, not the person, that is strengthened. Negative reinforcement is often used by laypeople and even by social scientists outside psychology as a synonym for punishment. This is contrary to modern technical use, but it was B.F. Skinner who first used it this way, in his 1938 book. By 1953, however, he followed others in using the word punishment for this, and he recast negative reinforcement as the removal of aversive stimuli.

Some within the field of behavior analysis^[7] have suggested that the terms "positive" and "negative" constitute an unnecessary distinction in discussing reinforcement, as it is often unclear whether stimuli are being removed or presented. For example, Iwata^[8] poses the question: "...is a change in temperature more accurately characterized by the presentation of cold (heat) or the removal of heat (cold)?" (p. 363). Thus, reinforcement could instead be conceptualized as a pre-change condition replaced by a post-change condition that reinforces the behavior that followed the change in stimulus conditions.

References

^ Schacter, D. (2012). Psychology. Worth Publishers. p. 19.
^ Montana, P.J. & Charnov, B.H. (2008). Management (4th ed.). Barron's Educational Series. p. 247.
^ Skinner, B.F. (1970). Walden Two. Toronto: Macmillan.
^ Azrin, N.H. & Holz, W.C. (1966). Punishment. In W.K. Honig (Ed.), Operant behavior: Areas of research and application (pp. 380–447). New York: Appleton-Century-Crofts.
^ Blackman, D. (1974). Operant Conditioning: An Experimental Analysis of Behaviour. London: Methuen. p. 143.
^ Overskeid, G. (2012). The role of emotions in reinforcement: Response selection in humans. The Psychological Record, 62, 125–132.
^ Michael, J. (1975, 2005). Positive and negative reinforcement, a distinction that is no longer necessary; or a better way to talk about bad things. Journal of Organizational Behavior Management, 24, 207–22.
^ Iwata, B.A. (1987). Negative reinforcement in applied behavior analysis: An emerging technology. Journal of Applied Behavior Analysis, 20, 361–78.
^ Skinner, B.F. (1974). About Behaviorism.
^ Tucker, M., Sigafoos, J. & Bushell, H. (1998). Use of noncontingent reinforcement in the treatment of challenging behavior. Behavior Modification, 22, 529–47.
^ Poling, A. & Normand, M. (1999). Noncontingent reinforcement: An inappropriate description of time-based schedules that reduce behavior. Journal of Applied Behavior Analysis, 32, 237–8.
^ Baer & Wolf (1970). The entry into natural communities of reinforcement. In R. Ulrich, T. Stachnik & J. Mabry (Eds.), Control of human behavior (Vol. 2, pp. 319–24). Glenview, IL: Scott, Foresman.
^ Kohler & Greenwood (1986). Toward a technology of generalization: The identification of natural contingencies of reinforcement. The Behavior Analyst, 9, 19–26.
^ http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1333219/
^ Sparkman, R.B. (1979). The Art of Manipulation. Doubleday. p. 34. ISBN 0385270070.
^ Derenne, A. & Flannery, K.A. (2007). Within session FR pausing. The Behavior Analyst Today, 8(2), 175–86.
^ McSweeney, F.K., Murphy, E.S. & Kowal, B.P. (2001). Dynamic changes in reinforcer value: Some misconceptions and why you should care. The Behavior Analyst Today, 2(4), 341–7.
^ Iversen, I.H. & Lattal, K.A. (1991). Experimental Analysis of Behavior. Amsterdam: Elsevier.
^ Martin, T.L., Yu, C.T., Martin, G.L. & Fazzio, D. (2006). On choice, preference, and preference for choice. The Behavior Analyst Today, 7(2), 234–48.
^ Schacter, D.L., Gilbert, D.T. & Wegner, D.M. (2011). Chapter 7: Learning. In Psychology (2nd ed., pp. 284–85). Worth.
^ Bettinghaus, E.P. (1968). Persuasive Communication (pp. 24–5). Holt, Rinehart and Winston.
^ Skinner, B.F. (1938). The Behavior of Organisms: An Experimental Analysis. New York: Appleton-Century-Crofts.
^ Bettinghaus, E.P. (1968). Persuasive Communication. Holt, Rinehart and Winston.
^ Epstein, L.H. (1982). Skinner for the Classroom. Champaign, IL: Research Press.
^ Vaccarino, F.J., Schiff, B.B. & Glickman, S.E. (1989). Biological view of reinforcement. In S.B. Klein & R. Mowrer (Eds.), Contemporary learning theories: Instrumental conditioning theory and the impact of biological constraints on learning. Hillsdale, NJ: Lawrence Erlbaum Associates.

External links

An On-Line Positive Reinforcement Tutorial
Scholarpedia Reinforcement
scienceofbehavior.com

UpToDate Contents


1. Continuous renal replacement therapy in acute kidney injury (acute renal failure)
2. Continuous venovenous hemodiafiltration: technical considerations
3. Continuous arteriovenous hemodialysis: technical considerations
4. Continuous venovenous hemodialysis: technical considerations
5. Continuous renal replacement therapies: overview

English Journal

The use of a cystic fibrosis patient registry to assess outcomes and improve cystic fibrosis care in Germany.

Stern M. Source: Department of General Paediatrics, Haematology, Oncology, University Children's Hospital Tuebingen, Tuebingen, Germany.
Current Opinion in Pulmonary Medicine. Curr Opin Pulm Med. 2011 Nov;17(6):473-7.
PURPOSE OF REVIEW: Cystic fibrosis (CF) patient registries have become an important epidemiological tool for demography, networking, and quality management. This review describes recent developments in patient registries, outcome research, and pilot projects in quality improvement. RECENT FINDINGS: N
PMID 21881513

Reward-weighted regression with sample reuse for direct policy search in reinforcement learning.

Hachiya H, Peters J, Sugiyama M. Source: Tokyo Institute of Technology, O-okayama, Meguro-ku, Tokyo 152-8552, Japan hachiya@sg.cs.titech.ac.jp.
Neural Computation. Neural Comput. 2011 Nov;23(11):2798-832. Epub 2011 Aug 18.
Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples for obtaining a stable policy update estimator, and this is prohibitive when the sampling cost is expensive
PMID 21851281

Japanese Journal

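A method for generating action spaces based on basis functions for robot motion acquisition through reinforcement learning

山口明彦, 高松淳, 小笠原司
Journal of the Robotics Society of Japan 29(1), 55-66, 2011-01-15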
… Discrete action sets are often used in many reinforcement learning (RL) applications in robot control, since such sets are compatible with many RL methods and sophisticated architectures, such as Q(λ)-learning [1] and the Dyna. … Moreover, we also propose a method WF-DCOB, where the wire-fitting [2] is utilized to learn within a continuous action space which the DCOB discretizes. …
NAID 10027648535