Automatic Regulation of the Information Flow in the Control Loops of a Web Teleoperated Robot* J.-A. Fernández-Madrigal, C. Galindo, E. Cruz-Martín, A. Cruz-Martín, and J. González

System Engineering and Automation Department

University of Málaga (Spain)

e-mails: {jafma,cipriano,anacm,jgonzalez}@ctima.uma.es; elenacm@isa.uma.es

Abstract— The use of the World Wide Web for robot teleoperation has grown in recent years, mainly due to the pervasiveness of the internet and web browsers, although web interfaces usually run over ethernet networks that exhibit time unpredictability. Most recent research in the area has focused on improving the time predictability of the network under delays, jitter, and non-guaranteed bandwidth. However, we believe that: i) not only the network, but every component in the interfaced system exhibits time unpredictability; and ii) improving time predictability is not the only solution: adapting the interfaced system to unpredictable conditions is also a possibility. In this paper we consider a web interfaced robot as a set of control loops, and we describe and implement a hysteresis controller that regulates the flow of information through the loops as a method to satisfy the system time requirements under some unpredictable and varying conditions. To demonstrate the suitability of our algorithm, we a) compare it with a near-optimal one automatically generated through reinforcement learning, and b) show an implementation of the algorithm for the direct teleoperation of a service mobile robot, obtaining a better behavior than the same system without flow regulation.

Keywords: Web Interfaces, Mobile Robots, Teleoperation.

I. INTRODUCTION

Networked robots that are teleoperated through the World Wide Web are increasing in popularity, mainly due to the pervasiveness of the internet and web browsers. Their applications include telecare robotics ([1],[2]), museum assistants ([3],[4]), education ([5],[6]), etc.

Several topics have been addressed recently in the research literature on web teleoperation of robots: network performance ([7],[8]), virtual reality for replacing real information when it is not available at appropriate time rates ([9],[10]), internet multimedia systems [11], traditional teleoperation systems ([12],[13]), communication protocols ([14],[15]), or soft computing controllers [16].

In particular, we believe that trying to improve the predictability of the network lacks the generality needed for solving the problem. The whole interfaced system (the client-side interface, the robot, and the network) should be considered as a whole, since undesirable time effects may appear in several parts of it. For example, the time consumption of the web interface may be of a magnitude comparable to network delays, or even much greater if the network uses high-speed technology. Thus, the design of systems that adapt to unpredictable or varying time conditions seems a reasonable approach.

In a previous work [17] we presented a probabilistic model of the control loops of a web interfaced robot that allows the user interface to select automatically, among a set of control loops with different time requirements, the one that is most likely to satisfy the timing constraints. We have called this "coarse adaptation" of the web interface, and it has demonstrated its suitability when any component in the loops drastically changes its performance. In the same work we also considered the possibility of deactivating certain graphical components to improve the time consumption of the system before changing to a different control loop. That has been called "medium adaptation". The drawback of both actions is that they lead to abrupt changes in the modality of the user's control over the robot.

In this paper we focus on a third type of regulation: "fine adaptation", which allows us to deal with changes in time performance that are not important enough to deactivate parts of the interface or to deactivate the loop.

Fine adaptation is aimed at affecting the user's control of the system as little as possible, and thus it should be used more frequently than coarse or medium adaptation. Its goal is to automatically regulate the amount of sensory information gathered by the robot and flowing through the system to be displayed on the web interface¹. We do this through a simple hysteresis controller [18] that reduces that information if the time requirement for the current loop is not likely to be met, and increases it when the probability of satisfying that requirement rises again. The whole probabilistic approach (coarse+medium+fine) is aimed at building web interfaces for robot teleoperation that are more adaptable than conventional ones, constituting a framework that is also compatible with any network-improving approach.

The suitability of the hysteresis controller that we present in this paper has been assessed through two methods. We have firstly compared it to an automatically generated algorithm that is supposed to yield the best results with respect to a mathematically defined goodness measure. We have included in that goodness measure both the probability of satisfying the time requirements of the control loop and the density of sensory information shown to the user. The framework for constructing the near-optimal algorithm has been reinforcement learning, and in particular Q-learning [19], since this methodology yields policies (= algorithms) that tend to the optimal ones. We have found that the Q-based algorithm performs similarly to our controller, thus providing a satisfactory justification for the use of the latter. Secondly, we have implemented our algorithm in a real web-interfaced robot and have found that the behavior of the system is better when the automatic flow regulation is activated.

¹ Sensory data is by far the most bandwidth-demanding among the data that flow through a control loop; usually, actuation signals consist of only a few bytes.

The paper is organized as follows: section II describes a general web interfaced robotic system and the probabilistic models used for its time-consuming parts. Section III explains our information-flow regulation algorithm and how it has been compared to a near-optimal algorithm generated through Q-learning. Section IV shows the results of evaluating the proposed controller in a real web-interfaced robot. The paper ends with some conclusions and future work.

II. CONTROL LOOP MODELING IN A WEB-INTERFACED ROBOTIC SYSTEM

We have modelled a web interfaced robotic system as shown in fig. 1. All the control loops that exist in the system are assumed to fit into the following scheme: the user's actuation (on a given actuation widget²) generates some service requests that are transmitted through the network to the suitable modules of the software architecture of the robot; once the requests are completed, their return data, plus the readings from the sensors associated to the sensory widgets (obtained from other services of the robot), are sent back to the display. For portability reasons, the client-side application is assumed to be a Java Applet [20], while the robot software architecture is assumed to be implemented upon the CORBA middleware [21].

In the described model there are several time-consuming classes of components (please refer to fig. 1):

1) Processing Components. These components (Translation, CORBA processing, Service Processing, and Display Processing) involve the processing of some data to yield another. The time consumption of these operations depends basically on: the size of the data, the computational complexity of the processing algorithms, and the CPU scheduling provided by the operating system (multitasking assumed). The first two sources of time complexity are well approximated by polynomial functions, since these algorithms are O(n^k), with k typically being 1 or 2. This has been modelled by uniform probability distributions in order to cope with slight variations due to conditional statements in the code or imprecisions in the measurement of time. In non-real-time environments, sporadic high time consumptions can appear due to the third source. We have modelled this by adding exponential probability distributions when needed.

² A widget is a component in a graphical interface (buttons, panels, etc.).

Fig. 1. Proposed general scheme for a control loop in a web interfaced robotic system. All the time-consumption steps are indicated, assuming a CORBA middleware for the robot software architecture.

2) User Reaction Components. Human reaction time depends on several factors, as varied as: the amount of information interpretable by the user, the spatial arrangement of that information [22], the rate of change in sensory data, etc. The majority of models for human reaction time in the literature are based on ex-Gaussians [23], which are the convolution of a Gaussian and an exponential distribution.

For the sake of simplicity, in our work we model human reaction time as a Gaussian probability distribution.

3) Network Transmission Components. This class includes components of the physical network, queuing buffers, and the OSI protocol processes. In the literature it has been stated that the arrival time of ethernet communications tends to a Poisson process as long as the network is fast enough, thus the interarrival time can be modelled as an exponential distribution [24]. Experimentally, it has been shown that in cases where the network is slower a beta distribution is more appropriate [25]. (A small simulation sketch of these component models follows this list.)
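To make the three component classes concrete, the following minimal Python sketch (not part of the original paper) samples one distribution of each family and estimates, by Monte Carlo simulation, the probability of closing a control loop within a given time requirement. All distribution parameters here are illustrative placeholders rather than measured values.

    import random

    # Illustrative component models; all times in milliseconds.
    def processing_time():
        # Uniform body plus sporadic exponential spikes caused by
        # multitasking CPU scheduling (assumed placeholder values).
        t = random.uniform(10.0, 30.0)
        if random.random() < 0.05:               # occasional scheduling delay
            t += random.expovariate(1.0 / 50.0)  # mean 50 ms spike
        return t

    def user_reaction_time():
        # Simplified Gaussian model of human reaction time.
        return max(0.0, random.gauss(100.0, 50.0))

    def network_time():
        # Exponential model for a fast ethernet segment (mean 40 ms).
        return random.expovariate(1.0 / 40.0)

    def loop_time():
        # One pass of a control loop: user action -> network -> robot
        # processing -> network -> display processing.
        return (user_reaction_time() + network_time() + processing_time()
                + network_time() + processing_time())

    def prob_loop_within(requirement_ms, samples=10000):
        # Monte Carlo estimate of P(loop time <= requirement).
        hits = sum(loop_time() <= requirement_ms for _ in range(samples))
        return hits / samples

    print(prob_loop_within(400.0))

An estimate of this kind plays the role of the probability P used by the regulation algorithms of the next section.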

III. FLOW REGULATION OF SENSORY INFORMATION

The regulation action considered in this paper is based on reducing/increasing the amount of sensory information transmitted to the web interface (for example, the number of pixels of a camera image or the number of samples of a laser range scanner). It is intuitive that degrading the amount of sensory information provided to the user should degrade the overall performance of the system gracefully (that is, the user's capability of controlling the system should be maintained), although the control should be better when more data is available. Fig. 2 illustrates this intuition through the results of some experiments we have conducted. In such experiments, people remotely control the movement of a simulated mobile robot to follow a circular corridor along its middle line. The only sensor available is a laser scanner that provides range measurements. The shape of the corridor has been chosen to force the user to actuate continuously. The figures show how the average deviation from the desired path increases when less sensory information is available, and that the user reaction time is then smaller, since the user must react faster to unexpected situations (when more sensory data is available, the user spends more time planning better actions). In summary, our experiments show that an acceptable control is still possible when sensory data is reduced.

Fig. 2. Up) Simulated environment where the user drives a mobile robot remotely, receiving only data from a laser scanner. Bottom-Left) Average deviation from the desired path under different densities of sensory information. Bottom-Right) Time between user commands (speed/orientation changes) for the same densities. The web interface used is the same as in the experiments in section IV. The simulated robot runs on a remote computer. Communications are via ethernet twisted-pair.

Next we describe a hysteresis controller that allows the interfaced system to regulate automatically the amount of sensory information that is shown to the user (and therefore, the information that flows through the system) in order to adapt to varying and unpredictable time conditions. Also, we present a method for constructing near-optimal regulation algorithms with respect to a given goodness measure, and this is used to justify the suitability of the former.

In both the hysteresis controller and the near-optimal algorithm, we assume that the client-side interface of the system displays a finite set of sensory widgets associated to the current control loop, say W = {w_i}. Each widget w_i has a finite set of possible density states d(w_i) = {d_ij}, with each density state d_ij indicating a given amount of data a(d_ij) that the widget shows to the user when it is in that state.

A. Hysteresis Controller Algorithm

A pseudocode for the hysteresis controller appears in fig. 3 (experimental results of its implementation are given in section IV). The algorithm gradually reduces the amount of information associated to the sensory widgets until: a) the probability of the loop satisfying its time requirements falls under a given "critical threshold" (then medium or coarse adaptations must be done), or b) that probability rises over a given "safety threshold" (then the loop is satisfying its time requirements comfortably). When b) occurs, the sensory widgets that did not show all the information that they could recover their densities gradually. When the probability lies between both thresholds, the sensory widgets reduce their densities in an orderly fashion. The critical and safety thresholds and the loop time requirement must be specified by the user or the programmer.

O  <- list of sensory widgets ordered by decreasing density
O' <- equal to O, but in increasing order of density
I  <- 1
Do
    P <- probability of satisfying the time requirement of the loop
    If (P < critical threshold)
        Do Medium and Coarse adaptations.
    Else if (P < safety threshold)
        If (widget O(I) density can be decreased)
            Set current density state of O(I) to its next lower state.
        Else
            If (I < number of widgets)
                I <- I+1    /* next widget */
            Endif
        Endif
    Else    /* P >= safety threshold */
        If (widget O'(I) can increase its density)
            Set density state of O'(I) to the next higher state.
        Else
            If (I < number of widgets)
                I <- I+1    /* next widget */
            Endif
        Endif
    Endif
Enddo

Fig. 3. Pseudocode of the hysteresis controller that regulates the amount of information flow in a control loop of the web interfaced system.
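As an illustration only, the controller of fig. 3 can be sketched in Python as follows. The Widget class, the density values, and the probability estimate p are assumptions introduced for the example, and the sketch re-scans the widget list on each pass instead of keeping the explicit index I of the pseudocode.

    class Widget:
        # A sensory widget with an ordered list of density states,
        # e.g. a laser widget showing 0/90/180/360 range points.
        def __init__(self, name, densities):
            self.name = name
            self.densities = densities        # increasing amounts of data
            self.level = len(densities) - 1   # start at maximum density

        def decrease(self):
            if self.level > 0:
                self.level -= 1
                return True
            return False

        def increase(self):
            if self.level < len(self.densities) - 1:
                self.level += 1
                return True
            return False

    def hysteresis_step(widgets, p, critical, safety):
        # One pass of the controller; p is the current estimate of the
        # probability of satisfying the loop time requirement.
        if p < critical:
            return "medium_or_coarse_adaptation"
        if p < safety:
            # Reduce densities, densest widget first.
            for w in sorted(widgets, key=lambda w: w.densities[w.level],
                            reverse=True):
                if w.decrease():
                    break
        else:
            # Comfortable margin: restore densities, least dense first.
            for w in sorted(widgets, key=lambda w: w.densities[w.level]):
                if w.increase():
                    break
        return "ok"

    widgets = [Widget("laser", [0, 90, 180, 360]),
               Widget("camera", [0, 1, 2])]    # OFF / B&W / colour
    print(hysteresis_step(widgets, p=0.70, critical=0.50, safety=0.85))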

B. Near-Optimal Algorithm

We now propose a method for constructing near-optimal algorithms for sensory flow regulation automatically. This method is too slow to be used at run-time, but our goal is rather to assess, by comparison, the optimality of the algorithm presented in the previous section.

We have chosen reinforcement learning (RL) [19] as the framework for the near-optimal algorithm. RL can be used for learning the optimal policy (= sequence of actions) to be performed by an agent in a complex scenario. At any moment of the RL process, the agent (the web interfaced system in our case) is in a state s (set of widget densities and current probability of satisfying the control loop time requirement) and decides to execute some action a (changing the density of some widget), turning its state into state s' and getting a reinforcement signal or reward r for its decision (which sets the optimality measure of its behaviour). These experience tuples (s, a, s', r) are used for finding a policy π that maximizes the long-term reward. It is straightforward to interpret the policy as an algorithm, as we will do here.

There are several methods for solving RL problems. We have selected Q-learning since it does not need a probabilistic model of the agent's environment. In spite of that, Q-learning yields policies that tend to the optimal ones. It uses the following value function to summarize the learning procedure, which can be recursively computed:

    Q(s,a) = Q(s,a) + α·(r + γ·max_{a'} Q(s',a') − Q(s,a))        (1)

where α is the learning rate (it must decrease slowly³), γ is the discount factor (which represents the importance of future rewards), and Q(s',a') refers to the Q-values of the next state s' for any action a'. The final Q is the best policy under the agent's experience, and therefore the best known algorithm to follow when the agent is confronted with the same scenario again. The Q function is a matrix of size (number of states × number of actions). For a given state, the action for which Q is maximal in that state is the best decision according to the policy. Therefore the obtained Q can be used as an algorithm by running the procedure shown in fig. 4.
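Before turning to fig. 4, note that one application of the update in equation (1) can be sketched in a few lines of Python. The state/action encoding and the environment are left abstract; the learning-rate schedule α(i) = 1/i^c, with α equal to 0.05 at the last iteration, follows footnote 3, while everything else is an assumption of the sketch.

    import math
    from collections import defaultdict

    Q = defaultdict(float)   # Q[(s, a)] -> value, zero-initialized

    def alpha(i, n_iterations=10000, alpha_last=0.05):
        # alpha(i) = 1 / i**c, with c chosen so that alpha(n_iterations)
        # equals alpha_last (hence alpha(1) = 1).
        c = math.log(1.0 / alpha_last) / math.log(n_iterations)
        return 1.0 / (i ** c)

    def q_update(s, a, r, s_next, actions, i, gamma=0.9):
        # One application of equation (1).
        best_next = max(Q[(s_next, a2)] for a2 in actions)
        Q[(s, a)] += alpha(i) * (r + gamma * best_next - Q[(s, a)])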

Q <- near-optimal values produced by Q-learning
Do
    If (P < critical threshold)
        Do Medium or Coarse adaptations.
    Else
        S <- current state of the system
        A <- A' for which Q(S,A') is maximum
             (if several maxima, choose one at random)
        Do A (change the density of one of the widgets)
    Endif
Enddo

Fig. 4. Pseudocode that interprets a learned Q as an algorithm for regulation of information flow.
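Read as code, the policy extraction in fig. 4 amounts to an argmax over the learned Q with random tie-breaking; a possible Python rendering (with a Q table learned as in the previous sketch) is:

    import random
    from collections import defaultdict

    Q = defaultdict(float)   # the table learned as in the previous sketch

    def best_action(s, actions):
        # Greedy policy extraction (fig. 4): pick the action with maximal
        # Q(s, a), breaking ties at random.
        best = max(Q[(s, a)] for a in actions)
        return random.choice([a for a in actions if Q[(s, a)] == best])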

C. Optimality of the Intuitive Algorithm

We now compare the hysteresis controller to the Q-learning algorithm. The goal is to provide a scientific justification for the optimality of the former.

We have based the experiments on the real interface for controlling an assistant robot that is described in section IV. The network is a mixture of twisted-pair 100 Mbps segments and a wireless 802.11g segment. However, in order to carry out a sufficiently large number of learning steps and rich comparisons (which would not be possible using the real application), we have employed the probabilistic models described in section II for simulating the whole interfaced system. We have gathered time measurements of the different components of the real interfaced system and entered them into those models (see table 1), thus obtaining very realistic simulations.

TABLE 1
PROBABILISTIC MODELING OF SOME OF THE COMPONENTS OF A WEB-INTERFACED ROBOTIC SYSTEM, CONSIDERING PROCESSING OF 1 BYTE OF DATA

Component            | Model        | Parameters
User reaction        | Gaussian     | μ=100, σ=50
Network transmission | Exponential  | λ=268
ORB processing       | Uniform+Exp. | a=-0.7, b=0.7, λ=0.5

The interfaced system has a control loop which allows the teleoperator to drive the robot by sending direct motion commands (speed/direction). The loop has one sensory widget that displays the readings of the robot laser, which has a range of 180°, and a low-resolution image captured by a camera mounted on the top of the robot.

³ We have chosen α(i) = 1/i^c, where i is the current iteration index of the Q-learning algorithm and c is a constant that we have calculated for α to equal 0.05 in the last iteration. This function has been demonstrated to produce a good convergence rate [27].

We have set the laser widget densities to four possible values: widget off, and displaying 90, 180, and 360 range points. We have used three densities for the camera widget: camera OFF, camera in black & white, and camera in colour. Fig. 5 shows the cumulative probability distribution function of the time consumption in the control loop with the camera widget set permanently to each of its densities. We have included a time overload in the system (for simulating unpredictable delays) modelled by a Gaussian probability distribution function with μ=600 ms and σ=200 ms. Notice that the effect of the camera is evident. In fact, changes in the laser widget density are not shown in the figure since they are very close to the camera widget main curves.

Fig. 5. Cumulative probability distribution of the time consumption of the control loop for different camera densities. The horizontal axis indicates a time requirement for the control loop, while the vertical axis gives, for that requirement, the probability of closing the loop in an equal or shorter time.

For this setting, we have discretized the state space of the system into 48 states that are the combination of 4 probability ranges of satisfying the control loop time requirement, of 4 laser densities, and of 3 camera densities. We have established 7 possible actions (to set one of the widgets to one of its densities). If we define p(s) as the integer discretization of the probability of closing the loop under the time requirement (1 -> 0-50%, 2 -> 51-75%, 3 -> 76-85%, 4 -> 86-100%), d(s) as the density of the laser widget (1, 2, 3, or 4), and c(s) as the camera widget density (1, 2, or 3), the reward obtained from a given selection of widget densities and probability to satisfy loop requirements can be calculated as⁴:

    r(s) = 0                                                            if p(s) = 1
    r(s) = (1/10)·(0.6·(p(s)−1)/3 + 0.1·(d(s)−1)/3 + 0.3·(c(s)−1)/2)    otherwise        (2)
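Encoded in Python, the reward above reads as follows. Note that equation (2) is reconstructed from a garbled source, so both the coefficients and the zero-reward condition should be treated as a best-effort assumption rather than as the exact original formula.

    def reward(p, d, c):
        # p in 1..4 (probability band), d in 1..4 (laser density state),
        # c in 1..3 (camera density state); see the discretization above.
        if p == 1:                       # loop likely failing: no reward
            return 0.0
        return 0.1 * (0.6 * (p - 1) / 3.0
                      + 0.1 * (d - 1) / 3.0
                      + 0.3 * (c - 1) / 2.0)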

For learning the Q matrix we have executed repeatedly equation (1) with α(1)=1 and γ=0.9, during 10000 iterations with a desired time requirement for the loop of 1450 ms. Table 2 shows the portion of the learned Q that contains the most visited states (those that have been sufficiently explored during learning).

⁴ Generally, in Q-learning the reward is a function of both the current state and the selected action, which is useful in the case that carrying out actions has some cost. In this paper, the reward only depends on the current state since we do not distinguish among the costs of carrying out different actions (we consider them all to be null).

TABLE 2
LEARNED Q (ONLY SUFFICIENTLY EXPLORED STATES)

State                                    | Best action according to Q
 6 (prob<51%,    laser 90,  cam. Color)  | Camera B&W
 9 (prob<51%,    laser 180, cam. Color)  | Camera B&W
12 (prob<51%,    laser 360, cam. Color)  | Camera OFF
15 (prob 51-75%, laser OFF, cam. Color)  | Laser 90
18 (prob 51-75%, laser 90,  cam. Color)  | Camera B&W
21 (prob 51-75%, laser 180, cam. Color)  | Camera OFF
35 (prob 76-85%, laser 360, cam. Color)  | Laser 360
37 (prob>85%,    laser OFF, cam. OFF)    | Laser 90
38 (prob>85%,    laser OFF, cam. B&W)    | Laser 360
40 (prob>85%,    laser 90,  cam. OFF)    | Laser 180
41 (prob>85%,    laser 90,  cam. B&W)    | Laser 360
43 (prob>85%,    laser 180, cam. OFF)    | Camera B&W
44 (prob>85%,    laser 180, cam. B&W)    | Laser 90
46 (prob>85%,    laser 360, cam. OFF)    | Camera B&W
47 (prob>85%,    laser 360, cam. B&W)    | Laser 360

Fig. 6. Cumulative rewards of the Q-based algorithm and the hysteresis controller (dotted line) for: left) a web-interfaced system composed of two sensory widgets, and right) the same system with only one sensory widget.

We have then compared both algorithms (the Q-based one vs. the hysteresis controller), measuring their respective cumulative rewards over time (= their optimalities). For that, we have launched both for 100 iterations, each one consisting of closing the control loop 40 times (which amounts to about 10 to 40 seconds of real-time execution, summing a total comparison time of about 1000-4000 seconds). Fig. 6-Left shows the total reward collected. The average optimalities are 301.7 (Q) and 280.6 (hyst.), with standard deviations of 17.7 and 21.2 respectively, which makes them indistinguishable, showing that our intuitive approach is close to the optimal. Fig. 6-Right shows a similar result for the same web-interfaced system but with only one widget, the laser, and a time requirement of 250 ms. We have obtained an average reward for Q of 12.2 with a standard deviation of 6.4, and of 5.5 with a standard deviation of 4.3 for the hysteresis controller.

IV. REAL IMPLEMENTATION

Once we have shown the near-optimality of our controller for regulating the information flow of the control loops, we have implemented it in the real interface of fig. 7 and evaluated it under the control loop with one sensory widget (the laser). The real robot is a service robot called SANCHO, intended for pick-and-delivery, museum guiding, or fair hosting.

Fig. 7. On the left, the SANCHO robot we have used for implementing a real web-controlled system. On the right, the client-side web interface. The robot is based on a Pioneer 3DX mobile platform enhanced with an on-board computer, wi-fi connectivity, and several sensors (laser, sonar, camera, infrared).

The experiment has consisted in driving the robot remotely from a given location to another through simple speed/direction commands. The user reaction time has not been considered in this experiment (only the time from the user action to the displaying of sensory data), since people do not exert control at all times (for example, they do not act when the robot already has the right direction and speed). A user waiting before the next control action would increase the loop time, which would be considered by the system as a problem for satisfying the requirement, which is not necessarily the case.

We have measured the final cumulative probability distribution functions under two situations: i) when the hysteresis controller is activated, and ii) without using flow regulation algorithms (that is, setting the laser density to a fixed number of sample points), obtaining the results shown in fig. 8-Up. This figure is the result of 75 passes of the control loop. Also, we have logged the densities of the laser widget that the controller has set during the experiment (fig. 8-Bottom). As shown in the figure, the controller achieves a probability of satisfying the time requirement for the control loop (75 ms) similar to that of setting the density of the widget to a small number of points. However, this good time satisfaction has been achieved with laser densities that are, most of the time, very high (360 points). Notice that the laser widget is disabled during short periods of time when the time conditions were hard. If these periods exceed some predetermined duration, the other regulation actions (coarse and medium) take place as explained in [17].

In [26] a multimedia file can be found that shows the overall regulated-flow teleoperation of the robot and a coarse adaptation action due to critical threshold underpassing, in which a second control loop, consisting in navigating reactively to a manually-set goal location, is activated.

Fig. 8. Up) Cumulative probability distributions for the real experiments under different sensory information flows. Bottom) Densities of the sensory widget during the use of the hysteresis controller. Each density value is logged after one pass of the control loop.

V. CONCLUSIONS AND FUTURE WORK

In this paper we propose a new way of approaching the typical unpredictability and time-variation problems of web networked robots that is of greater generality than improving the predictability and time efficiency of network transmissions (the most common line of research in this field in the literature).

In particular, we have focused on designing an algorithm that automatically regulates the amount of information that flows through the system. The algorithm is a simple hysteresis controller, and we have developed a framework for generating automatically near-optimal algorithms for the same regulation task, through Q-learning, in order to evaluate its optimality. We have carried out several experiments based on a real interfaced robot that show that our algorithm is as optimal as the near-optimal one generated by Q-learning. Also, a real implementation showing the suitability of our approach has been described.

In the future, we plan to implement more web applications for assistant and service robots, in order to test our approach under different conditions. Also, the use of the Q-learned algorithm in real time will be explored.

REFERENCES

[1] Jia S., Hada Y., Ye G., and Takase K. Distributed Telecare Robotic Systems Using CORBA as a Communication Architecture. IEEE Intl. Conf. on Robotics & Automation, 2002, pp. 1659-16.
[2] Camarinha-Matos L.M. and Afsarmanesh H. Tele-Care and Collaborative Virtual Communities in Elderly Care. Int. Workshop on Tele-Care and Collaborative Virtual Communities in Elderly Care, TELECARE 2004, pp. 1-12, Porto, Portugal, April 2004.
[3] Thrun S., Beetz M., Bennewitz M., Burgard W., Cremers A., Dellaert F., Fox D., Hahnel D., Rosenberg C., Roy N., Schulte J., and Schulz D. Probabilistic algorithms and the interactive museum tour-guide robot Minerva. Journal of Robotics Research, 19(11), pp. 972-999, 2000.
[4] Goldberg S. and Bekey A. Online Robots and the Remote Museum Experience. In Beyond Webcams: An Introduction to Online Robots, MIT Press, 2002, pp. 295-305.
[5] Lixiang Y., Pui T., Quan Z., and Huosheng H. A Web-based Telerobotic System for Research and Education at Essex. 2001 IEEE/ASME International Conference on Advanced Intelligent Mechatronics, 8-12 July 2001, Como, Italy, pp. 284-2.
[6] Safaric R., Sinjur S., Zalik B., and Parkin R.M. Control of Robot Arm with Virtual Environment Via the Internet. Proc. of the IEEE, 91(3), March 2003, pp. 422-429.
[7] Liu P.X., Meng M.Q-H., Gu J., Yang S.X., and Hu C. Control and Data Transmission for Internet Robotics. IEEE Intl. Conf. on Robotics & Automation, Taipei, Taiwan, 2003, pp. 1659-16.
[8] Oboe R. Web-interfaced, force-reflecting teleoperation systems. IEEE Trans. on Industrial Electronics, 48(6), 2001, pp. 1257-1265.
[9] Belousov I.R., Chellali R., and Clapworthy G.J. Virtual reality tools for Internet Robotics. ICRA 2001, May 21-26, Seoul, Korea, pp. 1878-1883.
[10] Jiacheng T. and Clapworthy G.J. Virtual environments for Internet-based robots. I. Modeling a dynamic environment. Proc. of the IEEE, Vol. 91, Issue 3, March 2003, pp. 383-388.
[11] Wang X. and Schulzrinne H. Comparison of Adaptive Internet Multimedia Applications. IEICE Transactions on Communications, Vol. E82-B, No. 6, June 1999.
[12] Andreu D., Fraisse P., Roqueta V., and Zapata R. Internet enhanced teleoperation: toward a remote supervised delay regulator. 2003 IEEE International Conference on Industrial Technology, Vol. 2, pp. 663-688.
[13] Imer O.C., Yüksel S., and Basar T. Optimal control of LTI systems over unreliable communication links. Automatica 42 (2006), pp. 1429-1439.
[14] Liu X.P., Meng M.Q.-H., and Yang S.X. Data Communications for Internet Robots. Autonomous Robots 15, pp. 213-223, 2003.
[15] Fiorini P. and Oboe R. Internet-Based Telerobotics: Problems and Approaches. International Conference on Advanced Robotics (ICAR'97), pp. 765-770, Monterey, CA, USA, July 1997.
[16] Lin W.K.W., Wong A.K.Y., and Dillon T.S. Application of Soft Computing Techniques to Adaptive User Buffer Overflow Control on the Internet. IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews, Vol. 36, No. 3, May 2006.
[17] Fernández-Madrigal J.A., Cruz-Martín E., Cruz-Martín A., González J., and Galindo C. Adaptable Web Interfaces for Networked Robots. IROS'2005, Edmonton (Canada), 2-6 August 2005.
[18] Phillips C.H. and Harbor R.D. Feedback Control Systems, Prentice Hall, 2000.
[19] Kaelbling L.P., Littman M.L., and Moore A.W. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research 4 (1996), pp. 237-277.
[20] Schildt H. JAVA 2: The Complete Reference, McGraw-Hill, 2002.
[21] Henning M. and Vinoski S. Advanced CORBA Programming with C++, Addison-Wesley, 1999.
[22] Fitts P.M. The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology, vol. 47, pp. 381-391, 1954.
[23] Zaitsev A.V. and Skorik Y.A. Mathematical Description of Sensorimotor Reaction Time Distribution. Human Physiology, Vol. 28, No. 4, 2002, pp. 494-496.
[24] Cao J., Cleveland W.S., Lin D., and Sun D.X. Internet Traffic Tends Toward Poisson and Independent as the Load Increases. In Nonlinear Estimation and Classification, Holmes et al. (eds.), Springer, New York, 2002.
[25] Kobayashi H. Modeling and Analysis: An Introduction to System Performance Evaluation Methodology, Addison-Wesley, 1978.
[26] http://www.isa.uma.es/personal/jafma/experiments.htm
[27] Even-Dar E. and Mansour Y. Learning Rates for Q-learning. Journal of Machine Learning Research, vol. 5, pp. 1-25, 2003.
