Source: 动视网 (editor: 小OO), posted 2025-09-28 13:49:54


Failure Semantics of Mobile Agent Systems Involved in Network Fault Management

Otto Wittner, Carsten J. E. Hölper, Bjarne E. Helvik

Department of Telematics, NTNU.

E-mail: Otto.Wittner@item.ntnu.no

Abstract

Recently mobile agent technology has been recognised as a potential tool for realising distributed network fault management. The autonomy and mobility of such agents can help ensure robustness of the management system. A mobile agent depends on a suitable environment consisting of a set of mobile agent systems (MAS) of compatible system types. In this paper we examine what failure semantics are desirable for services provided by a MAS when the MAS is assumed to be part of a network fault management system. Based on a general failure class model we define a new failure subclass. The subclass identifies the usefulness of certain response failure semantics in MAS-based systems in contrast to traditional client-server systems. At the end of the paper, results from an evaluation project examining state-of-the-art MASes are presented.

Keywords: Mobile agent system, network management, fault management, failure semantics.

1 Introduction

Traditionally network fault management (NFM), and network management in general, has been implemented based on static architectures [1,7] with a high degree of centralisation. Fault information is collected from different network elements (NEs) and funneled towards one or a few central NFM units where filtering, correlation and other forms of analysis are performed. Due to the increasing number of NEs able to report alarm messages electronically, vast amounts of failure information must be transported to the central units. Thus an extra load is put upon the network used by the NFM system, which is often a subnet of the network being managed. In some cases this extra load may cause positive feedback and aggravate the failure situation being reported. By distributing the NFM system throughout the network, better load balancing is possible.

Centralised architectures are vulnerable to failures if no extra precautions are taken. The fact that the whole system depends upon a few nodes in the architecture makes it a candidate for catastrophes. A single node crash can cause a full system breakdown. For a NFM system this is not desirable. Robustness to faults is essential if faults are to be handled properly. By moving to a more distributed architecture, dependability can be improved.

Recently mobile agent technology [3,2] has been recognised as a potential tool for implementing distributed NFM systems [10,2,18]. In a distributed architecture the NFM system is divided into smaller units capable of performing work while located at different places in the network environment. Autonomy is essential for the small units to ensure robustness of the total system. Being able to migrate between NEs can be important to cope with the dynamics of the network environment. Both autonomy and mobility are fundamental attributes of mobile agents.

Figure 1: NFM using mobile agents

For its existence, a mobile agent depends on a suitable environment. We choose to view such an environment as a collection of mobile agent systems (MAS) of compatible system types. Our view conforms with OMG's MASIF standard [20,9]. Figure 1 illustrates a network environment where a NFM system is managing NEs using mobile agents.

A MAS can be seen as a service provider for mobile agents visiting or wanting to visit the NE where the MAS resides. Table 1 shows a list of typical services. Our list has emerged from the idea of a MAS-based NFM system, but resembles work done by other researchers, e.g. [4] proposes a list of facilities that should be supported by a persistent MAS, and [20] summarises the set of necessary agent system functions the MASIF standard addresses.

Our objective in this paper is to examine what failure semantics are desirable for the services provided by a MAS when assuming the MAS to be part of a NFM system. We also test how state-of-the-art MASes available today conform with our proposed semantics.

The rest of the paper is organised as follows. Section 2 reviews failure classes and introduces our new subclass, section 3 examines the failure semantics of a MAS from a service viewpoint, and section 4 presents results from an evaluation of selected MAS implementations. In section 5 we conclude and indicate future research tasks.

Category | Service
Basic | Receives/transmits and (un)marshals agents
Basic | Executes agent instructions
Communication | Encodes/decodes and sends/receives messages
Communication | Provides address and name information
Communication | Provides shared information areas
Persistence | Stores snapshots of agent state information
Persistence | Enables grouping of agent actions into atomic transactions
Security | Provides mechanisms for secure authentication and access control for MASes and agents
Security | Provides misc. encryption tools for ensuring information security
Other | Provides multipurpose database service
Other | Provides access to management functions for MAP and NE

Table 1: A summary of possible services provided by a mobile agent system, categorised and grouped appropriately.

2 Failure Classes

The behavior a server exhibits when it is not able to respond properly (as given by specifications) to a service request is of great importance seen from a fault management perspective. Such behavior is by definition a failure [14] and can be classified into three main classes: timing, response and omission failures [6].

A timing failure occurs when the response is correct but untimely, a response failure occurs when the response is timely but incorrect, and an omission failure occurs when no response is given to a request at all. In principle omission failures are a subclass of timing failures and/or response failures, i.e. an infinitely delayed response or a blank, undetectable value returned. Table 2 lists the main classes and some subclasses taken from [6].
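The three main classes can be illustrated with a small classification routine. The Python fragment below is only a sketch under stated assumptions: the names `FailureClass` and `classify`, the timeliness window (`earliest`, `latest`) and the convention that a missing response is signalled by `elapsed=None` are hypothetical and are not taken from [6]. A response that is both untimely and incorrect is not covered by the definitions above; the sketch arbitrarily reports it as a timing failure.

```python
from enum import Enum
from typing import Optional


class FailureClass(Enum):
    NONE = "no failure"
    OMISSION = "omission"
    TIMING = "timing"
    RESPONSE = "response"


def classify(expected: object, observed: Optional[object],
             elapsed: Optional[float], earliest: float,
             latest: float) -> FailureClass:
    """Classify a single request/response exchange.

    `elapsed` is the observed response time, or None if no response
    arrived at all (assumed convention for this sketch).
    """
    if elapsed is None:
        return FailureClass.OMISSION      # no response given to the request
    if not (earliest <= elapsed <= latest):
        return FailureClass.TIMING        # response is untimely (early or late)
    if observed != expected:
        return FailureClass.RESPONSE      # timely but incorrect
    return FailureClass.NONE


# A timely reply carrying the wrong value is a response failure.
outcome = classify(expected="up", observed="down", elapsed=0.5,
                   earliest=0.0, latest=1.0)
```

Treating omission as the first test mirrors the remark that an omission can be seen as an infinitely delayed response, so no finite `elapsed` value exists for it.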

When a MAS is the service provider and the application domain is NFM systems, we argue that one additional failure class is particularly relevant.

A bounce failure occurs when a server forwards an object to a different location than initially specified. Selection of the new location can be random or follow some given strategy.

The new class is not distinct from the classes described above, but rather a subclass of response failures. E.g. if the object is an agent and the agent has requested migration to a certain MAS but is bounced off to a different MAS, the agent experiences a response failure (wrong location) to its migration request.

Failure class | Description and subclasses
Omission | No response returned; Crash
Timing | Response returned untimely; Early timing; Response too late
Response | Incorrect value returned; State transition failure

Table 2: Main failure classes and some subclasses, taken from [6].

Recent work on applying principles of collective behavior to problems in the network management domain has shown promising results [5,19,17,18]. The idea is based on using a large number of small and simple agents. By letting each of these agents move around and perform simple operations, a powerful collective behavior emerges, which in turn makes the group of agents capable of performing complex tasks.

We argue that in a NFM system based on simple mobile agents and principles of collective behavior, our new failure class, bounce failures, shows its significance. In the following two sections we look at some of the MAS services from Table 1 and describe why bounce failure semantics for these services can be advantageous.

3.2.1 Basic Services and Bounce Failure Semantics

If one of the basic services in a MAS, migration or execution, fails, the agent in question will normally have severe difficulties continuing its mission, especially if the agent is simple and lacks smart handlers for emergency situations. Self-termination can quickly become the only option.

A simple case is when no contact with the destination MAS can be established during a migration operation. Omission failure semantics may seem desirable since they enable the agent to reselect its destination and re-run the migration operation. But a simple agent might not have enough knowledge to select an alternative destination. If the only other option is self-termination, bounce failure semantics can give such an agent a chance to continue its mission. Assuming that the agent is bounced to a random location, it would detect its new incorrect location and could retry migration to the desired destination.
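The retry behaviour described above can be illustrated with a toy model. The following Python sketch is not an implementation of any of the MASes discussed in this paper; the `Agent` class, the `migrate` and `run_mission` functions, the `links` topology and the hop budget are all hypothetical names and assumptions introduced for illustration. The migration service bounces the agent to a randomly chosen reachable MAS whenever the requested destination cannot be contacted, and the agent simply retries from wherever it lands.

```python
import random
from typing import Dict, Optional, Set


class Agent:
    """A deliberately simple agent: it only knows its desired destination."""

    def __init__(self, destination: str) -> None:
        self.destination = destination
        self.location: Optional[str] = None
        self.hops = 0


def migrate(agent: Agent, target: str, reachable: Set[str],
            rng: random.Random) -> None:
    """Hypothetical migration service with bounce failure semantics."""
    if target in reachable:
        agent.location = target                         # normal migration
    elif reachable:
        agent.location = rng.choice(sorted(reachable))  # bounce failure
    # If no MAS is reachable at all, the agent stays where it is.
    agent.hops += 1


def run_mission(agent: Agent, links: Dict[str, Set[str]], start: str,
                rng: random.Random, max_hops: int = 50) -> bool:
    """Retry migration after every bounce until arrival or budget exhaustion."""
    agent.location = start
    while agent.location != agent.destination and agent.hops < max_hops:
        migrate(agent, agent.destination, links[agent.location], rng)
    return agent.location == agent.destination


# Assumed topology: the link from a to b is down, but b is reachable via c.
links = {"a": {"c"}, "b": {"a", "c"}, "c": {"a", "b"}}
agent = Agent(destination="b")
arrived = run_mission(agent, links, start="a", rng=random.Random(0))
```

In this example the agent needs no knowledge of alternative routes: it is bounced from a to c and its second migration attempt, now issued from c, succeeds. With omission semantics the same agent would have had to choose a new destination itself or terminate.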

Another case is when an agent reaches its destination in an apparently successful migration operation, but fails to execute due to incompatibilities or limitations in the destination MAS. Omission failure semantics for the execution service would imply termination of the agent. If migration and execution are grouped into one transaction [15], the fault would appear as a migration omission failure. In both cases bounce failure semantics would give the agent an opportunity to continue its mission, provided that it eventually arrives at a MAS where it is able to execute successfully.

From a NFM system perspective there are several reasons why effort should be put into helping agents escape faulty MASes.

If a certain faulty MAS just “swallows” agents, i.e. the execution service has omission semantics, the NFM system will have little difficulty detecting the fault but will have difficulty locating the faulty MAS. None of the agents experiencing the fault will be alive and able to report.

Failures which cause partitioning of a network are difficult to manage. A faulty NE acting as a gateway between two network segments can typically cause such partitioning (Figure 1). If the NE provides MAS services (in addition to gateway services), execution of agents visiting the NE may fail due to the faulty state of the NE. Assuming our mobile agent based NFM system lacks an agent factory facility in one of the segments, distributing agents out into both segments will be impossible if the MAS services of the NE have omission failure semantics. Getting one or a few agents across the border could enable the NFM system to establish an alternative path between the two network segments and initiate rerouting of traffic.

3.2.2 Enhancing Services and Bounce Failure Semantics

A mobile agent will often be able to proceed with its work even if one of the enhancing services fails to respond properly, e.g. the agent can migrate to the next MAS on its itinerary or to its home MAS for error reporting. Thus omission failure semantics will be desirable for these services in most cases.

Several of the enhancing services provide what could be called a pure information service, i.e. the user requests information and the service responds with an information package. Further, the information in question is often location independent, i.e. the information requested does not have a direct relation to where the information source is situated. Examples of such services are the directory, authentication and database services. Some of the primitives within a checkpoint service, e.g. “fetch snapshot”, and within a management service, e.g. “get load of network segment”, can also be classified as information requests with some degree of location independence.

Requests for location independent information include a destination address, but can be redirected to a different address and still result in a correct response. Often a redirection will be transparent to the requester, except for additional delays (e.g. a chain of standby/restoration servers [13]). Redirection overhead can be avoided by informing the requester of the situation and having it update its initial destination address. The latter kind of non-transparent request-redirection scenario can be viewed as a bounce failure, and the service can be claimed to have bounce failure semantics.
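A minimal Python sketch of the non-transparent redirection scenario is given below. It is an illustration only: the `serve` function, the `Requester` class, the reply tags "value" and "bounced", and the server names are hypothetical and do not correspond to the interface of any MAS evaluated in this paper. Because the information is assumed to be location independent, all servers in the sketch answer from the same `store`.

```python
from typing import Dict, Optional, Tuple

# A reply is either ("value", data) or ("bounced", new_address).
Reply = Tuple[str, str]


def serve(address: str, key: str, healthy: Dict[str, bool],
          standby: Dict[str, str], store: Dict[str, str]) -> Reply:
    """Hypothetical information service with bounce failure semantics."""
    if healthy[address]:
        return ("value", store[key])
    # Instead of staying silent (omission), name another server to ask.
    return ("bounced", standby[address])


class Requester:
    def __init__(self, address: str) -> None:
        self.address = address  # current idea of where the service lives

    def get(self, key: str, healthy: Dict[str, bool],
            standby: Dict[str, str], store: Dict[str, str],
            max_redirects: int = 3) -> Optional[str]:
        for _ in range(max_redirects + 1):
            kind, payload = serve(self.address, key, healthy, standby, store)
            if kind == "value":
                return payload
            # Non-transparent redirection: remember the new address so
            # later requests avoid the redirection overhead.
            self.address = payload
        return None


healthy = {"dir1": False, "dir2": True}
standby = {"dir1": "dir2"}
store = {"ne7": "10.0.0.7"}
requester = Requester("dir1")
answer = requester.get("ne7", healthy, standby, store)
```

After the first request the requester holds the address of the standby server, so subsequent requests go there directly.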

3.2.3 Disadvantages of Bounce Failure Semantics

Bounce failures belong to the class of response failures, which are normally undesirable due to their nondeterministic behavior.

Nondeterministic bounce failures will occur if a bounce strategy with a stochastic component is chosen. In such a case a sufficient level of autonomy is required for the agents, enabling them to tolerate unpredictable migration. This can be a challenge if the agents are required to be small and simple.

Agents are executable units, and moving executable units from host to host challenges the security system on the hosts as well as the security measures implemented in the agents. Introducing nondeterministic movement does not make things easier. A robust security system is required, which in turn can result in larger and more complex agents.

4 Evaluating Implemented MASes

A great number of MASes are available today, and many of the popular ones are in the public domain. In our evaluation project we selected four freely available MASes. Two are developed by commercial institutions and two by academic institutions. Table 3 gives an overview of services provided by the MASes, with keywords indicating functionality for each service. More information can be found in [16,11,8,12].

Services/MAS | Aglets 1.1 | Mole 3.0
Migration | Weak, caching of classes, static rule set for class loading, both dispatch and retract | Weak, classes provided by code server or source location
 | Java Virtual Machine | Tool Command Language interpreter (modified)
Messaging | Synchronous (now), asynchronous (future), class of message body for messages sent between MASes is required to be present in both MASes | Session oriented, synchronous, asynchronous
 | Common interface (Namespace) to several dir-services, Voyager's default dir-service configurable to be persistent (file) or non-persistent | N/A
Blackboard | N/A | N/A
 | Snapshot, external storage required, reload after termination provided
Transaction | N/A
 | Two groups: native with full access, foreign with restricted access | PGP based signing, access control list
Cryptographic | N/A | N/A | N/A | N/A
Management | API for implementation of agent and server monitoring tools | Resource management through Master Control Process (scheduler)

Table 3: Services provided by four selected mobile agent systems (N/A = “not available”).

Test case | Service evaluated
Mi | Migration | Migration acknowledgement interrupted
Ex | Execution | MAS flooded with agents
Me | Messaging (asynchronous) | Message acknowledgement interrupted
MeCorr | Messaging (asynchronous)

Table 4: Test case descriptions.

4.1 Test Environment and Results

A group of three interconnected PCs, all running Linux 2.0, constituted our test environment. Most of the tests were performed by interrupting or altering the traffic streams flowing between MASes located on different hosts. Table 5 shows the observed failure semantics of MAS services for the different test cases. Brief descriptions of the test cases are given in Table 4. More complete documentation of test cases and results can be found at the authors' web site (http://www.item.ntnu.no/~ottow). The observed semantics are strongly related to each test case and should only be considered as an indication of expected failure semantics for the relevant services.

None of the MASes evaluated provide mechanisms for negotiation of failure semantics.

5 Conclusion and Future Work

Research indicates that a network fault management (NFM) system can gain efficiency and dependability by using mobile agent technology as a mechanism for distribution. The autonomy and mobility of mobile agents are valuable for managing dynamic network configuration in a robust manner. The robustness will depend strongly on the failure behavior of the mobile agent systems (MASes) which provide the execution environment for the agents. Thus being aware of what failure semantics the MASes have at the service level is important.

In traditional NFM architectures response failure semantics for services are generally not desirable since they result in a need for more complex error handling in clients. But in a mobile agent based NFM system some types of response failures can prove to be valuable. In this paper we have defined a new failure subclass, bounce failures, which captures some of these types, and explained why we consider the subclass to be of importance. We have also evaluated a set of state-of-the-art MASes and observed omission failure semantics to be the most common behavior, but still with response failure semantics exhibited in some cases. None of the observed response failures could be classified as bounce failures.

For future work there are several topics requiring attention. As our test results indicate, more MAS development work is required if MAS services are to provide QoS primitives with negotiable service failure semantics. A more thorough look (by means of analysis and simulation) at the advantages and disadvantages gained by introducing our proposed class of failure semantics is also required. And a lot of work still remains on agent design/development with focus on how agents can manage specific fault situations by awareness of service failure semantics and by collective behavior.

Test case/MAS | Aglets 1.1 | Mole 3.0
Mi | Omission, exception thrown (long default timeout) | Omission, exception thrown | Omission, exception thrown (long default timeout) | Omission, exception thrown
Ex | Response (value) | Omission, crash | Omission, exception thrown | Omission, exception thrown
Me | Omission, exception thrown (long default timeout) | Omission, no exception | No failure for oneway message (no mesg. ack. sent) | Omission, exception thrown
MeCorr | Omission/Response, exception thrown on omission | Omission, no exception

Table 5: Observed service failure semantics for four state-of-the-art mobile agent systems (N/A = no results available).

References

[1] ISO/IEC 10040 (1998-10). Information Technology - Open Systems Interconnection - Systems management overview. International Electrotechnical Commission, 1998.

[2] T. White, A. Bieszczad, B. Pagurek. Mobile Agents for Network Management. IEEE Communications Surveys, 1(1):2–9, Fourth Quarter 1998.

[3] V. A. Pham, A. Karmouch. Mobile Software Agents: An Overview. IEEE Communications Magazine, 36(7):26–37, July 1998.

[4] M. M. Silva, A. R. Silva. Insisting on Persistent Mobile Agent Systems. In G. Goos, J. Hartmanis, J. Leeuwen, editors, Lecture Notes in Computer Science (1st International Workshop on Mobile Agents MA'97/ISADS'97), volume 1219, pages 174–185. Springer-Verlag, 1997.

[5] Gianni Di Caro and Marco Dorigo. AntNet: Distributed Stigmergetic Control for Communications Networks. Journal of Artificial Intelligence Research, 9:317–365, Dec 1998.

[6] Flaviu Cristian. Understanding Fault-Tolerant Distributed Systems. Communications of the ACM, 34(2):56–78, Feb 1991.

[7] J. D. Case, M. Fedor, M. L. Schoffstall, C. Davin. RFC 1157: Simple Network Management Protocol (SNMP). IETF, April 1990.

[8] Dep. of Computer Science, Dartmouth College. D'Agents. http://agent.cs.dartmouth.edu/.

[9] IBM Corporation, GMD FOKUS. Joint Submission: Mobile Agent System Interoperability Facilities Specification. OMG TC Document, orbos/97-10-05, Nov 1997.

[10] German S. Goldszmidt. Distributed Management by Delegation. PhD thesis, Columbia University, 1996.

[11] IBM. Aglets Software Development Kit. http://www.trl.ibm.co.jp/aglets/.

[12] University of Stuttgart IPVR. The Home of the Mole. http://www.informatik.uni-stuttgart.de/ipvr/vs/projekte/mole.html.

[13] D. Johansen, K. Marzullo, F. B. Schneider, K. Jacobsen. NAP: Practical Fault-Tolerance for Itinerant Computations. Technical report, Department of Computer Science, University of Tromsø, October 1998.

[14] R. E. McDermott, R. J. Mikulak, M. R. Beauregard. The Basics of FMEA. ISBN 0-527-76320-9, 1996.

[15] K. Rothermel, M. Strasser. A Fault-Tolerant Protocol for Providing the Exactly-Once Property of Mobile Agents. In Proceedings of the Seventeenth IEEE Symposium on Reliable Distributed Systems, pages 100–108, 1998.

[16] Objectspace. Voyager Overview. http://www.objectspace.com/products/vgrOverview.htm.

[17] T. White, B. Pagurek, Franz Oppacher. Connection Management using Adaptive Mobile Agents. In Proceedings of 1998 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'98), 1998.

[18] T. White, A. Bieszczad, B. Pagurek. Distributed Fault Location in Networks Using Mobile Agents. In Proceedings of the 3rd International Workshop on Agents in Telecommunication Applications IATA'98, Paris, France, July 1998.

[19] T. White, B. Pagurek. Towards Multi-swarm Problem Solving in Networks. In Proceedings of the 3rd International Conference on Multi-agent Systems (ICMAS'98), July 1998.

[20] D. Milojicic, M. Breugst, I. Busse, J. Campbell, S. Covaci, B. Friedman, K. Kosaka, D. Lange, K. Ono, M. Oshima, C. Tham, S. Virdhagriswaran and J. White. MASIF - The OMG Mobile Agent System Interoperability Facility. In Personal Technologies, pages 2:117–129. Springer Verlag, 1998.
