This talk will describe work done in the EU DataGrid project to define an initial architecture for network monitoring in the Grid. The work is equally applicable to the UK Grid environment.
Measuring and monitoring network performance is required by the Grid for two important reasons: first, to provide the metrics used by Grid resource broker services and the middleware; and second, to describe network performance from the perspective of a Grid application and hence identify any strategic issues that may arise.
The aim of the work was to make use of well-understood basic network monitoring tools in a coherent and simply extensible manner to demonstrate the publication of network metrics to the Grid middleware via LDAP, and also to make this information available via a Web interface.
The talk will outline the requirements for network monitoring and the classical metrics associated with it, and will discuss existing monitoring methods and their associated tools. It will then present the architectural design of the monitoring system, which comprises four functional units: the monitoring tools (also known as sensors); the repository for collected data; the means of analysing that data to generate network metrics; and the means to publish and use the derived metrics.
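The four functional units can be pictured with a minimal Python sketch. It is an illustration only, not the DataGrid implementation: the names `Repository`, `analyse`, `publish`, `canned_rtt_sensor` and `SENSORS` are hypothetical, the sensor returns fixed values rather than probing a network, and the host name is a placeholder.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Unit 1: a sensor takes a destination host and returns raw readings.
Sensor = Callable[[str], List[float]]
Key = Tuple[str, str]  # (tool name, destination)


@dataclass
class Repository:
    """Unit 2: holds raw readings keyed by (tool, destination)."""
    data: Dict[Key, List[float]] = field(default_factory=dict)

    def store(self, tool: str, destination: str, readings: List[float]) -> None:
        self.data.setdefault((tool, destination), []).extend(readings)


def analyse(repo: Repository) -> Dict[Key, float]:
    """Unit 3: reduce each series of raw readings to one metric (here, the mean)."""
    return {key: mean(values) for key, values in repo.data.items() if values}


def publish(metrics: Dict[Key, float]) -> List[str]:
    """Unit 4: render metrics as text lines; a real system would publish elsewhere."""
    return [f"{tool} {dest} {value:.1f}" for (tool, dest), value in sorted(metrics.items())]


def canned_rtt_sensor(destination: str) -> List[float]:
    """Stand-in sensor returning fixed values in milliseconds, not real probes."""
    return [20.0, 22.0, 24.0]


# Hypothetical registry: adding a tool means adding one entry here.
SENSORS: Dict[str, Sensor] = {"rtt_ms": canned_rtt_sensor}


def run_once(destinations: List[str]) -> List[str]:
    """Run every registered sensor against every destination, then analyse and publish."""
    repo = Repository()
    for tool, sensor in SENSORS.items():
        for dest in destinations:
            repo.store(tool, dest, sensor(dest))
    return publish(analyse(repo))


print(run_once(["site-b.example.org"]))
```

Keeping the sensors behind a single registry is one way to read "simply extensible": a new tool touches neither the repository nor the publication step.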
The system delivered as part of the EU DataGrid project uses basic monitoring tools that produce the standard measurements of round-trip delay, packet loss, total traffic volume, TCP and UDP throughput, site connectivity and service availability. Examples of these capabilities will be shown.
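As a rough illustration of how some of these measurements reduce to numbers, the following Python sketch computes round-trip delay statistics, packet loss, throughput and availability from raw observations. The function names and input shapes are assumptions made for this example; the tools used in the project may define and report these quantities differently.

```python
from statistics import mean
from typing import Dict, List, Optional


def rtt_and_loss(rtts_ms: List[Optional[float]]) -> Dict[str, float]:
    """Summarise one batch of echo probes; None marks a probe with no reply."""
    if not rtts_ms:
        raise ValueError("no probes sent")
    replies = [r for r in rtts_ms if r is not None]
    summary = {"loss_percent": 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)}
    if replies:  # delay statistics are undefined when every probe was lost
        summary.update(
            rtt_min_ms=min(replies),
            rtt_mean_ms=mean(replies),
            rtt_max_ms=max(replies),
        )
    return summary


def throughput_mbit_s(bytes_transferred: int, seconds: float) -> float:
    """Throughput in megabits per second, taking 1 Mbit as 10**6 bits."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return bytes_transferred * 8 / seconds / 1e6


def availability_percent(checks: List[bool]) -> float:
    """Share of service checks that succeeded, as a percentage."""
    if not checks:
        raise ValueError("no checks recorded")
    return 100.0 * sum(checks) / len(checks)


print(rtt_and_loss([10.0, None, 14.0, 12.0]))
print(throughput_mbit_s(125_000_000, 10.0))
print(availability_percent([True, True, False, True]))
```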
In respect of network information services and the publication of network metrics, an LDAP information provider has been developed. Grid applications are able to access network monitoring metrics via LDAP services according to a well-defined LDAP schema. The LDAP service itself gathers and maintains the network monitoring metric data via information provider scripts that fetch the current metric information from the local network monitoring data store.
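An information provider script of this kind might look roughly like the Python sketch below, which reads the current metrics for one site pair and renders them as an LDIF entry. Everything specific here is an assumption for illustration: the base DN, the object class `NetMonMetric`, the attribute names and the canned data store are hypothetical and are not the schema or store used in the project.

```python
from typing import Dict

# Hypothetical names: the actual LDAP schema is not given in the text above.
BASE_DN = "ou=netmon,o=grid"
OBJECT_CLASS = "NetMonMetric"


def load_current_metrics(source: str, destination: str) -> Dict[str, float]:
    """Stand-in for the local monitoring data store; returns canned values."""
    return {"rttMeanMs": 12.0, "lossPercent": 25.0}


def to_ldif(source: str, destination: str, metrics: Dict[str, float]) -> str:
    """Render one source/destination pair as a single LDIF entry.

    Values are written as-is; a real provider would need to escape special
    characters in DN components and attribute values.
    """
    lines = [
        f"dn: dest={destination},src={source},{BASE_DN}",
        f"objectClass: {OBJECT_CLASS}",
        f"src: {source}",
        f"dest: {destination}",
    ]
    lines += [f"{name}: {value}" for name, value in sorted(metrics.items())]
    return "\n".join(lines) + "\n"


src, dst = "site-a.example.org", "site-b.example.org"
print(to_ldif(src, dst, load_current_metrics(src, dst)))
```

Separating the fetch step from the rendering step mirrors the description above: the collection tools write to the local store on their own schedule, and the provider only reads the latest values when the LDAP service asks for them.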
Independently, a set of network monitoring tools is run from the site to collect the data that describes the local view of network access to other sites in the Grid. The separate components required by this network monitoring architecture have been developed and demonstrated, and will be shown.
Future work will concentrate on forecasting with respect to the identified network monitoring metrics; extending the monitoring capabilities and understanding how to incorporate them into the network monitoring architecture; and reviewing the actual use of these metrics by the middleware. In addition, consideration will be given to other approaches to data publication.