AutoPart: Simulation Partitioning Tool For PDNS

Status and Changes: AutoPart 1.2 updated on 3/12/2005

- Fixed a segmentation fault that occurred when two connected agents had no traffic attached.

Previous Changes:

AutoPart 1.1 updated on 9/28/2004

- Fixed a bug that assigned incorrect IP addresses in some cases.

Author

Donghua Xu
College of Computing
Georgia Institute of Technology
Email: xu@cc.gatech.edu

Introduction

AutoPart is a tool that automatically partitions a large network simulation into smaller simulation instances so that they can be run on several machines with PDNS, a parallel and distributed network simulator based on ns2. By distributing a simulation across several machines, we can reach simulation scales that the original ns2 cannot handle on a single workstation.

If you already have a network topology and want to simulate it with PDNS, partitioning the simulation manually (replacing some links with rlinks, adding all the add-route statements, and so on) is tedious and error-prone. When the topology is relatively large (say, more than 1,000 nodes), partitioning by hand is virtually impossible. Moreover, even if you can partition by hand, you cannot be sure how well the partitioned simulation will perform, since its performance depends on a complicated combination of lookahead, load balancing and communication overhead. We therefore developed this tool to automate the partitioning process. It takes an ns2 script and creates a number of PDNS scripts that are ready to run in parallel on several machines, trying to strike the best trade-off among lookahead, load balancing and communication overhead so that the partitioned simulation runs as fast as possible under PDNS.
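To make the manual bookkeeping concrete, the sketch below shows the step that decides which links become rlinks once every node has been assigned to a partition: any link whose two endpoints land in different partitions crosses a partition boundary. It is a minimal C++11 illustration, not code taken from autopart.cc; the `Link` struct and the `crossPartitionLinks` function are hypothetical names, and the node-to-partition assignment is simply taken as input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: a duplex link between two nodes, identified by their
// node(n) indices.
struct Link {
    int a;
    int b;
};

// Given a node-to-partition assignment (part[n] is the partition hosting
// node(n)), return the links whose endpoints are in different partitions.
// In a partitioned PDNS script these would have to be expressed as rlinks;
// links inside a single partition stay ordinary ns2 links.
std::vector<Link> crossPartitionLinks(const std::vector<Link>& links,
                                      const std::vector<int>& part) {
    std::vector<Link> cut;
    for (const Link& l : links) {
        assert(l.a >= 0 && static_cast<std::size_t>(l.a) < part.size());
        assert(l.b >= 0 && static_cast<std::size_t>(l.b) < part.size());
        if (part[l.a] != part[l.b]) {
            cut.push_back(l);
        }
    }
    return cut;
}
```

For the 6-node example, assuming (hypothetically) that endhosts 3 and 4 attach to router 1 and endhosts 5 and 6 to router 2, placing {1, 3, 4} and {2, 5, 6} in different parts leaves only the 1-2 link as a cut link.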

Prerequisites

This simulation partitioning tool requires the graph partitioning package METIS to be installed beforehand. After installing METIS, make sure the metis/pmetis/kmetis programs are in your path.
The PDNS you use must be version 2.27v1b or later. This is because the tool generates PDNS scripts that use an expanded "add-route" syntax that is only supported starting with PDNS 2.27v1b, which is not officially released yet but can be downloaded here. Installing pdns2.27v1b is similar to the procedure in the "Building ns-2 and PDNS" section of pdns2.27v1a.

Download and Installation

Download the C++ source code autopart.cc.
You can compile it with g++ 2.95.2 or later, for example:
g++ -o autopart autopart.cc

Assumptions

The input ns2 script must use a nam-editor-like format. More specifically, the node, agent and traffic objects must be defined in the form node(n), agent(n) and traffic_source(n), where n is a non-negative integer. A simple example script of a topology with 6 nodes, illustrated in the following figure, can be found here. If your original script is not in this format, it is generally not difficult to write a Perl or Tcl script to convert it. Larger ns2 simulations in this format can also be downloaded here: 538 nodes, 3,886 nodes, 21,424 nodes, 123,536 nodes, and 417,200 nodes.
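As an illustration of this format requirement, the helper below recognizes a definition line and extracts its index n. It assumes definitions look like `set node(3) [$ns node]` or `set agent(0) [new Agent/TCP]`, which is how nam-editor-style scripts are typically written; `parseObjectIndex` is a hypothetical name, and a real checker would also need to cope with other Tcl syntax.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: if `line` defines an object of the given kind in the
// form "set <kind>(<n>) ...", e.g. "set node(3) [$ns node]", return n;
// otherwise return -1. Leading spaces and tabs are ignored.
int parseObjectIndex(const std::string& line, const std::string& kind) {
    const std::string prefix = "set " + kind + "(";
    std::size_t pos = line.find_first_not_of(" \t");
    if (pos == std::string::npos ||
        line.compare(pos, prefix.size(), prefix) != 0) {
        return -1;
    }
    pos += prefix.size();
    std::size_t end = pos;
    while (end < line.size() &&
           std::isdigit(static_cast<unsigned char>(line[end]))) {
        ++end;
    }
    // Require at least one digit, followed immediately by ')'.
    if (end == pos || end >= line.size() || line[end] != ')') {
        return -1;
    }
    return std::stoi(line.substr(pos, end - pos));
}
```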

The partitioning tool assumes two types of nodes in a simulation topology: endhosts and routers. Endhosts are nodes that connect to only one other node (their router) and carry the agents that send or receive application traffic. Routers are nodes that connect to two or more nodes and only forward traffic from other nodes. In the above figure, nodes 1 and 2 are routers, while nodes 3, 4, 5 and 6 are endhosts. This distinction is very important: you must attach agents and applications to endhosts. If you attach an agent and application to a router, such as node 1 or 2 in the above figure, the tool may produce unpredictable (and possibly unrunnable) PDNS scripts.
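The endhost/router rule above depends only on node degree, so it is easy to check before running the tool. The following C++11 sketch is a hypothetical pre-flight check, not part of AutoPart: `classifyNodes` labels each node by its number of links (each link is assumed to be listed exactly once), and `misplacedAgents` reports nodes that carry an agent but are not endhosts.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

enum class NodeRole { Endhost, Router };

// Count each node's links and apply the rule above:
// degree 1 -> endhost, degree >= 2 -> router.
std::map<int, NodeRole> classifyNodes(
        const std::vector<std::pair<int, int>>& links) {
    std::map<int, int> degree;
    for (const auto& l : links) {
        ++degree[l.first];
        ++degree[l.second];
    }
    std::map<int, NodeRole> role;
    for (const auto& d : degree) {
        role[d.first] = d.second >= 2 ? NodeRole::Router : NodeRole::Endhost;
    }
    return role;
}

// Return the nodes in `agentNodes` that are not endhosts (or do not appear
// in any link); agents on such nodes should be moved before partitioning.
std::vector<int> misplacedAgents(const std::map<int, NodeRole>& role,
                                 const std::vector<int>& agentNodes) {
    std::vector<int> bad;
    for (int n : agentNodes) {
        auto it = role.find(n);
        if (it == role.end() || it->second != NodeRole::Endhost) {
            bad.push_back(n);
        }
    }
    return bad;
}
```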
To assign reasonable edge and vertex weights to the links and nodes of the topology, so that the partitioned simulation runs as fast as possible, this tool requires a pre-determined set of relations between the three parallel simulation performance factors (lookahead, load balancing and communication overhead) on one side, and the simulation model and the hardware configuration of the computer cluster on the other. This set of relations should be obtained through a series of benchmark experiments (see References for details). For now, the tool hardcodes the relations we obtained through benchmark experiments on our Ferrari cluster, which consists of eight machines connected via a Gigabit LAN, each with two 3 GHz CPUs sharing 2 GB of memory. These relations should apply directly to similar homogeneous platforms.
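Once weights are known, a partitioner such as METIS consumes them through its graph file format: a header line `n m 11` (vertex count, edge count, and a format code meaning both vertex and edge weights are present), followed by one line per vertex giving its weight and then its 1-based neighbours, each followed by the weight of the connecting edge. The sketch below writes such a file. The weight values themselves come from the benchmark-derived relations, which are not given here, so they are plain inputs; `toMetisGraph` and `WeightedEdge` are hypothetical names, and the edge list is assumed to contain no duplicates.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical weighted, undirected edge between 0-based vertices.
struct WeightedEdge {
    int u;
    int v;
    int weight;  // positive integer edge weight
};

// Serialize a graph in METIS graph-file format with vertex and edge
// weights (fmt = 11). Vertices are written 1-based; each undirected edge
// appears on both endpoints' lines but is counted once in the header.
std::string toMetisGraph(const std::vector<int>& vertexWeight,
                         const std::vector<WeightedEdge>& edges) {
    const int n = static_cast<int>(vertexWeight.size());
    std::vector<std::vector<std::pair<int, int>>> adj(n);
    for (const WeightedEdge& e : edges) {
        assert(e.u >= 0 && e.u < n && e.v >= 0 && e.v < n && e.u != e.v);
        adj[e.u].push_back({e.v, e.weight});
        adj[e.v].push_back({e.u, e.weight});
    }
    std::ostringstream out;
    out << n << ' ' << edges.size() << " 11\n";
    for (int i = 0; i < n; ++i) {
        out << vertexWeight[i];
        for (const auto& nb : adj[i]) {
            out << ' ' << (nb.first + 1) << ' ' << nb.second;
        }
        out << '\n';
    }
    return out.str();
}
```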

Usage

autopart -n num[.num] [-W routefile] [-R routefile] ns2script

The ns2script is the name of the original ns2 script that you want to partition.
The -n option specifies how many parts to partition the original script into, either as a one-level or a two-level partition. If you have a computer cluster with x machines, each having y CPUs, and z = x*y, you can specify either -n z or -n x.y. In most cases the latter generates PDNS scripts that perform better, since it accounts for the difference between the communication overhead over shared memory and over the LAN.
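The -n value can be read as an outer part count with an optional inner count per outer part, so the total number of parts is x*y (or z for a one-level partition). The hypothetical parser below shows that interpretation; `PartSpec` and `parsePartSpec` are illustrative names, not autopart's internals, and option-flag handling (for example the attached form -n4.2) is left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical parsed form of "-n x" or "-n x.y": outerParts = x, and
// innerParts = y (1 for a one-level partition).
struct PartSpec {
    int outerParts;
    int innerParts;
    long long total() const {
        return static_cast<long long>(outerParts) * innerParts;
    }
};

// Accept a non-empty run of at most 6 digits with a value greater than 0.
bool parsePositive(const std::string& s, int& value) {
    if (s.empty() || s.size() > 6) return false;
    for (char c : s) {
        if (c < '0' || c > '9') return false;
    }
    value = std::stoi(s);
    return value > 0;
}

// Parse "x" or "x.y"; returns false (leaving spec unchanged) if malformed.
bool parsePartSpec(const std::string& arg, PartSpec& spec) {
    const std::size_t dot = arg.find('.');
    int outer = 0;
    int inner = 1;
    if (dot == std::string::npos) {
        if (!parsePositive(arg, outer)) return false;
    } else if (!parsePositive(arg.substr(0, dot), outer) ||
               !parsePositive(arg.substr(dot + 1), inner)) {
        return false;
    }
    spec.outerParts = outer;
    spec.innerParts = inner;
    return true;
}
```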
The -W and -R options ask autopart to write the calculated routes to, or read them from, a route file. Route calculation may take up a considerable part of the partitioning time, especially for large and complex topologies with many traffic streams. With the -W option, the first time you partition a simulation you can store the calculated routes in a file; the next time you partition the same simulation (say, into a different number of parts), you can use the -R option to have autopart read those routes and skip route calculation completely.
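The write-once, read-later workflow behind -W and -R amounts to serializing a route table and loading it back. The sketch below uses a simple text format with one route per line (`src dst hop1 hop2 ...`); this format, the `RouteTable` type and both functions are assumptions made for illustration, since the actual layout of autopart's route file is not described here.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical route table: (source node, destination node) -> hop list.
using RouteTable = std::map<std::pair<int, int>, std::vector<int>>;

// Write one route per line: "src dst hop1 hop2 ...". Illustrative format
// only, not autopart's actual route-file layout.
void writeRoutes(std::ostream& out, const RouteTable& routes) {
    for (const auto& r : routes) {
        out << r.first.first << ' ' << r.first.second;
        for (int hop : r.second) out << ' ' << hop;
        out << '\n';
    }
}

// Read routes in the format above; returns false on a malformed line.
bool readRoutes(std::istream& in, RouteTable& routes) {
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        std::istringstream fields(line);
        int src = 0;
        int dst = 0;
        if (!(fields >> src >> dst)) return false;
        std::vector<int> hops;
        int hop = 0;
        while (fields >> hop) hops.push_back(hop);
        if (!fields.eof()) return false;  // trailing non-numeric text
        routes[{src, dst}] = hops;
    }
    return true;
}
```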

Usage Examples

To partition 6nodes.tcl into 2 parts to be run on two different machines:

autopart -n2 6nodes.tcl

To partition 6nodes.tcl into 2 parts to be run by two different CPUs on one machine:

autopart -n1.2 6nodes.tcl

To partition medium.tcl into 8 parts to be run on 4 different machines, each machine having 2 CPUs:

autopart -n4.2 medium.tcl

To partition m32.tcl into 12 parts to be run on 6 different machines, each machine having 2 CPUs, and write the routes to rfile.txt:

autopart -n6.2 -Wrfile.txt m32.tcl

To partition m32.tcl into 16 parts to be run on 8 different machines, each machine having 2 CPUs, reading the routes from rfile.txt instead of recalculating them:

autopart -n8.2 -Rrfile.txt m32.tcl

Running the Partitioned Simulation

You can also download runsim.pl, a simple Perl script that helps you run the partitioned simulation with PDNS on a computer cluster. The original runsim.pl runs well on our Ferrari cluster, which consists of 8 machines sharing the same file system, each with 2 CPUs. The machines are named ferrari001 through ferrari008, as you can see in runsim.pl; modify the ferrari part of the script to match the names of your cluster machines. Assuming pdns is in your path, after partitioning m32.tcl into 16 parts as in the last example above, you can run the partitioned simulation as follows:

runsim.pl -m8 -c2 -nm32
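With -m8 -c2 there are 8*2 = 16 parts to place, two per machine. The sketch below shows one plausible placement, assuming the parts of each first-level partition are numbered consecutively so that they share a machine's CPUs (and its shared memory). It is not the logic of runsim.pl, which is not reproduced here; `assignHosts` is a hypothetical name, and the ferrariNNN naming simply mirrors the cluster described above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical placement: part p (0-based) of machines*cpus parts runs on
// host number p / cpus + 1, named <prefix> followed by a 3-digit number,
// so consecutive parts share one machine.
std::vector<std::string> assignHosts(int machines, int cpus,
                                     const std::string& prefix) {
    assert(machines > 0 && cpus > 0 && machines <= 999);
    std::vector<std::string> host;
    for (int p = 0; p < machines * cpus; ++p) {
        const std::string num = std::to_string(p / cpus + 1);
        // Zero-pad to 3 digits, e.g. 1 -> "001".
        host.push_back(prefix + std::string(3 - num.size(), '0') + num);
    }
    return host;
}
```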

Performance of AutoPart Itself

We ran autopart on a 3 GHz P4 machine with 2 GB of memory to partition the 417,200-node, 400,000-stream ns2 simulation into 8*2 parts. It took 12 minutes when autopart had to calculate all routes, and just 5 minutes when it could read previously calculated routes. Its memory usage never exceeded 700 MB.

References

The following paper is not about how we developed this tool, but about a systematic methodology for partitioning network simulations, mainly by obtaining the performance/factor relations through empirical benchmark experiments. The current version of the tool uses the relations we obtained on our cluster.

Xu, Donghua and Ammar, Mostafa H. "BencHMAP: Benchmark-Based, Hardware and Model Aware Partitioning for Parallel and Distributed Network Simulation." To appear in Proceedings of MASCOTS 2004.

Acknowledgement

This work is supported in part by NSF under contract numbers ANI-9977544 and ANI-0136936.

Contact

For any questions or comments, feel free to contact Donghua Xu at xu@cc.gatech.edu.
