AutoPart: Simulation Partitioning Tool For PDNS
Status and Changes:
AutoPart 1.2 updated on 3/12/2005
- Fixed a segmentation fault that occurred when two connected agents
had no traffic attached.
Previous Changes:
AutoPart 1.1 updated on 9/28/2004
- Fixed a bug that would assign wrong IP addresses in some cases.
Author
Donghua Xu
College of Computing
Georgia Institute of Technology
Email: xu@cc.gatech.edu
Introduction
AutoPart is a tool that automatically partitions a large network
simulation into smaller simulation instances,
so that they can be run on a number of machines with PDNS,
a parallel and distributed network simulation tool based on ns2.
By distributing a simulation across a number of machines, we can reach
simulation scales that the original ns2 cannot achieve on a single workstation.
If you already have a network topology and want to simulate it with PDNS,
partitioning the simulation manually (replacing some links with rlinks,
adding all the add-route statements, etc.) is tedious and error-prone.
When the topology is relatively large (say, more than 1,000 nodes),
partitioning by hand is virtually impossible. In addition, even if you can
partition by hand, you cannot know in advance how well the partitioned
simulation will perform, since its performance depends on a complicated
combination of lookahead, load balancing and communication overhead.
We therefore developed this tool to automate the partitioning process.
This tool takes an ns2 script and creates a number of PDNS scripts that
are ready to run in parallel on a number of machines. During partitioning
it attempts to make the best trade-off between lookahead, load balancing
and communication overhead, so that the resulting scripts perform as well
as possible when run by PDNS.
Prerequisites
-
This simulation partitioning tool requires the graph partitioning
package METIS
to be installed beforehand. After installing METIS, make sure the metis/pmetis/kmetis
programs are in your path.
-
The PDNS you are using must be version 2.27v1b or above.
This tool generates PDNS scripts that use an expanded "add-route" syntax
that only works with PDNS 2.27v1b, which is not officially
released yet but can be downloaded here.
The installation of PDNS 2.27v1b is similar to the "Building ns-2 and
PDNS" section of pdns2.27v1a.
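The METIS prerequisite above can be checked before running autopart. The following is a minimal Python sketch, not part of AutoPart itself. The helper name `missing_programs` is hypothetical, and the three program names are taken from the prerequisite text; which of them your METIS installation actually provides should be verified locally.

```python
import shutil

# Program names taken from the prerequisite above; which of them a given
# METIS installation actually provides is an assumption to verify locally.
METIS_PROGRAMS = ("metis", "pmetis", "kmetis")


def missing_programs(names=METIS_PROGRAMS, which=shutil.which):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_programs()
    if missing:
        print("Not found in PATH:", ", ".join(missing))
    else:
        print("All METIS programs found in PATH.")
```

The `which` parameter is injectable only so the check can be exercised without METIS installed.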
Download and Installation
-
Download the C++ source code autopart.cc.
-
You can compile this code with g++ 2.95.2 or above, for example:
g++ -o autopart autopart.cc
Assumptions
-
The input ns2 script must follow a nam-editor-like format.
More specifically, the node, agent and traffic objects must be defined
in the form node(n), agent(n), traffic_source(n),
where n is a non-negative integer. A simple example
script for a topology with 6 nodes, illustrated in the following figure,
can be found here. If your original script is
not in this format, it is generally not difficult to write a Perl or Tcl
script to convert it into this format. Some larger
ns2 simulations in this format can also be downloaded here: 538
nodes, 3,886 nodes, 21,424 nodes, 123,536
nodes, and 417,200 nodes.
-
The partitioning tool assumes two types of nodes in a simulation
topology: endhosts and routers. An endhost is a node that connects to only
one other node (its router) and carries the agents that send or receive
application traffic. A router is a node that connects to two or more nodes
and only forwards traffic from other nodes. In the above figure, nodes 1 and 2
are routers, while nodes 3, 4, 5 and 6 are endhosts. Making this distinction
is very important. You must attach agents and applications only to
endhosts. If you attach an agent and application to a router such
as node 1 or 2 in the above figure, our tool may produce unpredictable
(and possibly un-runnable) pdns scripts.
-
This tool requires a pre-determined set of relations between the three
parallel simulation performance factors (lookahead, load balancing and
communication overhead) on one side and the simulation model and
computer cluster hardware configuration on the other.
It uses these relations to assign reasonable edge and vertex
weights to the links and nodes of the topology, so that the
partitioned simulation runs as fast as possible.
This set of relations should be obtained through a series of benchmark
experiments (see References for details).
For now this tool hardcodes the set of relations that we obtained
through benchmark experiments on our Ferrari cluster, which
consists of eight machines connected via a Gigabit LAN, each machine
having two 3GHz CPU's sharing 2 GB of memory. The hardcoded relations
should be directly applicable to similar homogeneous platforms.
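The naming and endhost/router assumptions above can be checked on a script before partitioning. The following Python sketch is illustrative only and is not how autopart itself parses scripts. The embedded 6-node script, its link layout and link parameters, the regular expressions, and the function names `classify` and `agents_on_routers` are all assumptions made for this example; the original figure is not reproduced here.

```python
import re
from collections import defaultdict

# Hypothetical 6-node script using the node(n)/agent(n)/traffic_source(n)
# naming. The link layout and parameters are assumptions for illustration.
SCRIPT = """
set ns [new Simulator]
set node(1) [$ns node]
set node(2) [$ns node]
set node(3) [$ns node]
set node(4) [$ns node]
set node(5) [$ns node]
set node(6) [$ns node]
$ns duplex-link $node(1) $node(2) 10Mb 10ms DropTail
$ns duplex-link $node(1) $node(3) 1Mb 5ms DropTail
$ns duplex-link $node(1) $node(4) 1Mb 5ms DropTail
$ns duplex-link $node(2) $node(5) 1Mb 5ms DropTail
$ns duplex-link $node(2) $node(6) 1Mb 5ms DropTail
set agent(1) [new Agent/TCP]
$ns attach-agent $node(3) $agent(1)
set agent(2) [new Agent/TCPSink]
$ns attach-agent $node(5) $agent(2)
$ns connect $agent(1) $agent(2)
set traffic_source(1) [new Application/FTP]
$traffic_source(1) attach-agent $agent(1)
"""

# Simplified patterns; a real script may need a more tolerant parser.
LINK_RE = re.compile(r"duplex-link\s+\$node\((\d+)\)\s+\$node\((\d+)\)")
ATTACH_RE = re.compile(r"attach-agent\s+\$node\((\d+)\)\s+\$agent\((\d+)\)")


def classify(script):
    """Split linked nodes into (routers, endhosts) by number of neighbors."""
    neighbors = defaultdict(set)
    for a, b in LINK_RE.findall(script):
        neighbors[int(a)].add(int(b))
        neighbors[int(b)].add(int(a))
    routers = sorted(n for n, nb in neighbors.items() if len(nb) >= 2)
    endhosts = sorted(n for n, nb in neighbors.items() if len(nb) == 1)
    return routers, endhosts


def agents_on_routers(script):
    """Return (agent, node) pairs where an agent is attached to a router."""
    routers, _ = classify(script)
    router_set = set(routers)
    return sorted(
        (int(agent), int(node))
        for node, agent in ATTACH_RE.findall(script)
        if int(node) in router_set
    )


if __name__ == "__main__":
    routers, endhosts = classify(SCRIPT)
    print("routers:", routers)
    print("endhosts:", endhosts)
    print("agents attached to routers:", agents_on_routers(SCRIPT))
```

An empty result from `agents_on_routers` means every agent sits on an endhost, which is the condition the tool expects.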
Usage
autopart -n num[.num] [-W routefile] [-R routefile] ns2script
-
The ns2script is the name of the original ns2 script
that you want to partition.
-
The -n option specifies how many parts to partition
the original script into. You can specify either
one-level or two-level parts. If you have a computer cluster with x
machines, each machine having y CPU's, and z=x*y, then you
can specify either -n z or -n x.y. In most cases
the latter generates PDNS scripts with better performance, since
it takes into account the difference between the communication overhead
over shared memory and over the LAN.
-
The -W and -R options ask autopart to write the calculated routes to,
or read them from, a route file. Route calculation
can take considerable time in the partitioning process, especially for
large and complex network topologies with a large number of traffic streams.
With the -W option, the first time you partition a simulation you
can store the calculated routes in a file. The next time you partition
the same simulation (say, into a different number of parts),
you can use the -R option to have autopart read the stored
routes and skip the route calculation completely.
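The options above can be combined into a small wrapper that builds the autopart command line. This is a hedged Python sketch under stated assumptions: the function names `parts_argument` and `build_command` are hypothetical, the attached option form (for example -n6.2 and -Wrfile.txt) follows the usage examples on this page, and choosing -R whenever the route file already exists is only valid if that file was written for the same simulation.

```python
import os


def parts_argument(machines, cpus_per_machine):
    """Return the two-level value x.y for the -n option."""
    if machines < 1 or cpus_per_machine < 1:
        raise ValueError("machines and cpus_per_machine must be positive")
    return f"{machines}.{cpus_per_machine}"


def build_command(script, machines, cpus_per_machine,
                  route_file=None, exists=os.path.exists):
    """Build an autopart argument list.

    Assumption: if route_file already exists it was written by an earlier
    run on the SAME simulation, so it is read with -R; otherwise it is
    written with -W.
    """
    cmd = ["autopart", "-n" + parts_argument(machines, cpus_per_machine)]
    if route_file is not None:
        flag = "-R" if exists(route_file) else "-W"
        cmd.append(flag + route_file)
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(...) once autopart is installed and in your path.
    print(" ".join(build_command("m32.tcl", 6, 2, route_file="rfile.txt")))
```

Returning a list rather than a single string keeps the arguments separate, so file names containing spaces do not need shell quoting.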
Usage Examples
-
To partition 6nodes.tcl into 2 parts to be run on
two different machines:
-
To partition 6nodes.tcl into 2 parts to be run by
two different CPU's on one machine:
autopart -n1.2 6nodes.tcl
-
To partition medium.tcl into 8 parts to be run on
4 different machines, each machine having 2 CPU's:
autopart -n4.2 medium.tcl
-
To partition m32.tcl into 12 parts to be run on 6
different machines, each machine having 2 CPU's, and write the routes into
rfile.txt:
autopart -n6.2 -Wrfile.txt m32.tcl
-
To partition m32.tcl into 16 parts to be run on 8
different machines, each machine having 2 CPU's, but read the routes from
rfile.txt instead of recalculating the routes:
autopart -n8.2 -Rrfile.txt m32.tcl
Running the Partitioned Simulation
You can also download runsim.pl,
a simple Perl script that helps you run the partitioned simulation with
pdns on a computer cluster. The original runsim.pl runs well on
our Ferrari cluster, which consists of 8 machines sharing the same file
system, each machine having 2 CPU's. The machines are named ferrari001-ferrari008,
as you can see in runsim.pl. You can modify the ferrari
part of the script to match the names of your cluster machines.
Assuming pdns is in your path, after you partition
m32.tcl into 16 parts as in the last example above, you can run
the partitioned simulation as follows:
Performance of Autopart Itself
We ran autopart on a 3GHz P4 machine with 2 GB of memory
to partition the 417,200-node, 400,000-stream ns2 simulation into 8*2 = 16 parts.
It took 12 minutes to finish when autopart had to calculate all routes,
and just 5 minutes when autopart read previously calculated
routes. Its memory usage did not exceed 700 MB.
References
The following paper is not about how we developed this tool, but about
a systematic methodology for partitioning network simulations, mainly obtaining
the performance/factor relations through empirical benchmark experiments. The
tool as it is now uses the relations that we obtained on our cluster.
Acknowledgement
This work is supported in part by NSF under contract
numbers ANI-9977544 and ANI-0136936.
Contact
For any questions or comments, feel free to contact Donghua Xu
at xu@cc.gatech.edu.