Tools
TopoBuilder
TopoBuilder provides a command-line tool that makes it easier to build large topology graphs.
You can then save the graph to disk and load it in your code.
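The sketch below illustrates what such a graph holds. It models a topology as plain Python data: nodes are keyed by nick name and have an optional address, and directed edges run from a collaborator to its aggregator. The graph is saved to and loaded from JSON. The `Address` and `Topology` classes and the JSON layout are hypothetical. They are not OpenFed's API or the format TopoBuilder actually writes.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Address:
    # Hypothetical fields, mirroring the columns TopoBuilder prints.
    backend: str
    init_method: str


@dataclass
class Topology:
    # nick name -> address (None when the node does not need one)
    nodes: Dict[str, Optional[Address]] = field(default_factory=dict)
    # (start, end) pairs: start is the collaborator, end is the aggregator
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def add_node(self, nick_name: str, address: Optional[Address] = None) -> None:
        self.nodes[nick_name] = address

    def build_edge(self, start: str, end: str) -> None:
        # Refuse edges that point at nodes which were never added.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError(f"unknown node in edge {start} -> {end}")
        self.edges.append((start, end))

    def save(self, path: str) -> None:
        data = {
            "nodes": {k: (asdict(v) if v is not None else None)
                      for k, v in self.nodes.items()},
            "edges": [list(e) for e in self.edges],
        }
        with open(path, "w") as f:
            json.dump(data, f, indent=2)

    @classmethod
    def load(cls, path: str) -> "Topology":
        with open(path) as f:
            data = json.load(f)
        topo = cls()
        for name, addr in data["nodes"].items():
            topo.add_node(name, Address(**addr) if addr is not None else None)
        for start, end in data["edges"]:
            topo.build_edge(start, end)
        return topo
```

To mirror the first steps of the session below, call `add_node` for `red`, then call it for `blue` with `Address("gloo", "tcp://localhost:1994")`, and finally call `build_edge("red", "blue")`.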
The following example shows how to build a hierarchical topology graph:
(openfed) python -m openfed.tools.topo_builder
A script to build topology.
<OpenFed>: add_node
Nick Name
red
Does this node requires address? (Y/n)
n
<OpenFed> Node
nick name: red
<OpenFed> Address
+---------+-------------+------------+------+
| backend | init_method | world_size | rank |
+---------+-------------+------------+------+
| null | null | 2 | -1 |
+---------+-------------+------------+------+
<OpenFed>: add_node
Nick Name
green
Does this node requires address? (Y/n)
n
<OpenFed> Node
nick name: green
<OpenFed> Address
+---------+-------------+------------+------+
| backend | init_method | world_size | rank |
+---------+-------------+------------+------+
| null | null | 2 | -1 |
+---------+-------------+------------+------+
<OpenFed>: add_node
Nick Name
purple
Does this node requires address? (Y/n)
n
<OpenFed> Node
nick name: purple
<OpenFed> Address
+---------+-------------+------------+------+
| backend | init_method | world_size | rank |
+---------+-------------+------------+------+
| null | null | 2 | -1 |
+---------+-------------+------------+------+
<OpenFed>: add_node
Nick Name
blue
Does this node requires address? (Y/n)
y
Backend (gloo, mpi, nccl)
gloo
Init method i.e., tcp://localhost:1994, file:///tmp/openfed.sharedfile)
tcp://localhost:1994
<OpenFed> Node
nick name: blue
<OpenFed> Address
+---------+---------------------+------------+------+
| backend | init_method | world_size | rank |
+---------+---------------------+------------+------+
| gloo | tcp://lo...ost:1994 | 2 | -1 |
+---------+---------------------+------------+------+
<OpenFed>: add_node
Nick Name
yellow
Does this node requires address? (Y/n)
y
Backend (gloo, mpi, nccl)
mpi
Init method i.e., tcp://localhost:1994, file:///tmp/openfed.sharedfile)
file://tmp/openfed.sharedfile
<OpenFed> Node
nick name: yellow
<OpenFed> Address
+---------+---------------------+------------+------+
| backend | init_method | world_size | rank |
+---------+---------------------+------------+------+
| mpi | file://t...aredfile | 2 | -1 |
+---------+---------------------+------------+------+
<OpenFed>: build_edge
Start node nick name
red
End node nick name
blue
<OpenFed> Edge
|red -> blue.
<OpenFed>: build_edge
Start node nick name
green
End node nick name
blue
<OpenFed> Edge
|green -> blue.
<OpenFed>: build_edge
Start node nick name
blue
End node nick name
yellow
<OpenFed> Edge
|blue -> yellow.
<OpenFed>: build_edge
Start node nick name
purple
End node nick name
yellow
<OpenFed> Edge
|purple -> yellow.
<OpenFed>: save
Filename:
topology
+--------+-----+-------+--------+------+--------+
| CO\AG | red | green | purple | blue | yellow |
+--------+-----+-------+--------+------+--------+
| red | . | . | . | ^ | . |
| green | . | . | . | ^ | . |
| purple | . | . | . | . | ^ |
| blue | . | . | . | . | ^ |
| yellow | . | . | . | . | . |
+--------+-----+-------+--------+------+--------+
<OpenFed>: analysis
Folder to save the analysis result:
props
Processing red
[{'role': 'openfed_collaborator', 'nick_name': 'red', 'address': {'backend': 'gloo', 'init_method': 'tcp://localhost:1994', 'world_size': 3, 'rank': 2}}]
Processing green
[{'role': 'openfed_collaborator', 'nick_name': 'green', 'address': {'backend': 'gloo', 'init_method': 'tcp://localhost:1994', 'world_size': 3, 'rank': 1}}]
Processing purple
[{'role': 'openfed_collaborator', 'nick_name': 'purple', 'address': {'backend': 'mpi', 'init_method': 'file://tmp/openfed.sharedfile', 'world_size': 3, 'rank': 2}}]
Processing blue
[{'role': 'openfed_aggregator', 'nick_name': 'blue', 'address': {'backend': 'gloo', 'init_method': 'tcp://localhost:1994', 'world_size': 3, 'rank': 0}}, {'role': 'openfed_collaborator', 'nick_name': 'blue', 'address': {'backend': 'mpi', 'init_method': 'file://tmp/openfed.sharedfile', 'world_size': 3, 'rank': 1}}]
Processing yellow
[{'role': 'openfed_aggregator', 'nick_name': 'yellow', 'address': {'backend': 'mpi', 'init_method': 'file://tmp/openfed.sharedfile', 'world_size': 3, 'rank': 0}}]
<OpenFed>: exit
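The save and analysis steps above can be approximated in a few lines. This sketch hard-codes the session's nodes, addresses, and edges, and rebuilds the CO\AG table, where rows are collaborators and columns are aggregators. It then derives per-node roles using the rule the printed output suggests. Every node with incoming edges aggregates at rank 0 on its own address, with world_size equal to one plus its number of collaborators. Each collaborator joins that address with a nonzero rank. The function names are hypothetical. The sketch assigns collaborator ranks in edge-creation order, which is an assumption; OpenFed's own ordering differs (compare the ranks of red and green above).

```python
from typing import Dict, List, Tuple

# The topology built in the session above, hard-coded for illustration.
NODES = ["red", "green", "purple", "blue", "yellow"]
ADDRESSES: Dict[str, Tuple[str, str]] = {
    "blue": ("gloo", "tcp://localhost:1994"),
    "yellow": ("mpi", "file://tmp/openfed.sharedfile"),
}
EDGES = [("red", "blue"), ("green", "blue"), ("blue", "yellow"), ("purple", "yellow")]


def adjacency(nodes: List[str], edges: List[Tuple[str, str]]) -> List[List[str]]:
    """Rows are collaborators (CO), columns are aggregators (AG); '^' marks an edge."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [["." for _ in nodes] for _ in nodes]
    for start, end in edges:
        matrix[index[start]][index[end]] = "^"
    return matrix


def make_entry(role: str, nick_name: str, address: Tuple[str, str],
               world_size: int, rank: int) -> dict:
    backend, init_method = address
    return {
        "role": role,
        "nick_name": nick_name,
        "address": {"backend": backend, "init_method": init_method,
                    "world_size": world_size, "rank": rank},
    }


def analyze(nodes: List[str], addresses: Dict[str, Tuple[str, str]],
            edges: List[Tuple[str, str]]) -> Dict[str, List[dict]]:
    """Give every node one entry per federation it takes part in."""
    props: Dict[str, List[dict]] = {name: [] for name in nodes}
    for agg in nodes:
        collaborators = [start for start, end in edges if end == agg]
        if not collaborators:
            continue  # this node aggregates nothing
        if agg not in addresses:
            raise ValueError(f"aggregator {agg!r} needs an address")
        world_size = len(collaborators) + 1
        address = addresses[agg]
        # The aggregator takes rank 0 on its own address.
        props[agg].append(make_entry("openfed_aggregator", agg, address, world_size, 0))
        # Collaborators join the same address with ranks 1..n (order assumed).
        for rank, col in enumerate(collaborators, start=1):
            props[col].append(
                make_entry("openfed_collaborator", col, address, world_size, rank))
    return props


if __name__ == "__main__":
    print("CO\\AG", *NODES)
    for name, row in zip(NODES, adjacency(NODES, EDGES)):
        print(name, *row)
    for name, entries in analyze(NODES, ADDRESSES, EDGES).items():
        print(name, entries)
```

As in the output above, blue ends up with two entries. It is an aggregator for red and green, and a collaborator of yellow.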
Simulator
Simulator, which is similar to torch.distributed.launch, is a module that spawns multiple federated
training processes on each training node.
It builds a centralized topology automatically, which makes it useful for simulating a large number of nodes in federated learning experiments.
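The sketch below illustrates what building a centralized topology involves by generating one property record per process. Process 0 is the aggregator. The remaining nproc - 1 processes are collaborators that share the aggregator's address. The file names mirror the usage output further down. Several parts are assumptions rather than the Simulator's actual implementation: the record layout (borrowed from the TopoBuilder analysis output above), the default backend and init_method values, and the function names `star_props` and `write_props`.

```python
import json
import os
import tempfile
from typing import List, Tuple


def star_props(nproc: int, backend: str = "gloo",
               init_method: str = "tcp://localhost:1994") -> List[Tuple[str, dict]]:
    """Return (file name, record) pairs for one aggregator and nproc - 1 collaborators."""
    if nproc < 2:
        raise ValueError("need at least one aggregator and one collaborator")
    records = []
    for rank in range(nproc):
        # Rank 0 is the center of the star; every other rank points at it.
        role = "openfed_aggregator" if rank == 0 else "openfed_collaborator"
        name = "aggregator" if rank == 0 else f"collaborator-{rank}"
        record = {
            "role": role,
            "nick_name": name,
            "address": {"backend": backend, "init_method": init_method,
                        "world_size": nproc, "rank": rank},
        }
        records.append((f"{name}.json", record))
    return records


def write_props(records: List[Tuple[str, dict]], folder: str) -> List[str]:
    """Write each record to folder/<file name> and return the paths."""
    paths = []
    for filename, record in records:
        path = os.path.join(folder, filename)
        with open(path, "w") as f:
            json.dump(record, f)
        paths.append(path)
    return paths


if __name__ == "__main__":
    for path in write_props(star_props(10), tempfile.mkdtemp()):
        print(path)
```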
Write a script named run.py:
import argparse
parser = argparse.ArgumentParser()
# The simulator passes each process the path of its property file via --props.
parser.add_argument('--props')
args = parser.parse_args()
print(args.props)
Usage:
(openfed) python -m openfed.tools.simulator --nproc 10 run.py
/tmp/aggregator.json
/tmp/collaborator-1.json
/tmp/collaborator-2.json
/tmp/collaborator-3.json
/tmp/collaborator-4.json
/tmp/collaborator-5.json
/tmp/collaborator-6.json
/tmp/collaborator-7.json
/tmp/collaborator-8.json
/tmp/collaborator-9.json
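The process-spawning half can be sketched with the standard library's subprocess module. Start one copy of the user script per property file, append `--props <path>` to its arguments, and wait for all of them. This is a hypothetical illustration of a launch pattern similar to torch.distributed.launch, not the Simulator's actual code. The `launch` function and its parameters are made up for this example.

```python
import subprocess
import sys
from typing import List, Sequence


def launch(script: str, props_paths: Sequence[str],
           extra_args: Sequence[str] = ()) -> List[int]:
    """Run `python script ... --props <path>` once per path in parallel; return exit codes."""
    procs = [
        subprocess.Popen([sys.executable, script, *extra_args, "--props", path])
        for path in props_paths
    ]
    # Wait for every process so none is left running, then report failures.
    codes = [p.wait() for p in procs]
    failed = [c for c in codes if c != 0]
    if failed:
        print(f"{len(failed)} of {len(codes)} processes failed", file=sys.stderr)
    return codes
```

For instance, calling `launch("run.py", paths)` with the ten paths listed above would print them from ten separate processes. The lines appear in whatever order the processes happen to run.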