Port TensorFlow Script to Poseidon
Poseidon worker scripts are almost identical to native TensorFlow scripts; the main difference is an extended session-initialization config object. To demonstrate the changes, we start with a native TensorFlow script and then show its Poseidon counterpart.
Remember that for this release, Poseidon scripts must be compatible with the TensorFlow 0.10 API.
Below is an example TensorFlow script; we'll call it mymodel_train.py:
import tensorflow as tf

FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_integer('max_steps', 5, 'Loop count')

def model():
    x1 = tf.Variable(1.5, name='x1')
    x2 = tf.Variable(3.5, name='x2')
    out = x1 + 2 * x2
    grads = tf.gradients(ys=[out], xs=[x1, x2])
    return grads

def train():
    model_op = model()
    init = tf.initialize_all_variables()
    sess = tf.Session()
    sess.run(init)
    for step in xrange(FLAGS.max_steps):
        model_out = sess.run(model_op)
        print 'step ' + str(step) + ' ' + str(model_out)

def main(argv=None):
    train()

if __name__ == '__main__':
    tf.app.run()
The above could be executed with the following command:
python mymodel_train.py --max_steps 10
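For reference, the gradients this script prints are constant: with out = x1 + 2 * x2, d(out)/dx1 = 1 and d(out)/dx2 = 2, independent of the step. A plain-Python finite-difference sketch (no TensorFlow needed; the helper function and epsilon below are our own, not part of the script) confirms those numbers:

```python
def out(x1, x2):
    # Same function the TensorFlow model builds.
    return x1 + 2 * x2

def finite_diff(f, x1, x2, eps=1e-6):
    # Central differences approximate the two partial derivatives.
    g1 = (f(x1 + eps, x2) - f(x1 - eps, x2)) / (2 * eps)
    g2 = (f(x1, x2 + eps) - f(x1, x2 - eps)) / (2 * eps)
    return g1, g2

g1, g2 = finite_diff(out, 1.5, 3.5)
print(round(g1, 3), round(g2, 3))  # matches what tf.gradients returns: 1.0 and 2.0
```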
A modified mymodel_train.py for Poseidon requires three changes.
- Add three command-line flags: distributed, master_address, and client_id:

    tf.app.flags.DEFINE_boolean('distributed', False, "Use Poseidon")
    tf.app.flags.DEFINE_string('master_address', "tcp://0.0.0.0:5555", "master address")
    tf.app.flags.DEFINE_integer('client_id', -1, "client id")
- Build a tf.ConfigProto() from these command-line flags:

    config = tf.ConfigProto()
    config.master_address = FLAGS.master_address
    config.client_id = FLAGS.client_id
    config.distributed = FLAGS.distributed
- Call tf.Session() with the new config object:

    sess = tf.Session(config=config)
If you use a higher-level interface such as tf.contrib.slim, you can instead pass this config object as an argument to the slim.learning.train function.
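The second change is plain attribute assignment on the session config. As a minimal non-TensorFlow sketch of that pattern (SimpleNamespace stands in for both FLAGS and Poseidon's extended ConfigProto here, which is our own substitution for illustration):

```python
from types import SimpleNamespace

# Stand-ins for FLAGS and the extended ConfigProto; the real objects come
# from tf.app.flags and Poseidon's tf.ConfigProto.
FLAGS = SimpleNamespace(distributed=True,
                        master_address="tcp://10.0.0.1:5555",
                        client_id=0)

config = SimpleNamespace()
# Mirror the three assignments from the change list above.
config.master_address = FLAGS.master_address
config.client_id = FLAGS.client_id
config.distributed = FLAGS.distributed
print(config.master_address, config.client_id, config.distributed)
```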
Putting the three changes together, we get the script below.
import tensorflow as tf

FLAGS = tf.app.flags.FLAGS
# Three new command-line flags
tf.app.flags.DEFINE_boolean('distributed', False, "Use Poseidon")
tf.app.flags.DEFINE_string('master_address', "tcp://0.0.0.0:5555", "master address")
tf.app.flags.DEFINE_integer('client_id', -1, "client id")
tf.app.flags.DEFINE_integer('max_steps', 5, 'Loop count')

def model():
    x1 = tf.Variable(1.5, name='x1')
    x2 = tf.Variable(3.5, name='x2')
    out = x1 + 2 * x2
    grads = tf.gradients(ys=[out], xs=[x1, x2])
    return grads

def train():
    model_op = model()
    init = tf.initialize_all_variables()
    # Add a config variable to pass Poseidon settings
    config = tf.ConfigProto()
    config.master_address = FLAGS.master_address
    config.client_id = FLAGS.client_id
    config.distributed = FLAGS.distributed
    # Initialize tf.Session with the config
    sess = tf.Session(config=config)
    sess.run(init)
    for step in xrange(FLAGS.max_steps):
        model_out = sess.run(model_op)
        print 'step ' + str(step) + ' ' + str(model_out)

def main(argv=None):
    train()

if __name__ == '__main__':
    tf.app.run()
Given a config.json file, Poseidon can execute the script above with the following command (remember to change /path/to/mymodel_train.py):
psd_run -c config.json -o ~/logs "python /path/to/mymodel_train.py --max_steps 10"
Note: psd_run adds the following flags to each worker when it launches:
- distributed
- master_address
- client_id
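To make the note concrete, here is a hypothetical sketch of the argument list a launcher would append per worker. The exact flag formatting psd_run uses is an assumption; only the three flag names come from the documentation above.

```python
def worker_command(user_cmd, client_id, master_address):
    # Hypothetical: extend the user's command with the three Poseidon flags,
    # giving each worker its own client_id and the shared master address.
    return user_cmd + ['--distributed', 'True',
                       '--master_address', master_address,
                       '--client_id', str(client_id)]

cmd = worker_command(['python', 'mymodel_train.py', '--max_steps', '10'],
                     client_id=0, master_address='tcp://10.0.0.1:5555')
print(' '.join(cmd))
```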