Source: [Original] Speeding up TensorFlow execution on the Raspberry Pi: model warm-up – 编码无悔 / Intent & Focused
Please credit the source when reposting: http://www.codelast.com/
Hardware and software environment for this article:
Raspberry Pi: 3 Model B V1.2, 1 GB RAM
OS: Arch Linux ARM
In the previous article, I described a deep-learning (image recognition) experiment with TensorFlow on the Raspberry Pi, but as noted there, a prediction that takes 50 seconds has zero practical value. Some measures are therefore needed to speed TensorFlow up, and one feasible approach is a "warm-up". Sam Abrahams, who ported TensorFlow to the Raspberry Pi, has published fairly detailed performance-test results on GitHub. Following his description, I ran my own test to see how far that miserable 50 seconds could be reduced.
『1』What is a warm-up
First, this article is again based on TensorFlow's Python image-classification program classify_image.py.
Warming up means calling Session.run() several times before the real prediction, so that the prediction itself executes faster.
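The idea can be illustrated without TensorFlow at all. Below is a toy sketch (the `ToyModel` class is entirely hypothetical, a stand-in for a TF Session): the first call pays a one-time setup cost, and warm-up calls absorb that cost so the timed call is fast.

```python
import time

class ToyModel:
    """Toy stand-in for a TF Session: the first run pays a one-time setup cost."""
    def __init__(self):
        self._ready = False

    def run(self, data):
        if not self._ready:
            time.sleep(0.2)  # simulate one-time graph setup / memory allocation
            self._ready = True
        return sum(data)

model = ToyModel()

# warm-up: absorb the one-time cost before the timed prediction
for _ in range(3):
    model.run([0])

start = time.time()
result = model.run([1, 2, 3])
elapsed = time.time() - start
print(result, elapsed)  # the timed run no longer pays the setup cost
```

In the real program, the one-time costs are things like lazy graph initialization and memory allocation inside TensorFlow, which is why the first Session.run() calls are so much slower than later ones.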
『2』Code changes
The change itself is simple. To measure the program's running time we need Python's time module, so import it at the top:
```python
import time
```
Then modify the run_inference_on_image function as follows:
```python
def run_inference_on_image(image):
  """Runs inference on an image.

  Args:
    image: Image file name.

  Returns:
    Nothing
  """
  if not tf.gfile.Exists(image):
    tf.logging.fatal('File does not exist %s', image)
  image_data = tf.gfile.FastGFile(image, 'rb').read()

  # the image used to warm-up TensorFlow model
  warm_up_image_data = tf.gfile.FastGFile(
      '/root/tensorflow-related/test-images/ubike.jpg', 'rb').read()

  # Creates graph from saved GraphDef.
  create_graph()

  with tf.Session() as sess:
    # Some useful tensors:
    # 'softmax:0': A tensor containing the normalized prediction across
    #   1000 labels.
    # 'pool_3:0': A tensor containing the next-to-last layer containing 2048
    #   float description of the image.
    # 'DecodeJpeg/contents:0': A tensor containing a string providing JPEG
    #   encoding of the image.
    # Runs the softmax tensor by feeding the image_data as input to the graph.
    softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')

    print("Warm-up start")
    for i in range(10):
      print("Warm-up for time {}".format(i))
      predictions = sess.run(softmax_tensor,
                             {'DecodeJpeg/contents:0': warm_up_image_data})
    print("Warm-up finished")

    # record the start time of the actual prediction
    start_time = time.time()
    predictions = sess.run(softmax_tensor,
                           {'DecodeJpeg/contents:0': image_data})
    predictions = np.squeeze(predictions)

    # Creates node ID --> English string lookup.
    node_lookup = NodeLookup()

    top_k = predictions.argsort()[-FLAGS.num_top_predictions:][::-1]
    for node_id in top_k:
      human_string = node_lookup.id_to_string(node_id)
      score = predictions[node_id]
      print('%s (score = %.5f)' % (human_string, score))
    print("Prediction used time:{} S".format(time.time() - start_time))
```
The parts we added ourselves are the following:
```python
# the image used to warm-up TensorFlow model
warm_up_image_data = tf.gfile.FastGFile(
    '/root/tensorflow-related/test-images/ubike.jpg', 'rb').read()
```
Here a different image is used to warm up the model (not the same image as the one used in the actual prediction); the path is hard-coded for simplicity.
```python
print("Warm-up start")
for i in range(10):
  print("Warm-up for time {}".format(i))
  predictions = sess.run(softmax_tensor,
                         {'DecodeJpeg/contents:0': warm_up_image_data})
print("Warm-up finished")
```
This loops 10 times to warm up the model.
```python
# record the start time of the actual prediction
start_time = time.time()
# (code in between omitted)
print("Prediction used time:{} S".format(time.time() - start_time))
```
This prints the time (in seconds) taken by the actual prediction of one image. That number is what we really care about: how far can it drop?
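If the inline time.time() bookkeeping feels noisy, the same measurement can be wrapped in a small context manager. This is a generic Python sketch, independent of TensorFlow, using only the standard library:

```python
import time
from contextlib import contextmanager

@contextmanager
def timer(label):
    """Print how long the enclosed block took, in seconds."""
    start = time.time()
    yield
    print("{}: {:.3f} s".format(label, time.time() - start))

# usage: only the code inside the 'with' block is measured
with timer("prediction"):
    total = sum(range(1000000))
```

In the modified script above, the sess.run() call for the real prediction would go inside the `with timer(...)` block.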
『3』Test results
Running the same command as in the previous article gives the following output:
```
/usr/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py:1750: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
  result_shape.insert(dim, 1)
Warm-up start
Warm-up for time 0
W tensorflow/core/framework/op_def_util.cc:332] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
Warm-up for time 1
Warm-up for time 2
Warm-up for time 3
Warm-up for time 4
Warm-up for time 5
Warm-up for time 6
Warm-up for time 7
Warm-up for time 8
Warm-up for time 9
Warm-up finished
mountain bike, all-terrain bike, off-roader (score = 0.56671)
tricycle, trike, velocipede (score = 0.12035)
bicycle-built-for-two, tandem bicycle, tandem (score = 0.08768)
lawn mower, mower (score = 0.00651)
alp (score = 0.00387)
Prediction used time:4.141446590423584 Seconds
```
As you can see, after 10 warm-up runs a single prediction takes 4.14 seconds. Four-plus seconds still falls short of the speed we would like, but it is a vast improvement over the earlier 50 seconds.
The test results also show that the first few warm-up calls to Session.run() are especially slow and later ones speed up considerably, so too few warm-up iterations will not do.
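To see this effect yourself and pick a sensible warm-up count, you can time each run individually. Below is a generic sketch; `fake_run` is a hypothetical stand-in for sess.run whose first call pays a one-time cost:

```python
import time

def timed_runs(fn, n):
    """Call fn() n times and return each call's wall-clock time in seconds."""
    times = []
    for _ in range(n):
        t0 = time.time()
        fn()
        times.append(time.time() - t0)
    return times

# hypothetical stand-in for sess.run: the first call pays a one-time cost
state = {"cold": True}
def fake_run():
    if state["cold"]:
        time.sleep(0.2)
        state["cold"] = False

run_times = timed_runs(fake_run, 5)
print(["{:.3f}".format(t) for t in run_times])
```

Applied to the real script (with the lambda feeding warm_up_image_data to sess.run), a plot of these per-run times would show roughly how many warm-up iterations are needed before the times level off.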