yolov3--17--yolo-mobilenetv2: summary of debugging errors

mac2026-04-20  9

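Yolov-1-Workflow for training YOLOv3 on a custom dataset on the TX2 (VOC2007-TX2-GPU)

Yolov--2--A comprehensive introduction to TensorRT, the deep learning performance optimization and acceleration engine

Yolov--3--Optimizing and accelerating yolov3 performance in TensorRT (Caffe-based)

yolov-5-Object detection: the YOLOv2 algorithm explained in detail

yolov--8--Implementing YOLO v3 in Tensorflow

yolov--9--Pruning optimization of YOLO v3

yolov--10--Evaluation metrics for object detection models: detailed explanation and concepts

yolov--11--Training log of the original YOLO v3 and computing evaluation metrics such as mAP, AP, recall, precision, and time

yolov--12--An in-depth analysis of how YOLOv3 works and its key points

yolov--14--The network structure of the lightweight model MobilenetV2--concepts explained

yolov--15--The most detailed analysis of Yolov3 bounding-box prediction--improvements

yolov3--16--A detailed explanation of padding in convolution operations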


 

CUDA_VISIBLE_DEVICES=4 python train.py --gpu=4 &

Debugging error 1

(pytorch1.1.0-py2.7_cuda9.0) Liqing@user-ubuntu:~/hangyu/stronger-yolo-c/v3$
WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-11-01 23:38:28.294675: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-01 23:38:28.324619: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2299985000 Hz
2019-11-01 23:38:28.330328: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x7f09f4ce3070 executing computations on platform Host. Devices:
2019-11-01 23:38:28.330377: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from weights/mobilenet_v2_1.0_224.ckpt
Reading annotation for 1/181
Reading annotation for 101/181
Saving cached annotations to /home/Liqing/hangyu/stronger-yolo-c/v3/eval/cache/annots.pkl
/home/Liqing/hangyu/stronger-yolo-c/v3/eval/voc_eval.py:194: RuntimeWarning: invalid value encountered in divide
  rec = tp / float(npos)
Reading annotation for 1/181
Reading annotation for 101/181
Saving cached annotations to /home/Liqing/hangyu/stronger-yolo-c/v3/eval/cache/annots.pkl
Reading annotation for 1/181


[0. 0. 0. ... 0. 0. 0.]
[nan nan nan ... nan nan nan]
nan

    # compute precision recall
    fp = np.cumsum(fp)
    tp = np.cumsum(tp)
    print(tp)  # added for debugging
    rec = tp / float(npos)
    print(rec)  # added for debugging
    # avoid divide by zero in case the first detection matches a difficult
    # ground truth
    prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
    ap = voc_ap(rec, prec, use_07_metric)
    print(ap)  # added for debugging
    return rec, prec, ap
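The printed values show tp as all zeros and rec as all nan, which is what 0 / 0 produces when npos is 0. The following is a minimal sketch, not part of the original voc_eval.py, of how the recall computation could be guarded so that an empty ground-truth set yields zeros instead of nan. The function name precision_recall is hypothetical, and returning zero recall when npos is 0 is an assumption about the desired behaviour.

```python
import numpy as np


def precision_recall(tp, fp, npos):
    """Cumulative precision and recall that stay finite when npos is 0.

    tp, fp: per-detection 0/1 flags, sorted by descending confidence.
    npos:   number of non-difficult ground-truth boxes for the class.
    """
    tp = np.cumsum(np.asarray(tp, dtype=np.float64))
    fp = np.cumsum(np.asarray(fp, dtype=np.float64))
    if npos == 0:
        # Assumption: with no ground truth for this class, report zero
        # recall instead of letting 0 / 0 produce nan.
        rec = np.zeros_like(tp)
    else:
        rec = tp / float(npos)
    # Same guard as the original snippet: avoid dividing by zero.
    prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
    return rec, prec
```

A guard like this only hides the symptom. An npos of 0 for a class that should have labels still points to a problem in the annotations.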





WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-11-14 21:11:32.104982: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-14 21:11:32.117354: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2299985000 Hz
2019-11-14 21:11:32.122381: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x7f337ba03df0 executing computations on platform Host. Devices:
2019-11-14 21:11:32.122419: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
WARNING:tensorflow:From /home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from weights/mobilenet_v2_1.0_224.ckpt
2019-11-14 21:53:59.213951: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at save_restore_v2_ops.cc:134 : Resource exhausted: weights/yolo.ckpt-1.data-00000-of-00001.tempstate5297156131404480268; No space left on device
Traceback (most recent call last):
  File "train.py", line 159, in <module>
    Yolo_train().train()
  File "train.py", line 149, in train
    self.__save.save(self.__sess, os.path.join(self.__weights_dir, 'yolo.ckpt-%d' % period))
  File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1171, in save
    {self.saver_def.filename_tensor_name: checkpoint_file})
  File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/Liqing/anaconda3/envs/pytorch1.1.0-py2.7_cuda9.0/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: weights/yolo.ckpt-1.data-00000-of-00001.tempstate5297156131404480268; No space left on device
  [[node load_save/save_1/SaveV2 (defined at train.py:80) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
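The ResourceExhaustedError in this second log comes from the disk, not from GPU memory. The message "No space left on device" is raised while the checkpoint is being written, so the OOM hint at the end does not apply. As a rough sketch, and assuming a Unix-like system where os.statvfs is available, the free space in the weights directory could be checked before each save. The helper names free_bytes and has_room_for_checkpoint are hypothetical and do not appear in the original train.py.

```python
import os


def free_bytes(path):
    """Bytes available to the current user on the filesystem holding path.

    Uses os.statvfs, which exists on Unix-like systems only.
    """
    stats = os.statvfs(path)
    return stats.f_bavail * stats.f_frsize


def has_room_for_checkpoint(weights_dir, required_bytes):
    """Return True if weights_dir has at least required_bytes free."""
    return free_bytes(weights_dir) >= required_bytes
```

For example, a training loop might call has_room_for_checkpoint('weights', 500 * 1024 ** 2) before saving and stop early with a clear message if it returns False. The 500 MB threshold is only an illustrative value.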

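Solution:

The class names in my dataset's labels did not match the training class names in letter case; converting them all to lowercase fixed it. With mismatched names, the evaluation finds no ground-truth boxes for the class, so npos is 0 and rec = tp / float(npos) evaluates 0 / 0, which would explain the nan values in debugging error 1.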
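A minimal sketch of that normalization, assuming the labels are VOC-style XML annotations in which each object has a name element. The function lowercase_class_names is a hypothetical helper written for Python 3, not code from the original project.

```python
import xml.etree.ElementTree as ET


def lowercase_class_names(xml_text):
    """Lowercase every <object>/<name> in a VOC-style annotation.

    Returns the rewritten XML string and the number of names changed.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for obj in root.iter('object'):
        name = obj.find('name')
        if name is not None and name.text and name.text != name.text.lower():
            name.text = name.text.lower()
            changed += 1
    # encoding='unicode' makes tostring return str (Python 3 only).
    return ET.tostring(root, encoding='unicode'), changed
```

If the annotations were already parsed once, the cached file (eval/cache/annots.pkl in the log above) probably needs to be deleted as well, so that the evaluation re-reads the corrected labels.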
