Several directions would be natural extensions of this work. First, merging the source datasets introduced class-level label noise, because the sources contain near-duplicate class names (e.g., brown-planthopper and brown plant hopper). A careful re-annotation pass that resolves these overlaps into a clean 2022 class taxonomy would probably improve mAP50 for all models and provide a cleaner reference point for future comparisons. Second, replacing the standard CIoU bounding-box regression loss with Wise-IoU or Shape-IoU may improve localisation accuracy for small-bodied pests such as rice thrips and whorl maggots. These loss functions are designed to address the imbalance between easy and hard examples in bounding-box regression, which is especially pertinent when pest body sizes vary widely across the dataset.
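As a first pass at the near-duplicate class names described above, candidate overlaps could be flagged automatically before manual re-annotation. The following is a minimal sketch, not the procedure used in this work: the function names `canonical_key` and `group_near_duplicates` are hypothetical, and the rule (ignore case, spaces, and punctuation) is an assumption that only catches spelling variants, not true synonyms.

```python
import re
from collections import defaultdict


def canonical_key(name: str) -> str:
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def group_near_duplicates(class_names):
    """Return groups of class names that share the same canonical key.

    Only groups with more than one member are returned, since those are
    the candidates for merging during re-annotation.
    """
    groups = defaultdict(list)
    for name in class_names:
        groups[canonical_key(name)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Illustrative class list; only the first two names are near-duplicates.
    names = [
        "brown-planthopper",
        "brown plant hopper",
        "rice thrips",
        "whorl maggot",
    ]
    for key, variants in group_near_duplicates(names).items():
        print(key, "->", variants)
```

Groups flagged this way would still need manual review, since two classes with different spellings may refer to the same pest, and a shared key does not guarantee that the annotations follow the same labeling convention.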
Third, the complete failure of Florence-2 in zero-shot mode does not rule out its usefulness as a few-shot or fine-tuned model. A focused study of domain-adapted prompting techniques, or of few-shot visual fine-tuning of Florence-2 on representative rice pest images, would clarify whether large vision-language models can be adapted to this domain with limited labelled data. This direction is increasingly relevant given the rapid pace at which multimodal foundation models are being developed. Fourth, we tested all models at a constant 640×640 input resolution. A resolution ablation experiment at 416×416, 640×640, and 800×800 inputs would help quantify the trade-off between small-object recall and inference speed for this particular pest taxonomy, and could inform configuration decisions in various deployment scenarios (e.g., fixed camera traps versus UAV video streams). Lastly, implementing the proposed model on real edge hardware, such as a Jetson Nano or a Raspberry Pi 5 with an AI accelerator, and evaluating it under actual field conditions would establish whether the proposed model is practically deployable and would offer the community directly actionable benchmarks for assessing agricultural edge computing.
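The trade-off behind the proposed resolution ablation can be illustrated with simple arithmetic. The sketch below is not a measurement: it assumes that a pest occupies a fixed fraction of the image side (the value 0.02 is hypothetical) and uses the input pixel count as a rough proxy for compute cost, which is an assumption that real latency benchmarks on the target hardware would need to verify.

```python
BASELINE = 640  # input resolution used in the experiments reported here
CANDIDATES = (416, 640, 800)  # resolutions proposed for the ablation


def object_side_px(relative_side: float, input_size: int) -> float:
    """Side length in pixels of an object after resizing to input_size.

    relative_side is the object's side as a fraction of the image side.
    """
    return relative_side * input_size


def relative_pixel_cost(input_size: int, baseline: int = BASELINE) -> float:
    """Ratio of input pixel counts versus the baseline resolution.

    Used only as a rough proxy for compute cost (an assumption).
    """
    return (input_size / baseline) ** 2


if __name__ == "__main__":
    rel = 0.02  # hypothetical small pest: 2% of the image side
    for size in CANDIDATES:
        side = object_side_px(rel, size)
        cost = relative_pixel_cost(size)
        print(f"{size}x{size}: pest side = {side:.1f} px, "
              f"relative pixel count = {cost:.2f}")
```

Under these assumptions, an 800×800 input carries about 1.56 times as many pixels as the 640×640 baseline, while a 416×416 input carries about 0.42 times as many. The ablation would therefore measure whether the extra pixels on small pests translate into enough recall to justify the added cost.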