CrowdMesh: A Dynamic Model Parallel Training System on Mobile Devices

Research output: Contribution to journal › Article › peer-review

Abstract

With the rapid development of artificial intelligence (AI), integrating deep neural networks (DNNs) into mobile and embedded devices has become an important trend. This integration significantly enhances the ability of these devices to collect and analyze perceptual data. Traditionally, models are trained in the cloud and then deployed on mobile devices. However, the dynamic nature of real-world perceptual data and the privacy issues it raises call for training on the device itself. Despite the advantages of on-device training, the limited computing resources of mobile devices are a key bottleneck for training efficiency. To address this issue, parallel distributed training across clusters of mobile devices has emerged as a feasible paradigm. However, the inherent mobility of these devices not only makes training interruptions more likely but also exacerbates the inefficiency caused by data imbalance. These challenges make traditional cloud-based model parallelization methods unsuitable for mobile environments. To overcome these limitations, this article proposes CrowdMesh, a novel model-parallel training system designed for mobile device clusters. CrowdMesh consists of three key modules: 1) alliance-game-based dynamic cluster start-up and construction; 2) computation-cost-based deep learning (DL) model parallelism; and 3) device-mobility-aware parameter propagation. These modules work together to address the training interruptions and restarts caused by device mobility, as well as the efficiency issues caused by data imbalance. Experimental results show that CrowdMesh outperforms existing cloud-based and device-based model parallelization baselines across a variety of training tasks and deep learning models, reducing training latency by more than 18%.

Original language: English
Pages (from-to): 55166-55181
Number of pages: 16
Journal: IEEE Internet of Things Journal
Volume: 12
Issue number: 24
Publication status: Published - 1 Jan 2025
Externally published: Yes

Keywords

  • Artificial Internet of Things
  • distributed training
  • mobile computing
