Running one AI model on one device is a tutorial. Running the same model on 200 devices, with mismatched hardware and patchy connectivity, is a different problem. The single-device version is solved. The fleet version is what most teams underestimate.

This walkthrough is about that second problem: how we built Securade HUB to make edge AI deployments tractable at scale, what the actual mechanics look like, and where the rough edges still are. Real-time applications (drones, traffic systems, factory floors) can't wait for a round trip to a cloud GPU. Edge inference solves that; scaling it is what we'll focus on here.

Why scaling edge AI is harder than it looks

Edge hardware isn't a single category. A site might have a Jetson Orin Nano in one camera enclosure, a Raspberry Pi 5 in another, and a beefy x86 box in the control room. Each one has different memory, compute, and accelerator characteristics. A model that runs nicely on the x86 box might not even fit on the Pi.
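
To make "might not even fit" concrete, here is a back-of-envelope memory check of the kind you might run before a deployment. It is a sketch under stated assumptions: the parameter count, the per-device RAM figures, and the 2x overhead factor for activations and runtime are illustrative numbers, not measurements from HUB.

```python
# Rough check: will a model's weights (plus working memory) fit on a device?
# All numbers below are illustrative assumptions, not HUB measurements.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def estimated_footprint_mb(num_params: int, dtype: str, overhead: float = 2.0) -> float:
    """Weight size times an assumed overhead factor for activations and runtime."""
    weight_bytes = num_params * BYTES_PER_PARAM[dtype]
    return weight_bytes * overhead / (1024 ** 2)


def fits(num_params: int, dtype: str, free_ram_mb: float) -> bool:
    """True if the estimated footprint is within the RAM we can spare."""
    return estimated_footprint_mb(num_params, dtype) <= free_ram_mb


# Hypothetical devices: name -> RAM we are willing to give the model (MB).
devices = {"x86-control-room": 8192, "jetson-orin-nano": 2048, "pi-5": 256}

large_model_params = 60_000_000  # assumed size of a "large" detector

for name, free_mb in devices.items():
    for dtype in ("fp32", "int8"):
        verdict = "fits" if fits(large_model_params, dtype, free_mb) else "does not fit"
        print(f"{name:18s} {dtype}: {verdict}")
```

With these made-up numbers the FP32 model is too big for the Pi's budget while the INT8 version fits, which is the usual shape of the problem.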

Then there's the network. Some sites have fibre. Others have a 4G dongle that drops out twice a day. Some are intentionally offline for security reasons. Your deployment system has to keep working through all of that, and it has to be able to push a model update to 200 devices without anyone SSH-ing into each one.

Why bother with edge at all

For distributed real-time work, edge is usually the only thing that hits the latency budget. A round trip to a cloud GPU is typically 100-300ms on a good link, more on a bad one. Running the same model on a box next to the camera takes 10-30ms. That delta is the difference between catching the unsafe act and missing it.
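
One way to read those numbers is in video frames rather than milliseconds. The snippet below is plain arithmetic on the latency ranges quoted above; the 30 fps camera rate is an assumption.

```python
# How many video frames elapse while waiting for an inference result?
# Latencies are the ranges quoted in the text; 30 fps is an assumed camera rate.

def frames_behind(latency_ms: float, fps: float = 30.0) -> float:
    """Number of frame intervals that pass during the given latency."""
    frame_interval_ms = 1000.0 / fps
    return latency_ms / frame_interval_ms


for label, latency_ms in [("edge, best", 10), ("edge, worst", 30),
                          ("cloud, best", 100), ("cloud, worst", 300)]:
    print(f"{label:12s} {latency_ms:4d} ms -> {frames_behind(latency_ms):.1f} frames")
```

At the slow end of the cloud range, the result describes a scene that is roughly nine frames old; on the edge box it is usually still the current frame.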

The bandwidth story is the other half. Streaming every camera feed to a cloud cluster gets expensive fast. With edge inference, the only thing leaving the site is structured events: detections, scores, occasional clips. The egress drops by a couple of orders of magnitude.
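
A rough calculation shows where a figure like that can come from. Every number in the sketch below is an assumption picked for illustration (bitrate, event counts, clip sizes); plug in your own.

```python
# Back-of-envelope egress per camera per day: streaming video vs. sending events.
# Every figure here is an assumption chosen for illustration.

SECONDS_PER_DAY = 24 * 60 * 60

video_bitrate_bps = 4_000_000   # assumed 1080p H.264 stream, about 4 Mbit/s
events_per_day = 2_000          # assumed detections reported per day
event_size_bytes = 1_000        # assumed JSON event with boxes and scores
clips_per_day = 50              # assumed short clips attached to alerts
clip_size_bytes = 5_000_000     # assumed 5 MB per clip

stream_bytes = video_bitrate_bps // 8 * SECONDS_PER_DAY
edge_bytes = events_per_day * event_size_bytes + clips_per_day * clip_size_bytes
ratio = stream_bytes / edge_bytes

print(f"streaming: {stream_bytes / 1e9:.1f} GB/day")
print(f"edge:      {edge_bytes / 1e6:.0f} MB/day")
print(f"reduction: about {ratio:.0f}x")
```

Under these assumptions the clips dominate the edge-side traffic, so how often you attach a clip to an alert is the main knob.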

And edge keeps working when the network doesn't. For sites where you need monitoring to keep going through an outage (substations, remote oil platforms, anywhere connectivity is unreliable), the resilience is the point.
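
Keeping inference running through an outage also means holding on to events until the link comes back. The class below is a minimal store-and-forward sketch, not HUB code: `EventBuffer`, its method names, and the drop-oldest policy are all assumptions made for the example.

```python
from collections import deque
from typing import Callable, Deque, Dict


class EventBuffer:
    """Bounded local queue: keeps the newest events while the uplink is down."""

    def __init__(self, max_events: int = 10_000) -> None:
        # A deque with maxlen silently drops the oldest item when full.
        self._queue: Deque[Dict] = deque(maxlen=max_events)

    def record(self, event: Dict) -> None:
        self._queue.append(event)

    def flush(self, send: Callable[[Dict], bool]) -> int:
        """Send queued events in order; stop at the first failure."""
        sent = 0
        while self._queue:
            if not send(self._queue[0]):
                break  # uplink still down; keep the event for next time
            self._queue.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._queue)


buffer = EventBuffer(max_events=3)
for seq in range(5):  # uplink is down: five events arrive, room for three
    buffer.record({"seq": seq})

delivered = []


def send(event: Dict) -> bool:
    delivered.append(event)  # stand-in for a real network call
    return True


buffer.flush(send)
print([e["seq"] for e in delivered])  # the two oldest events were dropped
```

Dropping the oldest events is one policy among several; a site that cares more about the start of an incident than its end would want the opposite.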

What we built into HUB for scale

HUB is open source and modular. The basic idea: a fleet of edge boxes runs the inference, and a single control plane handles model distribution, updates, and health monitoring. Operations stay centralised; inference stays distributed.
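
To give a feel for what "health monitoring" means on the control-plane side, here is a toy in-memory registry. It is a sketch only: `FleetRegistry`, `DeviceRecord`, and the two-minute staleness threshold are hypothetical names and values, not HUB's data model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DeviceRecord:
    device_id: str
    model_version: str
    last_heartbeat: float  # seconds since epoch


class FleetRegistry:
    """Hypothetical control-plane view of the fleet, kept in memory."""

    def __init__(self, stale_after_s: float = 120.0) -> None:
        self.stale_after_s = stale_after_s
        self.devices: Dict[str, DeviceRecord] = {}

    def heartbeat(self, device_id: str, model_version: str, now: float) -> None:
        """Record that a device checked in and which model it is running."""
        self.devices[device_id] = DeviceRecord(device_id, model_version, now)

    def stale(self, now: float) -> List[str]:
        """Devices that have not checked in within the staleness window."""
        return sorted(d.device_id for d in self.devices.values()
                      if now - d.last_heartbeat > self.stale_after_s)

    def not_on(self, model_version: str) -> List[str]:
        """Devices still running something other than the given version."""
        return sorted(d.device_id for d in self.devices.values()
                      if d.model_version != model_version)


registry = FleetRegistry()
registry.heartbeat("cam-1", "v2", now=1000.0)
registry.heartbeat("cam-2", "v1", now=900.0)
print(registry.stale(now=1050.0))   # devices that have gone quiet
print(registry.not_on("v2"))        # devices that missed the update
```

Those two queries (who has gone quiet, and who is on the wrong version) cover most of what an operator asks of a fleet day to day.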

On the model side, we lean heavily on lightweight architectures designed for tight memory budgets. Quantised YOLO variants are the workhorse for object detection. For new detectors, you don't have to train from scratch; HUB lets you generate a model from a text prompt or a handful of example images using generative AI. That turns a multi-week labelling exercise into an afternoon.

The device side supports the things you actually find in the field: Raspberry Pi 4/5, the NVIDIA Jetson family, Intel NUCs, and standard x86 servers. The control plane talks to all of them through the same API.

Walking through a deployment

The example below uses a motion detector, but the workflow is the same for any model type. Three steps: train, optimise, deploy.

Step 1: train from a prompt

You describe what you want the model to detect. HUB does the rest. For motion, the prompt is literally "detect motion".

Here's the API call:


 import securade

 # Initialize the Securade client
 client = securade.Client()

 # Train a motion detection model
 model = client.train_model(prompt='detect motion')

 # Print the model ID
 print(f'Model ID: {model.id}')
 

Step 2: optimise for the target hardware

The freshly trained model is bigger than it needs to be. The optimisation step quantises it and prunes layers that the target hardware won't benefit from. For a Pi, that often means dropping to INT8 and trimming the backbone.
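
If INT8 quantisation is new to you, the core idea fits in a few lines. This is a generic illustration of symmetric per-tensor quantisation in plain Python, with made-up weights; it is not a description of what HUB's optimiser does internally.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.82, -1.27, 0.003, 0.5, -0.41]  # illustrative values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)
print(f"scale={scale:.4f}, worst-case error={max_error:.4f}")
```

Each weight now takes one byte instead of four, and the rounding error per weight is bounded by half the scale. Whether that error matters for detection accuracy is something you have to measure on your own data.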

One line gets you a Pi-optimised build:


 # Optimize the model for Raspberry Pi
 optimized_model = client.optimize_model(model.id, device='raspberry_pi')

 # Print the optimized model ID
 print(f'Optimized Model ID: {optimized_model.id}')
 

Step 3: push it to the fleet

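HUB ships with a couple of distribution backends. MQTT works well when your devices are behind NAT or on flaky connections: each device opens an outbound connection to the broker, so nothing has to reach in through the NAT, and the broker can hold the announcement (as a retained message or in a persistent session) for devices that are offline. You publish the new model on a topic, the devices subscribe to it, and they pull it down when they next come online.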
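
The publish-then-pull pattern can be sketched without a broker at all. The example below shows one plausible shape for the message that goes on the topic and the decision a device makes when it receives it. The field names, the URL, and the version scheme are assumptions for illustration, not HUB's wire format.

```python
import hashlib
import json
from typing import Optional


def build_announcement(model_id: str, version: int, artifact: bytes, url: str) -> str:
    """Control-plane side: a small JSON message to publish on the topic."""
    return json.dumps({
        "model_id": model_id,
        "version": version,
        "sha256": hashlib.sha256(artifact).hexdigest(),
        "size_bytes": len(artifact),
        "url": url,  # where the device fetches the artifact from
    }, sort_keys=True)


def should_download(message: str, current_version: Optional[int]) -> bool:
    """Device side: pull only if the announced version is newer than ours."""
    announced = json.loads(message)["version"]
    return current_version is None or announced > current_version


artifact = b"pretend these bytes are a model file"
message = build_announcement(
    "motion_detection", 7, artifact,
    "https://hub.example.invalid/models/motion_detection/7",  # hypothetical URL
)
print(message)
print(should_download(message, current_version=6))
```

Keeping the message small and putting the artifact behind a URL means a device on a 4G dongle can defer the large download without missing the announcement.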

Each device validates the model signature, swaps it in, and reports back. If it can't load the model, it rolls back automatically so you don't end up with offline cameras.
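
The validate, swap, and roll back sequence looks roughly like this on the device. It is a sketch with loud assumptions: HMAC-SHA256 with a shared key stands in for whatever signature scheme is actually used (a real fleet would more likely use asymmetric signatures), and `install` and `load_check` are hypothetical names.

```python
import hashlib
import hmac
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def verify(artifact: bytes, signature_hex: str, key: bytes) -> bool:
    """Constant-time check of an HMAC-SHA256 tag (stand-in for a signature)."""
    expected = hmac.new(key, artifact, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)


def install(artifact: bytes, signature_hex: str, key: bytes,
            active_path: Path, load_check: Callable[[Path], bool]) -> str:
    """Return 'installed', 'rejected' or 'rolled_back'."""
    if not verify(artifact, signature_hex, key):
        return "rejected"  # never touch the active model

    backup_path = active_path.with_suffix(".previous")
    had_previous = active_path.exists()
    if had_previous:
        shutil.copy2(active_path, backup_path)  # keep the old model for rollback

    # Write to a temp file in the same directory, then rename into place,
    # so the active path never holds a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=active_path.parent)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(artifact)
    os.replace(tmp_name, active_path)

    if load_check(active_path):
        return "installed"

    # The new model failed to load: restore the previous one.
    if had_previous:
        os.replace(backup_path, active_path)
    else:
        active_path.unlink()
    return "rolled_back"
```

The returned string is what the device would report back to the control plane. `load_check` is where you would run the model on a known frame before trusting it.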

The API call:


 # Deploy the model to multiple devices via MQTT
 client.deploy_model(optimized_model.id, deployment_method='mqtt', topic='motion_detection')

 print('Model deployed successfully!')
 

Dealing with mismatched hardware

Heterogeneity is the long pole in the tent. The same logical model has to behave differently on a Jetson and a Pi. There are two tricks we lean on most.

Dynamic batching adapts the batch size at runtime based on what the device can actually handle. Boxes with more headroom process larger batches; constrained boxes drop to batch-of-one. The control plane never has to know about this; the device figures it out locally.

Model pruning is the other lever. We carry a pruned "small" variant and a "large" variant of each model, and pick the right one per device at deploy time. The small variant gives up a little accuracy, but it runs fast on cheap hardware, where running at all is the win.
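
Both levers are simple to sketch. The functions below show one possible policy for each: batch size halves when the last batch blew the latency budget and grows by one when there is comfortable headroom, and the variant is picked from a couple of device facts. The thresholds, the 0.7 headroom factor, and the function names are assumptions for illustration, not HUB's actual logic.

```python
def next_batch_size(current: int, last_latency_ms: float, budget_ms: float,
                    max_batch: int = 16) -> int:
    """Halve the batch when over budget; grow by one when comfortably under."""
    if last_latency_ms > budget_ms:
        return max(1, current // 2)
    if last_latency_ms < 0.7 * budget_ms:
        return min(max_batch, current + 1)
    return current


def pick_variant(free_ram_mb: int, has_accelerator: bool) -> str:
    """Deploy-time choice between the two variants; thresholds are assumptions."""
    if has_accelerator or free_ram_mb >= 4096:
        return "large"
    return "small"


# Simulate a constrained device where each frame in a batch costs 40 ms.
batch = 4
for _ in range(5):
    latency_ms = 40.0 * batch  # pretend measurement
    batch = next_batch_size(batch, latency_ms, budget_ms=50.0)
print(batch)

print(pick_variant(free_ram_mb=512, has_accelerator=False))
print(pick_variant(free_ram_mb=512, has_accelerator=True))
```

Backing off quickly and growing slowly keeps a struggling device from oscillating between batch sizes it cannot sustain.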

A real example: 10-camera smart campus

One of our deployments is a small university campus with 10 cameras across the main buildings. Each camera has a Pi 4 next to it running motion detection. The dashboard sits in the campus security office. Alerts route to security in real time.

The setup took an afternoon: train motion detection from a prompt, optimise for Pi, push to the cameras via MQTT, point the dashboard at the event stream.

Six months later: 95% uptime, sub-50ms detection latency, and the security team actually trusts the alerts because the false positive rate stayed manageable. Most of the operational time goes to occasional model retraining as the seasons change the lighting on the outdoor cameras.

Where you can extend HUB

The deployment backend is a plugin layer, not a fixed thing. We ship MQTT and HTTPS pull out of the box. If you need CoAP or some weird industrial protocol that doesn't have a Python library yet, that's a deployment module you write.

A deployment module is a small Python class with a couple of methods: publish_model, list_targets, get_device_status. Implement those, add tests, open a pull request. We try to keep the surface area small so contributing is low-friction.
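
As a rough picture of what such a class might look like: the three method names below come from the description above, but the base class, the argument lists, and the return types are guesses made for the example. Check the repository for the real interface before writing a module.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class DeploymentModule(ABC):
    """Assumed shape of a deployment backend; HUB's real base class may differ."""

    @abstractmethod
    def publish_model(self, model_id: str, targets: List[str]) -> None:
        """Make the model available to the given devices."""

    @abstractmethod
    def list_targets(self) -> List[str]:
        """Return the IDs of devices this backend can reach."""

    @abstractmethod
    def get_device_status(self, device_id: str) -> str:
        """Return a short human-readable status for one device."""


class InMemoryDeployment(DeploymentModule):
    """Toy backend useful for tests: 'deploys' by recording the model ID."""

    def __init__(self, device_ids: List[str]) -> None:
        self._deployed: Dict[str, str] = {d: "" for d in device_ids}

    def publish_model(self, model_id: str, targets: List[str]) -> None:
        for device_id in targets:
            if device_id not in self._deployed:
                raise KeyError(f"unknown device: {device_id}")
            self._deployed[device_id] = model_id

    def list_targets(self) -> List[str]:
        return sorted(self._deployed)

    def get_device_status(self, device_id: str) -> str:
        model_id = self._deployed[device_id]
        return f"running {model_id}" if model_id else "no model"


backend = InMemoryDeployment(["cam-1", "cam-2"])
backend.publish_model("motion-v7", ["cam-1"])
for device_id in backend.list_targets():
    print(device_id, "->", backend.get_device_status(device_id))
```

An in-memory backend like this is also a convenient starting point for the tests that accompany a pull request.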

Most of the issues we get are from people new to edge AI, not from veterans, and we like it that way. There's no gatekeeping on the contribution side.

The hard part of edge AI isn't training the model. It's the operations around the model: getting it to the right devices, knowing which devices are still running it, pushing updates without breaking things, dealing with mismatched hardware. HUB exists because we kept rebuilding the same operational layer for every project and got tired of it.

If you're thinking about a deployment, the first decision is whether you actually need edge at all. If you can tolerate a few hundred milliseconds of cloud round-trip, don't bother. If you can't, HUB is one of a few options that might save you from writing this whole layer from scratch.

The code lives at github.com/securade/hub. Stars help other people find it, and issues help us prioritise.