
BlueYonder to Samsara: Designing a Real-Time TMS Integration

Connecting a legacy TMS to a modern IoT fleet platform in real time — without rewriting either system and without breaking the 99.9% SLA.

Transportation Management Systems are among the most integration-dense pieces of enterprise software in existence. Every carrier, every IoT sensor platform, every ERP has a different data model, a different event format, and a different opinion on what "real time" means. This is the story of one integration — BlueYonder TMS to Samsara fleet telematics — and the architectural decisions that made it work.

The problem

Our client operated a regional logistics network across the Gulf and Levant. Their BlueYonder TMS handled route planning, load optimisation, and carrier management. Their fleet of 1,400 vehicles had recently been equipped with Samsara GPS and sensor units.

The gap: BlueYonder had no way to consume the real-time vehicle location and condition data that Samsara was producing. Route ETAs were calculated at dispatch and never updated. Carriers would report exceptions verbally, hours after they'd occurred. The operations team was flying blind.

The brief: build a real-time integration that pushes Samsara telemetry into BlueYonder's operational picture without requiring either vendor to change their system.

The constraints

BlueYonder's API is batch-oriented. The TMS was designed in an era when "near real-time" meant updates every 15 minutes. The API accepts updates but isn't built for high-frequency writes. We needed to be careful about write volume.

Samsara emits at high frequency. Vehicle location events come every 30 seconds per vehicle at default settings. For 1,400 vehicles, that's ~2,800 events per minute at rest. During high-activity windows, this spikes significantly.

The SLA was 99.9% uptime. The integration couldn't be the weak link. A failure that caused the TMS to show stale data was operationally acceptable for a short window; a failure that corrupted TMS data was not.

No changes to either vendor system. BlueYonder customisation was contractually constrained. Samsara is a SaaS platform — we consume their webhook output, we don't modify it.

The architecture

The integration is a stream processing pipeline built on AWS, sitting between Samsara's outbound webhooks and BlueYonder's inbound REST API.

Ingestion layer

Samsara webhooks deliver to an API Gateway endpoint. We chose API Gateway over a direct load balancer because we needed request validation and throttling independent of our processing layer. The gateway validates the Samsara HMAC signature, rejects malformed payloads, and writes valid events to an SQS queue. End-to-end latency from Samsara webhook delivery to SQS write is under 200ms.
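
As a concrete illustration, the signature check looks roughly like the Python sketch below. The header format, secret handling, and exact signing scheme here are assumptions for illustration; the production check follows Samsara's webhook signature documentation.

```python
# Sketch of the HMAC check performed at the edge before an event is queued.
# Header names and the exact signing scheme are illustrative assumptions.
import hashlib
import hmac
import os

SECRET = os.environ["SAMSARA_WEBHOOK_SECRET"].encode()

def is_valid_signature(body: bytes, timestamp: str, received_sig: str) -> bool:
    """Recompute the HMAC-SHA256 over timestamp + body and compare in constant time."""
    message = timestamp.encode() + body
    expected = "v1=" + hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received_sig)
```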

Processing layer

A Lambda function consumes from the SQS queue in batches of 25. Processing logic is deliberately thin: parse the Samsara event schema, enrich with the vehicle-to-route mapping from DynamoDB (maintained by a separate sync job from BlueYonder), and write to a Kinesis Data Stream partitioned by route ID.

We chose Kinesis over direct writes to BlueYonder for two reasons: it gives us a durable ordered stream we can replay if the downstream API is unavailable, and it decouples the Samsara event rate from the BlueYonder write rate.
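
A minimal sketch of that processing function, assuming illustrative table, stream, and field names (the real Samsara payload schema and our mapping table differ in detail):

```python
# Sketch of the processing Lambda: SQS batch in, enriched Kinesis records out.
# Table/stream names and Samsara field names are illustrative assumptions.
import json
import boto3

dynamodb = boto3.resource("dynamodb")
kinesis = boto3.client("kinesis")
route_map = dynamodb.Table("vehicle-route-mapping")  # maintained by the BlueYonder sync job

def handler(event, context):
    records = []
    for message in event["Records"]:            # SQS batch, 25 messages at a time
        telemetry = json.loads(message["body"])
        vehicle_id = telemetry["vehicleId"]

        # Enrich with the route this vehicle is currently assigned to.
        mapping = route_map.get_item(Key={"vehicle_id": vehicle_id}).get("Item")
        if not mapping:
            continue  # vehicle not on an active route; nothing for the TMS

        telemetry["route_id"] = mapping["route_id"]
        records.append({
            "Data": json.dumps(telemetry).encode(),
            "PartitionKey": mapping["route_id"],  # keeps a route's events ordered
        })

    if records:
        # Partial failures (FailedRecordCount) would need a retry in production.
        kinesis.put_records(StreamName="tms-telemetry", Records=records)
```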

Aggregation and write layer

A second Lambda function consumes from Kinesis and implements the write logic for BlueYonder. This is where the interesting decisions live.

Event deduplication. Samsara's webhook delivery is at-least-once, so duplicates are expected. We maintain a DynamoDB table keyed on vehicle ID + event timestamp; the function checks it before writing to BlueYonder and drops duplicates.
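
A sketch of the dedup check, using a conditional put so the existence test and the insert are a single atomic operation; the table name and TTL window are illustrative assumptions:

```python
# Sketch of the dedup check: a conditional put on (vehicle ID, event timestamp).
# If the item already exists, the put fails and the event is dropped as a duplicate.
import time
import boto3
from botocore.exceptions import ClientError

dedup_table = boto3.resource("dynamodb").Table("telemetry-dedup")

def is_duplicate(vehicle_id: str, event_ts: str) -> bool:
    try:
        dedup_table.put_item(
            Item={
                "vehicle_id": vehicle_id,
                "event_ts": event_ts,
                "ttl": int(time.time()) + 24 * 3600,  # expire dedup records after a day
            },
            ConditionExpression="attribute_not_exists(vehicle_id)",
        )
        return False  # first time we've seen this event
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return True
        raise
```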

Write rate limiting. BlueYonder recommends no more than 10 writes per second per integration. We implement a token bucket in ElastiCache (Redis) to enforce this. Events that exceed the budget are written to a priority queue and processed in order.
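
A sketch of that limiter, written as a Redis Lua script so the refill and the token take are atomic across concurrent Lambda invocations; the key name, endpoint, and parameters are illustrative assumptions:

```python
# Sketch of the token bucket that caps BlueYonder writes at ~10/sec.
import time
import redis

TOKEN_BUCKET_LUA = """
local key      = KEYS[1]
local rate     = tonumber(ARGV[1])   -- tokens added per second
local capacity = tonumber(ARGV[2])   -- burst size
local now      = tonumber(ARGV[3])

local state  = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(state[1]) or capacity
local ts     = tonumber(state[2]) or now

tokens = math.min(capacity, tokens + (now - ts) * rate)
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call('HMSET', key, 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', key, 60)
return allowed
"""

r = redis.Redis(host="tms-rate-limit.cache.amazonaws.com")  # placeholder endpoint
take_token = r.register_script(TOKEN_BUCKET_LUA)

def can_write_to_blueyonder() -> bool:
    """True if a write token is available; callers queue the event otherwise."""
    return take_token(keys=["blueyonder:tokens"], args=[10, 10, time.time()]) == 1
```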

Batch coalescing. For vehicles that are stationary or moving slowly, emitting a location update every 30 seconds is noise. We coalesce events for a given vehicle over a 2-minute window, emitting only when the location has changed beyond a configurable threshold (default: 100 metres). This reduced BlueYonder write volume by 67% without meaningful degradation in ETA accuracy.
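
The displacement test at the heart of the coalescing logic looks roughly like this; where the last-emitted position per vehicle is stored is elided here, and the field layout is an assumption:

```python
# Sketch of the coalescing rule: only forward a position when the vehicle has moved
# more than the threshold since the last update we sent to BlueYonder.
from math import asin, cos, radians, sin, sqrt

MOVE_THRESHOLD_M = 100.0  # configurable; 100 metres by default

def haversine_m(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance between two points, in metres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))

def should_emit(last_sent: tuple | None, current: tuple) -> bool:
    """Emit on the first fix for a vehicle, or once it has moved past the threshold."""
    if last_sent is None:
        return True
    return haversine_m(*last_sent, *current) > MOVE_THRESHOLD_M
```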

"The coalescing logic was the insight that made the integration viable. Raw throughput from Samsara would have overwhelmed BlueYonder's API tier — we needed to think about what the TMS actually needed, not just what the sensor was producing."

Handling failures

The SLA requirement made failure handling non-negotiable.

Samsara webhook failures. API Gateway returns 200 to Samsara on receipt. If processing fails downstream, that's our problem — we never want Samsara to retry because of our processing failures. The queue is our durability layer.

BlueYonder API unavailability. If BlueYonder returns 5xx, the Lambda writes to a dead-letter SQS queue with the failed batch. A separate process monitors the DLQ and replays when BlueYonder recovers. We alert when the DLQ depth exceeds a threshold.
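
A sketch of the replay loop; the queue URL, health check, and write helper are hypothetical placeholders:

```python
# Sketch of the DLQ replay process: once BlueYonder is healthy again, drain the
# dead-letter queue and push each failed batch back through the normal write path.
import json
import boto3

sqs = boto3.client("sqs")
DLQ_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/blueyonder-writes-dlq"  # placeholder

def replay_dlq(write_to_blueyonder, blueyonder_is_healthy):
    while blueyonder_is_healthy():
        resp = sqs.receive_message(QueueUrl=DLQ_URL, MaxNumberOfMessages=10, WaitTimeSeconds=5)
        messages = resp.get("Messages", [])
        if not messages:
            break  # DLQ drained
        for msg in messages:
            write_to_blueyonder(json.loads(msg["Body"]))  # re-attempt the failed batch
            sqs.delete_message(QueueUrl=DLQ_URL, ReceiptHandle=msg["ReceiptHandle"])
```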

Data quality issues. Samsara occasionally emits events with GPS coordinates of (0,0) — the null island problem. The processing layer rejects these with a structured log entry and continues. We built a small dashboard in CloudWatch that surfaces data quality anomalies.
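
The guard itself is small; field names here are assumptions:

```python
# Sketch of the null-island guard applied during processing.
# A rejected event is logged with a structured entry, not retried.
def has_plausible_coordinates(telemetry: dict) -> bool:
    lat, lon = telemetry.get("latitude"), telemetry.get("longitude")
    if lat is None or lon is None:
        return False
    if abs(lat) < 1e-6 and abs(lon) < 1e-6:  # (0, 0), the null-island case
        return False
    return abs(lat) <= 90 and abs(lon) <= 180
```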

What we measured after go-live

Six weeks post-deployment:

  • ETA accuracy improved by 34% as measured against actual arrival times
  • Carrier exception reporting time dropped from an average of 4.2 hours to 18 minutes — operations staff could see anomalies in real time rather than waiting for driver calls
  • Zero SLA breaches on the integration layer
  • BlueYonder write volume was 67% lower than a naive integration would have produced

Five things to take away

  1. High-frequency sensor data almost never needs to write directly to a batch-oriented system — aggregate and coalesce first.
  2. API Gateway as a webhook receiver gives you validation, throttling, and logging before your processing logic even starts.
  3. Decouple ingestion rate from write rate using a stream — Kinesis or Kafka depending on your existing infrastructure.
  4. The null island problem (GPS 0,0) is real and common. Filter it at the ingestion layer.
  5. Dead-letter queues are not a failure mode — they are the failure mode you planned for. Build the replay mechanism before you need it.