Will my delivery be late?


By Gus Hinestrosa and Eduard Lazar



The early morning delivery was 20 minutes late and this cascaded into the rest of your day. You missed the train, rushed that boiling coffee, made it late to work, made it late to a meeting…

As consumers, we take the complexity of the supply chain for granted, especially at current industry-wide volumes, and we become more and more demanding as the sector evolves. Just a few years ago we would use delivery services only for the odd bulky or unusual order. That has all changed. We now expect a home delivery option for everything, from fresh produce to DIY materials. A sign of this change is the strong growth of the online food and retail sector in the UK in 2017: +7% year-on-year!*

What makes a delivery arrive on time? Our order needs to be sent to the retailer efficiently, then our product must be sourced, prepared and packaged, given to a known courier at a previously agreed time and location, then the courier must deliver it while respecting a number of rules and therefore guaranteeing its integrity. Managers and delivery controllers intervene in the process to ensure that every step happens as it should. However, from time to time, delays occur.

Among other products, at LastMileLink Technologies we are prototyping an engine that would predict whether a specific delivery will be late. This prototype has the potential to solve a few headaches at different levels. Operators would have further assistance in identifying the ‘trouble children’ among deliveries; clients could be warned in advance; alternative arrangements could be put in place; the retailer would gain confidence in the delivery supplier and could potentially pre-empt any client complaint.






Figure 1: We want to know whether a delivery will be late with enough notice, e.g. at the time of the parcel’s collection. At that point in time, we might have gathered enough information (events timeline, courier characteristics, route, type of parcel, etc.) to tell whether a customer’s parcel will arrive late.

How is this possible?

We have realised that most late deliveries have hidden traits that are not visible to the naked eye, even that of an experienced operator. These hidden traits derive from human actions, unforeseen circumstances and physical realities, and from limitations of the systems in place. Fortunately, some of these traits are identifiable in the data. Indeed, even in a limited data set covering a few weeks, patterns become visible. Most jobs happen on time, but delays do occur occasionally, and these can be related to a range of variables (e.g. the day of the week, the hour of the day).


Figure 2: Some trends are very visible. On certain days or at certain hours, the time gaps for on-time and late jobs show different distributions.

Other noticeable trends include the correlation between some delay metrics and some of the intervals between events, and the poorer correlation between these same metrics and the collection-to-delivery distances.


Figure 3: Some variables seem to be strongly correlated with job delay metrics, whereas others (e.g. distance) are poorly correlated with the delay parameters and are, possibly, poor predictors. However, these trends may be specific to the type of delivery studied here.
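
A comparison of this kind can be sketched with a plain Pearson correlation. The snippet below is only an illustration: the eight jobs are invented (they are not LastMileLink data), and the variable names `pickup_wait_min`, `distance_km` and `delay_min` are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented toy jobs: one value per job in each list.
pickup_wait_min = [2, 5, 1, 12, 8, 3, 15, 6]                 # interval between two events
distance_km = [4.0, 1.5, 2.0, 3.1, 6.2, 5.5, 4.8, 2.4]       # collection-to-delivery distance
delay_min = [0, 3, -1, 14, 7, 1, 18, 4]                      # delay metric (negative = early)

r_wait = pearson(pickup_wait_min, delay_min)
r_dist = pearson(distance_km, delay_min)

print(f"wait vs delay:     r = {r_wait:.2f}")
print(f"distance vs delay: r = {r_dist:.2f}")
```

In this toy data the wait interval tracks the delay closely while the distance barely does, which mirrors the pattern described above. A low linear correlation does not prove a variable is useless, so we treat it as a hint rather than a verdict.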

Nonetheless, not all variables respond according to our initial intuition. For example, an increase in the number of deliveries performed in a full day does not necessarily mean an increase in late jobs.


Figure 4: As the daily load increases, the proportion of jobs arriving late to their destination does not necessarily increase.  

A thorough exploratory study of our data allows us to get a ‘feeling’ for trends and anomalies. We can then adapt our data for consumption by the tools of choice, a step known as feature engineering. Many techniques are used: normalisation (we all love clean, normal distributions for our variables when possible), elimination of null values and outliers, converting categorical variables to continuous ones or vice versa, extraction of new variables from existing ones, reduction of the number of predictors (to avoid redundant variables, for instance), reduction of collinear variables and, basically, making the data set more compact, less noisy and more usable… without losing information!
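
To make a few of these steps concrete, here is a minimal sketch using only the Python standard library. It covers dropping nulls, deriving new variables, encoding a categorical variable and standardising continuous ones; it does not cover outlier removal or collinearity. The record layout and field names (`booked`, `collected`, `vehicle`, `distance_km`) and all values are assumptions made up for the example, not our production schema.

```python
from datetime import datetime
from statistics import mean, pstdev

# Hypothetical raw job records; field names and values are invented.
raw_jobs = [
    {"booked": "2018-03-05 08:00", "collected": "2018-03-05 08:25", "vehicle": "bike", "distance_km": 2.1},
    {"booked": "2018-03-05 17:30", "collected": "2018-03-05 18:40", "vehicle": "van", "distance_km": 7.4},
    {"booked": "2018-03-10 11:00", "collected": None, "vehicle": "van", "distance_km": 3.3},
    {"booked": "2018-03-11 09:15", "collected": "2018-03-11 09:35", "vehicle": "bike", "distance_km": 1.2},
]

FMT = "%Y-%m-%d %H:%M"

def engineer(jobs):
    """Turn raw job records into numeric feature rows."""
    # 1. Eliminate records containing null values.
    complete = [j for j in jobs if all(v is not None for v in j.values())]

    rows = []
    for j in complete:
        booked = datetime.strptime(j["booked"], FMT)
        collected = datetime.strptime(j["collected"], FMT)
        rows.append({
            # 2. Extract a new variable from two existing ones.
            "wait_min": (collected - booked).total_seconds() / 60,
            # 3. Derive calendar features from the timestamp.
            "hour": booked.hour,
            "is_weekend": int(booked.weekday() >= 5),
            # 4. Make a categorical variable numeric.
            "vehicle_is_van": int(j["vehicle"] == "van"),
            "distance_km": j["distance_km"],
        })

    # 5. Standardise continuous columns (zero mean, unit variance).
    for col in ("wait_min", "distance_km"):
        values = [r[col] for r in rows]
        mu, sigma = mean(values), pstdev(values)
        for r in rows:
            r[col + "_z"] = (r[col] - mu) / sigma if sigma else 0.0
    return rows

features = engineer(raw_jobs)
for row in features:
    print(row)
```

The job with a missing collection time is dropped, and the remaining three rows come out fully numeric. In practice a library such as pandas or scikit-learn would handle these transformations; the hand-rolled version is here only to show what each step does.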

After all this cleaning, we can let the machine learn! To train our algorithms, we feed thousands of jobs with their labelled outcome (job late / job on time) into state-of-the-art machine learning engines. These engines ‘learn’ these trends and predict the successes and failures of the service to some degree of accuracy. As is common practice, we use most of our data set to train the algorithms, a smaller chunk to test their performance and, later on, a single job data sample to produce a real-time prediction. In our hunt for the best performing tool, we tried several algorithms to solve this binary classification problem, bearing in mind the possible limitations and flaws of each tool. Some algorithms are very adaptable, but not very interpretable (black boxes), whereas others are very interpretable (we know what happens inside), but not very adaptable to unusual data (e.g. a linear regression fitting non-linear data sets).
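
The train / test / single-job workflow can be sketched as follows. Everything here is an assumption for illustration: the jobs are synthetic, the only feature is a hypothetical wait time at collection, and the ‘model’ is a one-threshold rule (a decision stump) standing in for the real algorithms, which this article does not detail.

```python
import random

random.seed(42)  # reproducible toy data

def make_job():
    """Return (wait_min, late) for one synthetic job; about 15% are late."""
    late = random.random() < 0.15
    wait_min = random.gauss(35, 8) if late else random.gauss(12, 6)
    return wait_min, int(late)

jobs = [make_job() for _ in range(2000)]

def stratified_split(data, test_fraction=0.2):
    """Split so that train and test keep the same share of late jobs."""
    train, test = [], []
    for label in (0, 1):
        group = [row for row in data if row[1] == label]
        random.shuffle(group)
        cut = int(len(group) * test_fraction)
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test

def accuracy(threshold, data):
    """Share of jobs for which 'wait >= threshold' matches the label."""
    return sum((w >= threshold) == bool(y) for w, y in data) / len(data)

def fit_stump(train):
    """Pick the wait threshold that maximises training accuracy."""
    best_threshold, best_correct = None, -1
    for candidate in sorted({w for w, _ in train}):
        correct = sum((w >= candidate) == bool(y) for w, y in train)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold

# Most of the data trains the model, a smaller chunk tests it.
train, test = stratified_split(jobs)
threshold = fit_stump(train)
test_accuracy = accuracy(threshold, test)

# Later on, a single incoming job gets a real-time prediction.
new_job_wait_min = 41.0
will_be_late = new_job_wait_min >= threshold

print(f"threshold: {threshold:.1f} min, test accuracy: {test_accuracy:.3f}")
print("new job predicted late:", will_be_late)
```

The split is stratified because late jobs are rare: a purely random split could leave the test set with too few of them to say anything useful. Accuracy is used here only to keep the sketch short; it is a weak yardstick for rare events.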

Let’s check out some of the resulting metrics:


Figure 5: Our goal is to achieve the highest ROC score, which tells us how well our binary classification is working. However, as we are trying to predict a relatively rare event (a late delivery), a single performance diagnosis is not enough because the ROC metrics can be quite high to start with. By contrast, notice how the 1-Recall metric varies for the same algorithms. 1-Recall shows the proportion of actual late deliveries classified incorrectly, and a low value is desirable. Together with other metrics (Precision, F1 scores, etc.) we can make a thorough diagnosis before making a final choice.
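
The gap between a comfortable ROC score and a poor 1-Recall is easy to reproduce on a toy example. The sketch below computes the metrics by hand on 20 invented jobs (4 of them late) with invented model scores. The 0.5 decision threshold is also an assumption, and none of the numbers are taken from Figure 5.

```python
# 1 = late, 0 = on time. Scores are the model's estimated probability of lateness.
y_true = [0] * 16 + [1] * 4
scores = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.30,
          0.30, 0.35, 0.40, 0.45, 0.15, 0.05, 0.55, 0.25,  # on-time jobs
          0.90, 0.60, 0.45, 0.40]                          # late jobs

def confusion(y_true, y_pred):
    """Return (true positives, false positives, false negatives, true negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn

def roc_auc(y_true, scores):
    """Chance that a random late job outscores a random on-time job (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_pred = [int(s >= 0.5) for s in scores]  # assumed decision threshold
tp, fp, fn, tn = confusion(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
miss_rate = 1 - recall  # the '1-Recall' discussed above
f1 = 2 * precision * recall / (precision + recall)
auc = roc_auc(y_true, scores)

print(f"accuracy={accuracy:.2f}  ROC AUC={auc:.4f}")         # accuracy=0.85  ROC AUC=0.9375
print(f"precision={precision:.2f}  recall={recall:.2f}  "
      f"1-recall={miss_rate:.2f}  F1={f1:.2f}")
```

The ranking quality looks excellent (ROC AUC of 0.9375) and accuracy is 85%, yet half of the late deliveries are missed at this threshold. That is why we look at several metrics side by side before trusting any single one.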

The metrics can vary wildly between algorithms, but each adds a different piece of the puzzle before we choose the one (or ones) that will be incorporated into the production predictive model. For example, the more interpretable models (such as logistic regression) can tell us which variables are good or weak predictors and, based on this, we can inform other non-linear, non-interpretable models.
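
As an illustration of how an interpretable model exposes predictor strength, here is a small logistic regression fitted by plain gradient descent on synthetic data. The setup is entirely assumed: lateness is generated from a hypothetical wait time only, distance is pure noise by construction, and the learning rate and number of epochs are arbitrary choices. A library implementation would normally replace the hand-written fit.

```python
import math
import random
from statistics import mean, pstdev

random.seed(7)  # reproducible toy data

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic jobs: lateness depends on the wait only; distance is noise.
waits = [random.gauss(15, 8) for _ in range(400)]
dists = [random.uniform(1, 10) for _ in range(400)]
late = [int(random.random() < sigmoid((w - 22) / 3)) for w in waits]

def standardise(col):
    """Rescale to zero mean and unit variance so coefficients are comparable."""
    mu, sigma = mean(col), pstdev(col)
    return [(v - mu) / sigma for v in col]

X = list(zip(standardise(waits), standardise(dists)))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, target in zip(X, y):
            z = bias + sum(w * v for w, v in zip(weights, row))
            error = sigmoid(z) - target
            grad_b += error
            for k, v in enumerate(row):
                grad_w[k] += error * v
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

(coef_wait, coef_dist), intercept = fit_logistic(X, late)

print(f"wait coefficient:     {coef_wait:+.2f}")
print(f"distance coefficient: {coef_dist:+.2f}")
```

Because both inputs are standardised, the size of each coefficient can be compared directly: the wait gets a large positive weight and the distance a weight close to zero. A ranking like this is one way to decide which variables to pass on to less interpretable models.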

The real power behind our machine learning technology is the cloud platform developed by LastMileLink Technologies. Our predictive models are designed in scripting languages (R or Python), taking advantage of the rich ecosystem of machine learning libraries (TensorFlow, scikit-learn, Keras, etc.). However, a prototype in a Python script cannot survive by itself. As shown in the diagram, the training, testing and real-time predictions of machine learning models have to be fully embedded within our own cloud ecosystem. The range of micro-services built by the LastMileLink team allows the smooth recording, retrieval, processing and, basically, the ‘dissection’ of each delivery job, including of course the execution of predictive models like the one shown here. These functionalities are the ones behind a seamless delivery execution, which adds value for the client, the retailer and the courier.

Predictive tools like this are changing last mile delivery by removing tedious minor tasks from human actors and permitting data-driven automation. Ultimately, this would allow the actors in this process (controllers, drivers, clients, managers, etc.) to focus on the human tasks that really matter.

* Office for National Statistics, Retail sales in Great Britain, Table 4