The Rise and Fall of Single’s Unified Scheduling Service




A Scheduling Service is Born

At Single Music, we had a basic need I’m sure many development shops out there have: We needed some process to happen at some particular time in the future. In other words, we needed a way to schedule events for future delivery so that our services could consume them at their scheduled time and respond accordingly. There were a few use cases where we needed that kind of functionality, perhaps the most important of which was scheduling fulfillment emails for purchases of pre-sale releases — the case where a customer pre-orders an artist’s release and an email is delivered to them on the release date with a link to their audio files.

Because the need for scheduled delivery of events cropped up in a couple other places, we had the idea to build a generic event scheduling service that exposed a pretty simple API to the other services in our system. We built a small, distributed Spring Boot web service wrapper around a Quartz scheduler. The API supported scheduling the delivery of an event to our pub-sub messaging system. The request object looked something like:

{
	"time": "2018-01-01T00:00:00.000Z",
	"event": {
		"foo": "bar"
	}
}

where the event property was the arbitrary object to be sent at the scheduled time given by the time property. Additional information about the topic and context of the event was sent in custom HTTP headers.
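In Java terms, the payload above maps onto a small value type. This is a hypothetical sketch — the type and field names are assumed, not our actual model — but it shows how the ISO-8601 time property lines up with java.time.Instant:

```java
import java.time.Instant;
import java.util.Map;

// Hypothetical model of the scheduling request shown above: an arbitrary
// event payload plus the instant at which it should be published.
record ScheduleRequest(Instant time, Map<String, Object> event) {}

public class ScheduleRequestDemo {
    public static void main(String[] args) {
        // The "time" property is an ISO-8601 UTC timestamp, which
        // Instant.parse accepts directly.
        ScheduleRequest request = new ScheduleRequest(
                Instant.parse("2018-01-01T00:00:00.000Z"),
                Map.of("foo", "bar"));
        System.out.println(request.time().getEpochSecond()); // 1514764800
        System.out.println(request.event().get("foo"));      // bar
    }
}
```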

What Worked

The main benefit of the scheduling service was that we now had a central place for scheduling events that other services could use just by sending requests via HTTP. This kept any kind of scheduling logic out of the other services, which reduced their size and complexity.

In addition, Quartz simply works really well. It’s been around for a while and seemed to be a stable library. Events were delivered on time and never more than once, even with multiple instances of the scheduling service running. It all worked really well for a few months, but…

What Didn’t

The main downside of our generic scheduling service surfaced when we started needing to reschedule events, which at Single happens a lot: artists decide they want to release their album on a different day, tracks for staggered releases get created with the wrong date, a customer gets a refund of their order before the release date, and so on.

We attempted to address the need to reschedule or cancel delivery of events by expanding the API of the scheduling service. Quartz has the concepts of Triggers and Jobs: a job is some unit of work, and a trigger defines the time or times at which the job should execute. Quartz also has the notion of groups for triggers and jobs, so the API was expanded to allow rescheduling of job groups or trigger groups.

Even with additional endpoints to reschedule or cancel groups, the scheduling service was still not flexible enough to meet our needs. Specifically, we needed to be able to reschedule or cancel delivery of emails for purchases based on a number of different criteria — the customer, the album, the account, and so on.

Jobs and triggers in Quartz are uniquely identified by only two properties, the name and the group (see the Quartz docs), so you can only efficiently query for scheduled events by those two properties. In our implementation, the event to be delivered was stored in the Quartz JobDataMap, which is persisted as a BLOB column in the database, making it difficult or impractical to query for scheduled events by anything inside the event body. Of course, we could have packed extra information into those two properties using some kind of string-concatenation scheme, or by serializing objects to a textual format like JSON, but that seemed like an invitation for subtle bugs down the road.
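The constraint can be illustrated with a plain-JDK sketch (the types here are hypothetical stand-ins, not Quartz's actual storage code): a lookup by (name, group) is a single map access, while finding events by anything inside the payload forces a scan over every stored blob:

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Quartz's two-part identity.
record JobKey(String name, String group) {}

public class JobLookupDemo {
    public static void main(String[] args) {
        // Payloads stored as opaque bytes, like the JobDataMap blob column.
        Map<JobKey, byte[]> store = new HashMap<>();
        store.put(new JobKey("order-42", "fulfillment-emails"),
                "{\"customer\":\"alice\",\"album\":\"LP1\"}".getBytes(StandardCharsets.UTF_8));
        store.put(new JobKey("order-43", "fulfillment-emails"),
                "{\"customer\":\"bob\",\"album\":\"LP1\"}".getBytes(StandardCharsets.UTF_8));

        // Efficient: lookup by the two identity properties.
        byte[] hit = store.get(new JobKey("order-42", "fulfillment-emails"));

        // Inefficient: "find every scheduled event for album LP1" means
        // decoding every blob and inspecting its contents.
        long matches = store.values().stream()
                .map(b -> new String(b, StandardCharsets.UTF_8))
                .filter(json -> json.contains("\"album\":\"LP1\""))
                .count();

        System.out.println(hit != null); // true
        System.out.println(matches);     // 2
    }
}
```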

Another major limitation of the generic scheduling service, one we didn't realize until well into its existence, was that storing the scheduled events as immutable blobs in the Quartz database meant we lost the ability to evolve the structure of those events. Only backwards-compatible changes could be made to the event models. If some new, required piece of data had to be added to events, we had no way to add it to existing scheduled events (short of removing those events and replacing them with new ones that contain the additional data).

What We Did

Because of all the issues we ran into while utilizing a unified scheduling service, we decided to abandon it and give the responsibility of scheduling events to the services responsible for processing them. We accomplished this by creating new database tables that represent scheduled event deliveries and are related, via foreign keys, to the tables containing the event data. For example, we created a scheduled_purchase table which essentially contains a date and a foreign key to the corresponding row in the purchase table.

What's powerful about this approach is that we can now query for scheduled items by any property of the associated event, because we can join directly to the table housing the event data. This gives us a ton of flexibility and makes it possible to reschedule or cancel groups of events grouped by any of their properties.
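Assuming hypothetical table and column names (the article only names scheduled_purchase and purchase), the schema and a by-album reschedule might look something like:

```sql
-- Hypothetical sketch: a scheduled delivery row pointing at its purchase.
CREATE TABLE scheduled_purchase (
    id          BIGINT PRIMARY KEY,
    purchase_id BIGINT NOT NULL REFERENCES purchase (id),
    deliver_at  TIMESTAMP NOT NULL,
    delivered   BOOLEAN NOT NULL DEFAULT FALSE
);

-- Because the event data lives in a normal table, pending deliveries can be
-- rescheduled (or cancelled) by any property of the purchase, e.g. moving
-- every undelivered email for one album to a new release date:
UPDATE scheduled_purchase
SET deliver_at = '2018-06-01 00:00:00'
WHERE delivered = FALSE
  AND purchase_id IN (SELECT id FROM purchase WHERE album_id = 123);
```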

To process the actual delivery of these scheduled events, we created a cron-like process that queries the scheduled event tables and processes any whose scheduled time is in the past. We accomplished this by making use of Spring's convenient scheduling support: marking a method with the @Scheduled annotation has the framework execute it on a fixed interval or cron-based schedule.

The actual scheduled process is pretty simple and includes the following steps:

  1. Acquire a lock from Redis that is unique to the operation but common across instances of the service.
  2. Query the scheduled event table for events whose scheduled date is in the past.
  3. Publish the events to the messaging system.
  4. Mark the scheduled events as delivered in the database.
  5. Release the lock.
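The steps above can be sketched in plain Java. The interface and method names here are hypothetical, and the collaborators are stand-ins for Redis, the database, and the message broker rather than our actual code:

```java
import java.time.Instant;
import java.util.List;

// Hypothetical collaborators: a Redis-backed lock, the scheduled event
// table, and the pub-sub messaging system.
interface DeliveryLock {
    boolean tryAcquire(String key);  // step 1
    void release(String key);        // step 5
}

interface ScheduledEventRepo {
    List<String> findDueUndelivered(Instant now);  // step 2
    void markDelivered(List<String> ids);          // step 4
}

interface Publisher {
    void publish(String eventId);    // step 3
}

public class ScheduledDeliveryTask {
    private static final String LOCK_KEY = "scheduled-purchase-delivery";

    private final DeliveryLock lock;
    private final ScheduledEventRepo repo;
    private final Publisher publisher;

    public ScheduledDeliveryTask(DeliveryLock lock, ScheduledEventRepo repo,
                                 Publisher publisher) {
        this.lock = lock;
        this.repo = repo;
        this.publisher = publisher;
    }

    // In the real service, Spring's @Scheduled runs this on a cron interval.
    public void deliverDueEvents() {
        if (!lock.tryAcquire(LOCK_KEY)) {
            return; // another instance won the lock; skip this interval
        }
        try {
            List<String> due = repo.findDueUndelivered(Instant.now());
            due.forEach(publisher::publish);
            repo.markDelivered(due);
        } finally {
            lock.release(LOCK_KEY);
        }
    }
}
```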

The reason for the locking around the process is to prevent multiple instances of the service from performing the work and delivering duplicates of the scheduled events. At the time of the cron interval, one instance of the service will acquire the lock and execute the delivery process, while other instances will attempt to acquire the lock and time out, exiting without performing the work.

The events are published to the messaging system (rather than being processed directly by the instance querying the scheduled event table) so that the work of processing the events can be distributed across the multiple instances of the consuming service. This allows us to process more events more quickly, particularly in cases where we have a ton of them all scheduled for the same time (like an album release date for example).

Conclusion

To overcome the limitations of running a generic, unified event scheduling service, we chose to distribute that responsibility to the producing/consuming services themselves. We've essentially taken a single Quartz-based scheduling system and broken it into many specific scheduling processes inside each service that needs scheduled events. This gives us greater flexibility in modifying the delivery time of scheduled events, and the event structures themselves. Having separate processes for each type of scheduled event allows us to tune timeouts, polling intervals, locking strategies, query patterns, etc. on a per-event-type basis. It's working pretty well so far!
