HotelTonight has reinvented the task of finding and booking discounted hotel rooms at travel destinations. Designed for last-minute travel planners and optimized for the mobile era, HotelTonight connects adventure-seeking, impulse travelers with just-in-time available hotel rooms wherever they land.
This model has the market-enhancing effect of reducing excess inventory of unused hotel rooms, while delivering a seamless user experience and deep discounts for budget travelers who enjoy impulse travel and adventure. What most travelers may not realize is that behind the scenes at HotelTonight lies a massive business intelligence system built on a sophisticated cloud-based ETL platform that collects, converts, and stores data from multiple external services.
Extract, Transform, Load (ETL) has been around in IT circles for a long time, dating back even to tape storage and the mainframe era. The difference here is the use of cloud-based services along with a loosely coupled, flexible approach to moving data between systems in near real time. The benefits include far less overhead and much faster workload processing, which translate into more timely and accessible information with which to make decisions.
Cloud-based ETL – Scalable and Event Driven
The HotelTonight ETL pipeline gathers external data from a host of sources and brings it together in Amazon Redshift, a managed, petabyte-scale data warehouse solution provided by Amazon Web Services. Amazon Redshift acts as the “Unified Datastore” and uses the SQL query language, with a Postgres adapter connecting a variety of platforms to it. Custom Ruby scripts power the HotelTonight ETL process, and the Business Intelligence team works through SQL Workbench, which front-ends the Amazon Redshift clusters. The dashboard lets anyone in the organization query the data and extract information for use in their initiatives.
The net result of this complex operation is a fully aggregated dataset that is more accurate, more up-to-date, and more reliable. Turning raw data into reliable, up-to-date information enables HotelTonight analysts to make faster decisions and faster updates on available hotel room information for their users.
The key cog powering this cloud-based ETL process, and the piece that makes it scalable and completely event-driven, is IronWorker, an asynchronous task-processing service provided by Iron.io. HotelTonight uses IronWorker as the “go-to platform for scheduling and running our Ruby-based ETL worker pipeline,” says Harlow Ward, former lead developer at HotelTonight.
“The team at Iron.io has been a great partner for us while building the ETL pipeline,” says Ward. “Their worker platform gives us a quick and easy mechanism for deploying and managing all our Ruby workers.”
Harlow further describes how IronWorker ensures HotelTonight’s ETL process is repeatable, scalable and protected in the case of failures. “Keeping [worker] components modular allows us to separate the concerns of each worker and create a repeatable process for each of our ETL integrations,” says Ward.
A Distributed ETL Workflow
HotelTonight uses a custom worker for each external data source (see Figure 1 for details of HotelTonight data sources). As a result, each data source is aggregated independently of the others.
“IronWorker’s modularity allows for persistent points along the lifetime of the pipeline. It also allows [HotelTonight] to isolate failures and more easily recover should data integrity issues arise,” according to Ward. “Each worker in the pipeline is responsible for its own unit of work and has the ability to kick off the next task in the pipeline.”
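The pattern Ward describes, where each worker owns one unit of work and hands off to the next, can be sketched in plain Ruby. This is a minimal illustration only, not HotelTonight's code and not the IronWorker client API: the `Stage` class, the `run_pipeline` helper, the stage names, and the sample row are all hypothetical, and the stages run in-process here, whereas in the real pipeline each would be a separately deployed worker.

```ruby
# Hypothetical sketch in plain Ruby; no IronWorker client calls are used.
# Each stage does one unit of work, then names the stage to run next.
class Stage
  attr_reader :name

  def initialize(name, next_stage: nil, &work)
    @name = name
    @next_stage = next_stage
    @work = work
  end

  # Run this stage's unit of work and return [output, next_stage_name].
  def call(payload)
    [@work.call(payload), @next_stage]
  end
end

# Walk the chain starting at `start`, recording which stages completed so a
# failed run can be resumed from the stage that broke, not from the beginning.
def run_pipeline(stages, start, payload)
  completed = []
  current = start
  while current
    begin
      payload, following = stages.fetch(current).call(payload)
    rescue StandardError => e
      return { status: :failed, failed_at: current, completed: completed, error: e.message }
    end
    completed << current
    current = following
  end
  { status: :ok, completed: completed, result: payload }
end

# Illustrative extract -> transform -> load chain over one invented row.
STAGES = {
  extract: Stage.new(:extract, next_stage: :transform) { |_|
    [{ "city" => " sf ", "bookings" => "3" }]
  },
  transform: Stage.new(:transform, next_stage: :load) { |rows|
    rows.map { |r| { city: r["city"].strip.upcase, bookings: Integer(r["bookings"]) } }
  },
  load: Stage.new(:load) { |rows| rows.length }
}.freeze
```

Calling `run_pipeline(STAGES, :extract, nil)` runs the three stages in order. If one stage raises, the returned hash names the failing stage and lists the stages that already finished, which is the failure isolation the quote refers to.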
For a detailed discussion of Harlow’s ETL process at work, check out Harlow’s blog at: http://engineering.hoteltonight.com/ruby-etl-with-ironworker-and-redshift
This distributed pattern also improves agility: changes can be made quickly within one worker or data source pull without redeploying a full application or pushing changes beyond that particular workflow. New data sources can also be brought online just by writing simple scripts in whatever language the developers want to use (Ruby, in the case of HotelTonight).
Workflow Monitoring and Orchestration
In addition to solving the challenge of quick and easy deployment of independent workers, the Iron.io dashboard (HUD) provides current status and reporting information to HotelTonight developers, giving them instant visibility into the state of their ETL pipeline. Users can control settings for the workflow, including increasing or decreasing concurrency, retrying tasks that failed in prior attempts, and changing job schedules.
“The administration area boasts excellent dashboards for reporting worker status and gives us great visibility over the current state of our pipeline,” says Ward.
Leveraging Unified Data for Faster Decision Making
Now that HotelTonight’s business intelligence data is consolidated in Amazon Redshift, HotelTonight can run SQL queries to combine and correlate data from multiple platforms into a unified dataset. Prior to this solution, HotelTonight’s “data analytics” consisted of exporting CSVs from each data source, merging them into a single pivot table, and then applying lots of “magic” to make sense of it all.
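To make “combine and correlate” concrete, here is a small Ruby sketch of what a join across two sources does once both live in one datastore. Everything in it is an assumption for illustration: the two sources, the field names, the dates, and the figures are invented, and in practice this step would be a SQL JOIN run against Amazon Redshift, not Ruby code over in-memory rows.

```ruby
# Hypothetical rows standing in for two source tables already loaded into
# the warehouse. Source names, fields, and values are invented.
AD_SPEND = [
  { date: "2014-05-01", campaign: "spring", spend_usd: 120.0 },
  { date: "2014-05-02", campaign: "spring", spend_usd: 80.0 }
].freeze

BOOKINGS = [
  { date: "2014-05-01", campaign: "spring", bookings: 6 },
  { date: "2014-05-02", campaign: "spring", bookings: 5 }
].freeze

# Inner join on the given keys: the in-memory analogue of a SQL JOIN.
def join_on(left, right, keys)
  # Index the right-hand rows by their key values for constant-time lookup.
  index = right.group_by { |row| row.values_at(*keys) }
  left.flat_map do |l|
    (index[l.values_at(*keys)] || []).map { |r| l.merge(r) }
  end
end

# One row per (date, campaign) with fields from both sources.
UNIFIED = join_on(AD_SPEND, BOOKINGS, %i[date campaign])

# A derived metric that neither source could produce on its own.
COST_PER_BOOKING = UNIFIED.map do |row|
  row.merge(cost_per_booking: (row[:spend_usd] / row[:bookings]).round(2))
end
```

The point of the sketch is the shape of the result: once rows from different platforms share keys in one place, correlating them is a single join, with no pivot-table merging by hand.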
IronWorker makes it possible for HotelTonight to streamline and automate their entire ETL process and bring together all of their disparate data sources in a flexible datastore. HotelTonight can rest easy with the assurance that, in using IronWorker, their data pipeline into Amazon Redshift is in excellent order.
At Iron.io, we’re big users of HotelTonight and can’t wait to book our next business road show using their service. We wouldn’t think of doing it any other way.
How to Get Started Today
To give Iron.io a try, sign up for a free IronWorker or IronMQ account today at Iron.io.
As a reward for signing up, we’ll even extend to you a 30-day trial of advanced features so that you can see how moving to the cloud will change the way you think about application development.