The E.T. in ETL

Thanks to JD Hancock for the base image! CC BY 2.0

Anyone who’s ever done ETL knows it can get seriously funky. When I first started working on ETL, I was parsing data for a real estate company. Every once in a while, roofing data would appear in the pool field. “Shingles” isn’t a compelling feature for swimming pools. Go figure.

Thankfully, Node.js gives us a lingua franca for sharing cool solutions. A search for data validation shows there are more than a few options. For ETL, let’s take a look at just one of those options.
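Whatever library you end up choosing, the kind of check that would have caught “Shingles” in the pool field is small. The Python sketch below is only an illustration and is not the option discussed in the post: the `pool` field name, the set of allowed values, and the `validate_listing` helper are all hypothetical.

```python
# Hypothetical set of values we would accept in the "pool" field.
ALLOWED_POOL_VALUES = {"none", "in-ground", "above-ground", "community"}


def validate_listing(record):
    """Return a list of validation errors for one listing record.

    An empty list means the record passed. A missing "pool" key is
    treated as an unexpected value as well.
    """
    errors = []
    raw_pool = record.get("pool")
    # Normalize case and whitespace before comparing.
    pool = str(raw_pool if raw_pool is not None else "").strip().lower()
    if pool not in ALLOWED_POOL_VALUES:
        errors.append(f"pool: unexpected value {raw_pool!r}")
    return errors


# Roofing data in the pool field is flagged; a real pool type is not.
bad = validate_listing({"pool": "Shingles"})
good = validate_listing({"pool": "In-Ground"})
```

Returning a list of errors rather than raising on the first problem lets an ETL job log every bad field in a record and decide afterwards whether to skip or quarantine it.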

E is for Event: A Fresh Take on ETL

As a follow-up to my previous post, The Workloads of the Internet of Things, I wanted to walk through a real-world example that fully captures the principles of event-driven computing put forth there.

Let’s set the stage first… imagine we operate a windmill farm and want to continually track optimal weather conditions to maximize energy output. What basic steps need to be taken?

  1. Sensors capture surrounding weather conditions
  2. Captured data is delivered to a backend service
  3. The service calculates the expected power generation
  4. Calculated data is delivered to an analytics system
  5. Data is presented in a variety of charts and maps

This process flow sounds similar to the common Extract, Transform, Load (ETL) pattern. The distinction to make here is that data is pushed from the source to the backend service instead of pulled, which means we need to update our pipeline to be more reactive.
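To make the push model concrete, here is a minimal in-process sketch in Python. It is not the architecture from the full post: the `EventBus`, `WeatherEvent`, and `PowerEstimate` names are hypothetical, the turbine constants are made-up illustrative values, and the power calculation uses the standard wind power formula P = ½ · ρ · A · Cp · v³.

```python
from dataclasses import dataclass
from typing import Callable, List

# Illustrative constants (assumptions, not values from the post).
AIR_DENSITY = 1.225       # kg/m^3, standard air at sea level
ROTOR_AREA = 5000.0       # m^2, hypothetical rotor swept area
POWER_COEFFICIENT = 0.4   # hypothetical turbine efficiency


@dataclass
class WeatherEvent:
    sensor_id: str
    wind_speed: float  # m/s


@dataclass
class PowerEstimate:
    sensor_id: str
    watts: float


def expected_power(event: WeatherEvent) -> PowerEstimate:
    """Transform step: estimate power from wind speed (P = 1/2 * rho * A * Cp * v^3)."""
    watts = 0.5 * AIR_DENSITY * ROTOR_AREA * POWER_COEFFICIENT * event.wind_speed ** 3
    return PowerEstimate(event.sensor_id, watts)


class EventBus:
    """Tiny stand-in for a message broker: publishers push, subscribers react."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[WeatherEvent], None]] = []

    def subscribe(self, handler: Callable[[WeatherEvent], None]) -> None:
        self._handlers.append(handler)

    def publish(self, event: WeatherEvent) -> None:
        # Each subscriber runs as soon as the event arrives; nothing polls.
        for handler in self._handlers:
            handler(event)


# "Analytics system" stand-in: a list that receives calculated estimates.
analytics: List[PowerEstimate] = []

bus = EventBus()
bus.subscribe(lambda event: analytics.append(expected_power(event)))

# A sensor pushes a reading; the calculation and load happen in response.
bus.publish(WeatherEvent(sensor_id="turbine-1", wind_speed=10.0))
```

The point of the sketch is the direction of control: the service never asks the sensors for data. It registers a handler once, and the transform and load steps run whenever a reading is published.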

How HotelTonight Streamlined their ETL Process Using IronWorker

HotelTonight has reinvented the task of finding and booking discounted hotel rooms at travel destinations. Designed for last-minute travel planners and optimized for the mobile era, HotelTonight connects adventure-seeking, impulse travelers with just-in-time available hotel rooms wherever they land.

This model has the market-enhancing effect of reducing excess inventory of unused hotel rooms, while delivering a seamless user experience and deep discounts for budget travelers who enjoy impulse travel and adventure. What most travelers may not realize is that behind the scenes at HotelTonight lies a massive business intelligence system built on a sophisticated cloud-based ETL platform that collects, converts, and stores data from multiple external services.

How HotelTonight uses Iron.io and AWS Redshift to create Ruby-based ETL pipeline (repost)


Creating an ETL pipeline with Iron.io and Redshift

Operating at scale in the cloud almost always means having a highly distributed system architecture in place, one that handles workloads by auto-scaling components out horizontally.

Harlow Ward is a developer at HotelTonight, and he put together a great post on how they handle issues of scale. In it, he talks about their use of Iron.io and Amazon’s Redshift offering to create a simple, highly scalable ETL pipeline.