Parsing travel e-mails

Incentive

With funnel.travel, parsing travel confirmation e-mails is one of those things the customer expects as a capability, yet it’s not a core feature. That makes it the perfect candidate for an external service.

In order to connect funnel.travel with an external e-mail parsing service, the service needs to

  • provide a decent API
  • yield results of a quality we’re comfortable passing on to our customers
  • have a reasonable cost structure

HTML structure

It is important to note that e-mail clients may rewrite the internal structure of e-mail content when a message is forwarded. Here’s a row of the original Expedia e-mail, describing departure date, number of flight stops and flight duration. Note the IDs indicating the content type.

<tr id="automation-label-flight-leg-0" bgcolor="#D6EEF2">
    <td id="departure-date-0" align="left"  valign="top" width="50%" style="padding:10px 5px 10px 10px;font-family:arial;font-size:12px; ">
        <strong>May 25, 2018</strong> - Departure
        <span id="stop-automation-label-0" class="flight-detail-service-header">
                1 stop
        </span>
    </td>
	<td align="left" valign="top" width="50%" style="text-align:right;padding:10px 10px 10px 5px;font-family:arial;font-size:12px;color: #999;">
		Total travel time: 5 h 5 m
	</td>
</tr>

And here’s the same row forwarded by GMail:

<tr id="m_-7735433647783921150automation-label-flight-leg-0" bgcolor="#D6EEF2">
    <td id="m_-7735433647783921150departure-date-0" align="left" valign="top" width="50%" style="padding:10px 5px 10px 10px;font-family:arial;font-size:12px">
        <strong>May 25, 2018</strong> - Departure
        <span id="m_-7735433647783921150stop-automation-label-0" class="m_-7735433647783921150flight-detail-service-header">
                1 stop
        </span>
    </td>
    <td align="left" valign="top" width="50%" style="text-align:right;padding:10px 10px 10px 5px;font-family:arial;font-size:12px;color:#999">
Total travel time: 5 h 5 m
    </td>
</tr>

And how about after forwarding that with Microsoft Outlook?

<tr id="m_6879808603257357192m_-7735433647783921150automation-label-flight-leg-0">
	<td width="50%" valign=top style='width:50.0%;border:none;background:#D6EEF2;padding:7.5pt 3.75pt 7.5pt 7.5pt' id="m_6879808603257357192m_-7735433647783921150departure-date-0">
		<p class=MsoNormal>
			<strong>
				<span style='font-size:9.0pt;font-family:"Arial","sans-serif"'>May 25, 2018 </span>
			</strong>
			<span style='font-size:9.0pt;font-family:"Arial","sans-serif"'>- Departure <span class=m6879808603257357192m-7735433647783921150flight-detail-service-header>1 stop </span>
				<o:p/>
			</span>
		</p>
	</td>
	<td width="50%" valign=top style='width:50.0%;border:none;background:#D6EEF2;padding:7.5pt 7.5pt 7.5pt 3.75pt'>
		<p class=MsoNormal align=right style='text-align:right'>
			<span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:#999999'>Total travel time: 5 h 5 m <o:p/>
			</span>
		</p>
	</td>
</tr>
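
In these samples, each forwarding step prepends a marker such as m_-7735433647783921150 to the original IDs and class names. One way to keep ID-based matching working is to strip those markers first. The following is a minimal sketch in Python; the prefix pattern is an assumption inferred only from the three samples above, and the names strip_client_prefix and normalized_ids are made up for illustration.

```python
import re
from html.parser import HTMLParser

# Assumed prefix shape, inferred only from the samples above:
# "m_" or "m", an optional minus sign, then digits, possibly repeated.
_CLIENT_PREFIX = re.compile(r"^(?:m_?-?\d+)+")


def strip_client_prefix(value: str) -> str:
    """Remove forwarding prefixes such as 'm_-7735433647783921150' from an id or class."""
    return _CLIENT_PREFIX.sub("", value)


class IdCollector(HTMLParser):
    """Collects the normalized id attribute of every element that has one."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(strip_client_prefix(value))


def normalized_ids(html: str) -> list:
    """Return all element ids found in the HTML, with client prefixes removed."""
    collector = IdCollector()
    collector.feed(html)
    return collector.ids
```

Note that an original ID which itself starts with "m" followed by digits would be truncated by this pattern, so it is only a starting point.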

Candidates

A bit surprisingly, finding companies which provide e-mail parsing as a service is not entirely straightforward. The following list represents our findings, but should by no means be taken as comprehensive.

WorldMate

WorldMate was founded in 2000 in California, and eventually acquired by CWT. The mobile app ‘WorldMate’ has been taken off the shelf. The API, however, remains publicly available. Our understanding is that it currently powers myCWT (formerly “CWT To Go”).

https://developers.worldmate.com

Try it out by sending a sample e-mail to apidemo@worldmate.com.

Traxo

Traxo was founded in 2008 and is based in Dallas (TX). The company offers various products to manage itineraries and get flight alerts.

https://developer.traxo.com/

To try it out, you’ll need to sign up for a (free) API account.

AwardWallet

AwardWallet was founded in 2004, with a focus on managing loyalty points.

https://awardwallet.com/api/main#email-parsing-api

Try it out using https://service.awardwallet.com/email/test/parse.

Google

No comparison could just leave out Google. While GMail does a great job at displaying flight information straight out of a confirmation e-mail, that magic is based on e-mail markup defined by Schema.org (see Google’s guideline on e-mail markup).

Google does not offer a public API to parse confirmation e-mails.
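
For senders that do embed such markup, the reservation can be read directly from the message body. The following is a minimal sketch, assuming the markup is embedded as JSON-LD in a script tag of type application/ld+json and uses the Schema.org FlightReservation type. The class and function names are made up for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed content of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            parsed = json.loads("".join(self._buffer))
            # A block may hold a single object or a list of objects.
            self.items.extend(parsed if isinstance(parsed, list) else [parsed])


def flight_reservations(html: str) -> list:
    """Return all JSON-LD objects in the HTML whose @type is FlightReservation."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    return [
        item for item in extractor.items
        if isinstance(item, dict) and item.get("@type") == "FlightReservation"
    ]
```

This only helps for e-mails that carry the markup in the first place, and it does not cover markup embedded as Microdata.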

Other

Results

Fair warning, we didn’t run hundreds of tests against all contenders. To get an impression, we simply ran three confirmation e-mails:

  • An Expedia confirmation for a BA flight ZRH – LHR – GLA and back, as well as a rental car (Avis). A package price is present in the e-mail. The return date is 8 months in the past.
  • An e-bookers confirmation for a KLM flight ZRH – AMS – INV and back. The return date is 4 days in the future.
  • A KLM confirmation for a flight ZRH – AMS – GLA and back, two participants, with seat reservations, and paid by credit card. The return date is 4 months in the past.

The results below are listed per service, in the order WorldMate, Traxo, AwardWallet.

Parsing Expedia e-mail

WorldMate, itinerary data

Parsed
  • Source (Expedia) and Expedia itinerary number
  • Airline code, flight number
  • Airport codes, names and long/lat
  • Departure time in UTC and local time
  • Flight duration
  • Flight class (as text, no codes)
  • Car rental company
  • Car rental confirmation number
  • Car description
  • Address of pickup and dropoff
Not (or wrongly) parsed
  • Car driver was parsed, but is wrong
  • Participants are not present in response

Traxo, itinerary data

Parsed
  • Source (Expedia)
  • Airline code, flight number
  • Airport codes, names and long/lat
  • Departure time in local time, plus timezone
  • Fare base code and flight class description
  • Both participants
  • Car rental company incl. GDS code
  • Car rental confirmation number
  • Car driver
  • Car description
  • Address of pickup and dropoff
Not (or wrongly) parsed
  • Expedia itinerary number is not present
  • Flight duration is not present

AwardWallet, itinerary data

Parsed
  • Source (Expedia) and Expedia itinerary number
  • Airline code, flight number
  • Airport codes, names and long/lat
  • Departure time in local time, plus timezone
  • Flight duration (but in plain text, i.e. ‘1 h 55 m’)
  • Fare base code and flight class description
  • Both participants
  • Car rental company
  • Car rental confirmation number
  • Car driver
  • Car description
  • Address of pickup and dropoff
Not (or wrongly) parsed
  • (none)

Price data
  • WorldMate: Currency code ‘USD’ is present in the response, but the amount is missing
  • Traxo: Not present
  • AwardWallet: Present and correct

Parsing e-bookers e-mail

WorldMate, itinerary data

Parsed
  • Source (Expedia) and Expedia itinerary number (no reference to e-bookers)
  • Airline code, flight number
  • Airport codes, names and long/lat
  • Departure time in UTC and local time
  • Flight duration
  • Flight class (as text, no codes)
  • One participant is present (the person who made the booking)
Not (or wrongly) parsed
  • Second participant is missing

Traxo, itinerary data

Parsed
  • Source (Ebookers)
  • Airline code, flight number
  • Airport codes, names and long/lat
  • Departure time in local time, plus timezone
  • Fare base code and flight class description
  • Both participants
Not (or wrongly) parsed
  • Expedia itinerary number is not present
  • Flight duration is not present

AwardWallet, itinerary data

Parsed
  • Source (ebookers) and Expedia itinerary number
  • Airline code, flight number
  • Airport codes, names and long/lat
  • Departure time in local time, plus timezone
  • Flight duration (but in plain text, i.e. ‘1 h 55 m’)
  • Fare base code and flight class description
  • Both participants
Not (or wrongly) parsed
  • (none)

Price data
  • WorldMate: Total cost is present and correct (for both participants)
  • Traxo: Total cost is present and correct (for both participants). Also, taxes are listed separately
  • AwardWallet: Total cost is present and correct (for both participants). Also, taxes are listed separately

Parsing KLM e-mail

WorldMate, itinerary data

Parsed
  • Source (KLM) and PNR
  • Airline code, flight number
  • Departure time in local time, plus timezone
  • Fare base code and flight class description
  • Both participants
  • Loyalty program numbers
Not (or wrongly) parsed
  • Airport codes, names and long/lat are missing!
  • Ticket numbers
  • Reserved seat numbers
  • Flight duration

Traxo, itinerary data

Parsed
  • Source (KLM) and PNR
  • Airline code, flight number
  • Airport codes, names and long/lat are missing!
  • Departure time in local time, plus timezone
  • Flight class description
  • Both participants
Not (or wrongly) parsed
  • Fare base code
  • Reserved seat numbers
  • Flight duration
  • Loyalty program numbers

AwardWallet, itinerary data

Parsed
  • Source (KLM) and PNR
  • Airline code, flight number
  • Airport codes, names and long/lat are missing!
  • Departure time in local time, plus timezone
  • Flight class description
  • Both participants
Not (or wrongly) parsed
  • Fare base code
  • Reserved seat numbers
  • Flight duration
  • Loyalty program numbers

Price data
  • WorldMate: ‘Total cost’ is parsed, which is only flight cost. Additional amount for seat reservations is omitted.
  • Traxo: ‘Total cost’ is parsed including the additional amount for seat reservations.
  • AwardWallet: ‘Total cost’ is parsed including the additional amount for seat reservations.

Other

API / Tooling
  • WorldMate: Only accepts e-mails via SMTP, there is no API to send a MIME message. The parsing result is delivered via callback webhook.
  • Traxo: The parsing result is delivered via callback webhook.
  • AwardWallet: The parsing result is delivered via callback webhook. AwardWallet also offers mailbox scanning.

Format
  • WorldMate: XML
  • Traxo: JSON
  • AwardWallet: JSON

Support
  • WorldMate: Only offered to premium-tier customers. However, an inquiry was answered within 24h.
  • Traxo: A support request was resolved within the business day.
  • AwardWallet: No support was needed, maybe that’s an indicator in itself.

Price
  • WorldMate: 1,000 calls/emails per week are free. The price for higher volume is not disclosed.
  • Traxo: 25 parses per rolling 24-hour period are free. The commercial terms are not disclosed.
  • AwardWallet: From their website: “For pricing information or if you wish to set up an account to test the API, please contact us”
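
Since all three services deliver results via a callback webhook, the receiving side needs an HTTP endpoint that accepts a POST and decodes XML or JSON. The following is a minimal sketch using only the Python standard library. It assumes the service sets a matching Content-Type header, which was not verified for any of the three; the names decode_callback, CallbackHandler and make_server are made up for illustration, and the payload structure of each service is not modelled.

```python
import json
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer


def decode_callback(body: bytes, content_type: str):
    """Decode a callback body as JSON (dict/list) or XML (Element), based on Content-Type."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type.endswith("json"):
        return json.loads(body.decode("utf-8"))
    if media_type.endswith("xml"):
        return ET.fromstring(body)
    raise ValueError(f"unsupported content type: {content_type!r}")


class CallbackHandler(BaseHTTPRequestHandler):
    """Accepts POSTed parsing results and stores them on the server object."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            result = decode_callback(body, self.headers.get("Content-Type", ""))
        except (ValueError, ET.ParseError):
            # Malformed or unsupported payload.
            self.send_response(400)
            self.end_headers()
            return
        self.server.results.append(result)  # hand over to real processing here
        self.send_response(204)
        self.end_headers()


def make_server(port: int = 8080) -> HTTPServer:
    """Create (but do not start) a local server that collects callback results."""
    server = HTTPServer(("127.0.0.1", port), CallbackHandler)
    server.results = []
    return server
```

Calling make_server().serve_forever() starts the listener. A production endpoint would also need TLS and some way to authenticate the caller.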

Conclusion

WorldMate: The WorldMate result for the KLM confirmation is noteworthy in that it’s entirely missing departure and arrival airports. The exact same PNR, sent as a flight plan e-mail (basically the Amadeus itinerary printout), parses fine (including seat and ticket numbers).

Traxo: When sending the PNR Amadeus itinerary to Traxo, the airport codes for ZRH and GLA were missing in the response.

AwardWallet: Shows the best and most consistent results across the three test e-mails.


In the process of developing funnel.travel, a corporate post-booking travel management tool, I’m sharing some hopefully useful insights into Angular 6, Spring Boot, jOOQ, or any other technology we’ll be using.
