Microsoft Outlook Crawler

This package contains a crawler for Microsoft Outlook. It extracts information from Outlook and represents it as RDF, using our data schema plus ICAL and VCARD. To use the crawler, you have to:

RDF Representation of Outlook

The formats we use for Outlook can be found in the doc/ontology folder. We used:

If extensions are needed, the Aperture Vocabulary is the right place to add them. ICAL and VCARD are fixed vocabularies and should not be extended.

Known Bugs

The Outlook nag screen will pop up, and we cannot remove it. Even Outlook Redemption, which removes this nag screen in many cases, does not help when reading the e-mail addresses of e-mail recipients. It just doesn't work.

Not all Outlook data is available. Recurring events are extracted only to a limited extent, and many other data items are missing. This is partly because we don't have the time to code all of that. You are welcome to improve it!

Threading and memory leaks. When crawling Outlook data, the memory consumption of both Java and Outlook rises significantly, and either may hang. Although we close all our handles, something is wrong. Also, do not use the crawler in more than one thread. Seriously, multithreading over a Java-COM bridge is no fun.
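
One way to honour the single-thread rule in a larger application is to funnel every call that touches Outlook through one dedicated worker thread. The following is only a minimal sketch of that idea using java.util.concurrent. The class SingleThreadGate, its methods and the thread name "outlook-crawler" are hypothetical and not part of this package; the actual crawler call would go inside the task you pass in.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical helper (not part of this package): runs every submitted
 * task on one dedicated thread, so all COM access is serialized.
 */
final class SingleThreadGate {

    // One worker thread only; tasks queue up and run one after another.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "outlook-crawler"); // assumed thread name
        t.setDaemon(true); // do not keep the JVM alive for this thread
        return t;
    });

    /**
     * Runs the task on the dedicated thread and blocks until it is done.
     * Do not call this from inside a task, or the caller waits on itself.
     */
    <T> T run(Callable<T> task) {
        try {
            return worker.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the crawl task", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("crawl task failed", e.getCause());
        }
    }

    /** Stops accepting new tasks; already queued tasks still run. */
    void shutdown() {
        worker.shutdown();
    }
}
```

Any thread in the application can then call gate.run(...) with a task that starts the crawl, and the work itself always executes on the same thread. This does not address the memory growth described above; it only keeps the Java-COM bridge away from concurrent access.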