Extracting email addresses from Outlook
The cyotek.com domain receives an awful lot of spam, and a lot of it is sent to email addresses that don't exist. However, as we currently have catch-alls enabled, we receive it regardless. This is compounded by the fact that I tend to create a unique email address for each website or service I interact with. And it's impossible to remember them all!
As a first step to deleting the catch-alls, I wanted to see how many unique @cyotek.com addresses were in use. The simplest way of picking these up would be to scan PST files - we have email going back to 2002 in these files, and there's the odd backup elsewhere going back even further. The last time I used OLE Automation with Outlook was back in the days of VB6, and I well recall being plagued with permission dialogs each time I dreamed of trying to access the API. Still, I thought I'd take a look.
Setting up
Note: I tested this project on an Outlook profile which has loaded a primary PST, an archive PST, and a Gmail account. I haven't tested this with any other type of account (for example Exchange) or with accounts using non-SMTP email addresses. Caveat emptor!
The first thing to do is add a reference to the Outlook COM objects. I have VS2010 and VS2012 installed on this machine, and one of them has installed a bunch of prepared Office Interop DLLs into the GAC. Handy, I won't have to create my own! Adding a reference to the Microsoft Outlook 14.0 Object Library added three references to my project: Microsoft.Office.Interop.Outlook.dll, Office.dll and stdole.
Note: Depending on your version of VS / .NET Framework, the references may have a property named Embed Interop Types which defaults to true. When left at this, you may have problems debugging as you won't be able to access the objects properly through the Immediate window, instead getting an error similar to "Member 'To' on embedded interop type 'Microsoft.Office.Interop.Outlook.MailItem' cannot be evaluated while debugging since it is never referenced in the program. Consider casting the source object to type 'dynamic' first or building with the 'Embed Interop Types' property set to false when debugging".
Probably a good idea to set this to false before debugging your code!
Connecting to Outlook
All the code below assumes that you have a using Microsoft.Office.Interop.Outlook; statement at the top of your code file.
Connecting to Outlook is easy enough: just create a new instance of the Application interface. We'll use it as the root for everything else.
```csharp
Application application;

application = new Application();
```
Remember I mentioned permission dialogs? Older versions of Outlook used to prompt for permissions. Outlook 2010 just seems to quietly get on with things. The only thing I've noticed is that if you try and create a new Application when Outlook isn't currently running, it will be silently started and the system tray icon will have a slightly different icon and a tooltip informing you that some other program is using Outlook. Much nicer than previous behaviours!
Getting Account Folders
The Session property of the Application interface returns a NameSpace that details your Outlook setup, and allows access to accounts, profile details etc. However, for this project, the only thing I care about is the Folders property which returns a collection of MAPIFolder objects. In my case, it was the three top level folders for my profile - I was somewhat surprised that the Gmail account was loaded actually.
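As a rough illustration of the above, the following sketch lists the top level folders of the current profile. It isn't taken from the example project: the FolderLister class name is made up for this example, and it assumes a console project that references the Outlook interop assembly and runs on a machine with Outlook installed.

```csharp
using System;
using Microsoft.Office.Interop.Outlook;

// Hypothetical helper class, not part of the example project
internal static class FolderLister
{
  private static void Main()
  {
    // Starts Outlook silently if it is not already running
    Application application = new Application();

    // Session is the NameSpace describing the current profile
    NameSpace session = application.Session;

    // Each top level folder is the root of a loaded store,
    // for example a PST file or a mail account
    foreach (MAPIFolder folder in session.Folders)
    {
      Console.WriteLine("{0} ({1} sub folders)", folder.Name, folder.Folders.Count);
    }
  }
}
```

On a profile like the one described here, this should print one line per top level folder.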
Now that we have a folder, we can scan it by enumerating the Items property. As Outlook folders can contain items of various types, you need to check the item type - I'm looking for MailItem objects in order to extract those addresses.
Pulling out email addresses
Each MailItem has Sender, To and Recipients properties. To seems to be just a string version of Recipients and so shall be completely ignored - why bother parsing it manually when Recipients already does it for you? The Sender property returns an AddressEntry, and each item in the Recipients collection (a Recipient) offers an AddressEntry property. So we're all set!
The following code snippet is from the example project, and basically shows how I scan a source MAPIFolder looking for MailItem objects.
```csharp
protected virtual void ScanFolder(MAPIFolder folder)
{
  this.CurrentFolderIndex++;
  this.OnFolderScanning(new MAPIFolderEventArgs(folder, this.FolderCount, this.CurrentFolderIndex));

  // items
  foreach (object item in folder.Items)
  {
    if (item is MailItem)
    {
      MailItem email;

      email = (MailItem)item;

      // add the sender of the email
      if (this.Options.HasFlag(Options.Sender))
        this.ProcessAddress(email.Sender);

      // add the recipients of the email
      if (this.Options.HasFlag(Options.Recipient))
      {
        foreach (Recipient recipient in email.Recipients)
          this.ProcessAddress(recipient.AddressEntry);
      }
    }
  }

  // sub folders
  if (this.Options.HasFlag(Options.SubFolders))
  {
    foreach (MAPIFolder childFolder in folder.Folders)
      this.ScanFolder(childFolder);
  }
}
```
When I find an AddressEntry to process, I call the following functions:
```csharp
// Debug.Print requires a using System.Diagnostics; statement
protected virtual void ProcessAddress(AddressEntry addressEntry)
{
  if (addressEntry != null && (addressEntry.AddressEntryUserType == OlAddressEntryUserType.olSmtpAddressEntry || addressEntry.AddressEntryUserType == OlAddressEntryUserType.olOutlookContactAddressEntry))
    this.ProcessAddress(addressEntry.Address);
  else if (addressEntry != null)
    Debug.Print("Unknown address type: {0} ({1})", addressEntry.AddressEntryUserType, addressEntry.Address);
}

protected virtual void ProcessAddress(string emailAddress)
{
  // check for null or empty first, as calling IndexOf on a null address would throw
  if (!string.IsNullOrEmpty(emailAddress))
  {
    int domainStartPosition;

    domainStartPosition = emailAddress.IndexOf("@");

    if (domainStartPosition != -1)
    {
      bool canAdd;

      if (this.Options.HasFlag(Options.FilterByDomain))
        canAdd = this.IncludedDomains.Contains(emailAddress.Substring(domainStartPosition + 1));
      else
        canAdd = true;

      if (canAdd)
        this.EmailAddresses.Add(emailAddress);
    }
  }
}
```
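The code above only accepts SMTP and Outlook contact entries; anything else ends up in the debug output. For Exchange entries, the object model offers AddressEntry.GetExchangeUser and ExchangeUser.PrimarySmtpAddress, so a sketch along the following lines might recover an SMTP address. This is an assumption on my part and is untested against an Exchange account; the AddressResolver class and GetSmtpAddress method are hypothetical names and not part of the example project.

```csharp
using Microsoft.Office.Interop.Outlook;

// Hypothetical helper, untested against a real Exchange account
internal static class AddressResolver
{
  // Returns an SMTP style address for the entry, or null if none is available
  public static string GetSmtpAddress(AddressEntry addressEntry)
  {
    if (addressEntry == null)
      return null;

    switch (addressEntry.AddressEntryUserType)
    {
      case OlAddressEntryUserType.olExchangeUserAddressEntry:
      case OlAddressEntryUserType.olExchangeRemoteUserAddressEntry:
        // Exchange entries expose their SMTP address via the ExchangeUser object
        ExchangeUser user = addressEntry.GetExchangeUser();
        return user != null ? user.PrimarySmtpAddress : null;

      default:
        // SMTP and contact entries already carry the address directly
        return addressEntry.Address;
    }
  }
}
```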
Although I'm scanning my entire PST, I don't want every single email address in there - I ran it once and it brought back just over 5000 addresses. What I want is the addresses tied to the domains I own, so I added some filtering for this. With this filtering enabled it returned a more manageable 497 unique addresses. Although I'm not creating 497 aliases on the email server!
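The IncludedDomains and EmailAddresses members aren't shown in this post. As a minimal sketch of the filtering idea, assuming both are case-insensitive sets so that duplicates and differences in casing collapse into a single entry, something like the following would do. The DomainFilter class and its Filter method are invented for this illustration and don't need Outlook to run.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-alone illustration of the domain filtering
public static class DomainFilter
{
  // Returns the unique addresses whose domain appears in includedDomains
  public static ISet<string> Filter(IEnumerable<string> addresses, IEnumerable<string> includedDomains)
  {
    // Case-insensitive sets: "Sales@Cyotek.com" and "sales@cyotek.com" count once
    HashSet<string> domains = new HashSet<string>(includedDomains, StringComparer.OrdinalIgnoreCase);
    HashSet<string> results = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    foreach (string address in addresses)
    {
      if (string.IsNullOrEmpty(address))
        continue;

      int domainStartPosition = address.IndexOf('@');

      if (domainStartPosition == -1)
        continue;

      // Everything after the @ is treated as the domain
      if (domains.Contains(address.Substring(domainStartPosition + 1)))
        results.Add(address);
    }

    return results;
  }
}
```

For example, passing the addresses sales@cyotek.com, SALES@CYOTEK.COM and someone@example.com with cyotek.com as the only included domain leaves a single entry in the result.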
Wrapping up
This was a lot easier than I was expecting, and in fact this is probably the smoothest piece of COM interop I've done with .NET yet. No strange errors, no being forced to compile in 32-bit mode, It Just Works.
You can find the example project in the link below.
Update History
- 2012-09-26 - First published
- 2020-11-21 - Updated formatting
Downloads
Filename | Description | Version | Release Date | |
---|---|---|---|---|
OutlookEmailAddressExtract.zip | Outlook Email Address Extractor, a sample C# application that scans an Outlook profile and pulls out email addresses | 1.0.0.0 | 26/09/2012 | Download |