A simple component that extracts just the text from any file type that has an IFilter installed. Available as a C++ COM component and as a C# .NET library.


Project Home Page
IFilter is a COM component and .NET test jig that uses the installed IFilter providers to extract the text from any file. Providers for the various formats are available from most format vendors, as well as from a couple of third-party sources. The same IFilter providers are used by Microsoft Index Server, Microsoft SharePoint Server and Microsoft Desktop Search to extract the indexable text from a file. By using the same interfaces, it is possible to extract just the text (without formatting) from just about any file, from Microsoft Word .DOC files to .MP3 files.
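The text-extraction loop described above can be sketched as follows. This is a portable, hypothetical mock, not the project's code: a real caller would obtain a provider through LoadIFilter and call the Windows SDK's IFilter::Init, IFilter::GetChunk and IFilter::GetText. The names MockFilter, Chunk, ChunkKind and ExtractText are illustrative assumptions. Only the pull protocol (request chunks until FILTER_E_END_OF_CHUNKS, and for each text chunk request text until FILTER_E_NO_MORE_TEXT) and those two status values come from the IFilter API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// HRESULT-style status codes. The two FILTER_E_* values match filterr.h in
// the Windows SDK; everything else in this sketch is a hypothetical mock.
using HResult = std::uint32_t;
constexpr HResult kOk = 0;
constexpr HResult FILTER_E_END_OF_CHUNKS = 0x80041700u;
constexpr HResult FILTER_E_NO_MORE_TEXT = 0x80041701u;

// Mirrors the CHUNK_TEXT / CHUNK_VALUE distinction of the real STAT_CHUNK.
enum class ChunkKind { Text, Value };

struct Chunk {
    ChunkKind kind;
    std::u16string text;
};

// Hypothetical stand-in for an installed IFilter provider.
class MockFilter {
public:
    explicit MockFilter(std::vector<Chunk> chunks) : chunks_(std::move(chunks)) {}

    // Like IFilter::GetChunk: move to the next chunk or report the end.
    HResult GetChunk(ChunkKind& kind) {
        if (next_ >= chunks_.size()) return FILTER_E_END_OF_CHUNKS;
        current_ = next_++;
        pos_ = 0;
        kind = chunks_[current_].kind;
        return kOk;
    }

    // Like IFilter::GetText: on entry count is the buffer size; on return
    // it is the number of characters copied into buffer.
    HResult GetText(std::size_t& count, char16_t* buffer) {
        const std::u16string& text = chunks_[current_].text;
        if (pos_ >= text.size()) return FILTER_E_NO_MORE_TEXT;
        const std::size_t n = std::min(count, text.size() - pos_);
        std::copy_n(text.begin() + pos_, n, buffer);
        pos_ += n;
        count = n;
        return kOk;
    }

private:
    std::vector<Chunk> chunks_;
    std::size_t next_ = 0;
    std::size_t current_ = 0;
    std::size_t pos_ = 0;
};

// Pull every text chunk out of the filter, one buffer at a time.
std::u16string ExtractText(MockFilter& filter) {
    std::u16string out;
    char16_t buffer[4];  // deliberately tiny so text arrives in several pieces
    ChunkKind kind = ChunkKind::Value;
    while (filter.GetChunk(kind) == kOk) {
        if (kind != ChunkKind::Text) continue;  // skip property-value chunks
        for (;;) {
            std::size_t count = sizeof buffer / sizeof buffer[0];
            if (filter.GetText(count, buffer) != kOk) break;  // chunk exhausted
            out.append(buffer, count);
        }
        out.push_back(u'\n');  // assumption: one line per text chunk
    }
    return out;
}
```

Skipping value chunks matches the goal of extracting just the text; a real caller would also handle other failure codes and release the COM interface when finished.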

  • more to come
Last edited Jun 22 2007 at 5:44 PM by IDisposable, version 4
Comments
mortench wrote  Jan 14 2007 at 9:16 PM 
Nice tool, except that it does not work on my Windows XP 32-bit machine. I have downloaded the Debug binaries, and whenever I try the tester app, it gives an error when I open a file (regardless of file type). The error message is: "Error extracting text from ..... Err=429-Retrieving the COM class factory for component with CLSID {E5070c86-c142-b17b-5aa76cba3bf2} FAILED DUE TO THE FOLLOWING ERROR: 80040154. in TextExtractTester"

psaltr wrote  May 12 2008 at 2:20 AM 
If you register ExtractText.dll, this error message should go away.
Just open a command prompt, navigate to the folder where you extracted the debug binaries, and type regsvr32 ExtractText.dll
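For readers decoding the error above: 0x80040154 is the standard COM code REGDB_E_CLASSNOTREG ("class not registered"), which is why registering the DLL with regsvr32 clears it. The sketch below splits an HRESULT into its fields using the same bit layout as the Windows SDK macros HRESULT_SEVERITY, HRESULT_FACILITY and HRESULT_CODE. The names HResultParts, Decode and Failed are hypothetical, chosen so the example builds without the Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// The three fields of an HRESULT.
struct HResultParts {
    std::uint32_t severity;  // 1 = failure, 0 = success
    std::uint32_t facility;  // e.g. 4 = FACILITY_ITF
    std::uint32_t code;      // facility-specific error code
};

// Same bit layout as HRESULT_SEVERITY, HRESULT_FACILITY and HRESULT_CODE.
constexpr HResultParts Decode(std::uint32_t hr) {
    return {(hr >> 31) & 0x1u, (hr >> 16) & 0x1FFFu, hr & 0xFFFFu};
}

// Equivalent of the FAILED() macro: the high (severity) bit is set.
constexpr bool Failed(std::uint32_t hr) { return (hr >> 31) != 0; }

constexpr std::uint32_t kClassNotRegistered = 0x80040154u;  // REGDB_E_CLASSNOTREG

// Checked at compile time: facility 4 (FACILITY_ITF), code 0x154.
static_assert(Decode(kClassNotRegistered).facility == 4, "facility is FACILITY_ITF");
static_assert(Decode(kClassNotRegistered).code == 0x154, "code is 0x154");
```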
