Friday, November 11, 2005

Aperture: a Java framework for getting data and metadata

Today I came across this. It sounds interesting for our MIOSARM.

Aperture Framework

Features

  • Crawl information systems such as file systems, websites, mail boxes and mail servers
  • Extract full-text and metadata from many common file formats
  • View files in their native applications
  • Ease of use: easy to learn, easy to code, easy to deploy in industrial projects
  • Flexible architecture: can be extended with custom file formats, data sources, etc., with support for deployment on OSGi platforms
  • Data exchange based on Semantic Web standards (e.g. RDF, SPARQL)

Supported File Formats

  • Plain text
  • HTML
  • XML
  • PDF (Portable Document Format)
  • RTF (Rich Text Format)
  • Microsoft Word 97+
  • Microsoft Excel 97+
  • Microsoft PowerPoint 97+
  • Microsoft Works
  • OpenOffice 1.x: Writer, Calc, Impress, Draw
  • StarOffice 6.x - 7.x+: Writer, Calc, Impress, Draw
  • OpenDocument (OpenOffice 2.x, StarOffice 8.x)
  • WordPerfect
  • Emails (.eml files)

Crawlers

Crawlers support the extraction of information from heterogeneous data sources. At the moment we support the following source types:

  • File Systems (local, remote, removable media)
  • Websites and intranets
  • IMAP e-mail servers
  • Microsoft Outlook
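
To get a feel for what a file system crawler does, here is a minimal sketch in plain Java. It does not use Aperture's API, which I have not tried yet. The class name FileSystemCrawlSketch, the method crawl, and the metadata keys (uri, name, size, lastModified, contentType) are my own assumptions for illustration. It only collects basic file attributes through the JDK and does no full-text extraction.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustration only: plain JDK calls, not Aperture's API.
// The class, method, and metadata key names are hypothetical.
class FileSystemCrawlSketch {

    // Walk the tree under root and return one metadata map per regular file.
    public static List<Map<String, String>> crawl(Path root) throws IOException {
        List<Path> files;
        // Files.walk holds directory handles open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile)
                         .sorted()
                         .collect(Collectors.toList());
        }

        List<Map<String, String>> results = new ArrayList<>();
        for (Path file : files) {
            Map<String, String> metadata = new LinkedHashMap<>();
            metadata.put("uri", file.toUri().toString());
            metadata.put("name", file.getFileName().toString());
            metadata.put("size", Long.toString(Files.size(file)));
            metadata.put("lastModified", Files.getLastModifiedTime(file).toString());
            // probeContentType may return null when the type cannot be determined.
            String type = Files.probeContentType(file);
            metadata.put("contentType", type != null ? type : "unknown");
            results.add(metadata);
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        // Crawl the directory given as the first argument, or the current one.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map<String, String> metadata : crawl(root)) {
            System.out.println(metadata);
        }
    }
}
```

Running it with a directory path as the only argument prints one metadata map per file found under that directory. A real crawler for websites or IMAP servers would need a different way to list its items, but could report results in the same shape.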
