Today I came across this. It sounds interesting for our MIOSARM.
Aperture Framework
Aperture: a Java framework for getting data and metadata
Features
- Crawl information systems such as file systems, websites, mail boxes and mail servers
- Extract full-text and metadata from many common file formats
- View files in their native applications
- Ease of use: easy to learn, easy to code, easy to deploy in industrial projects
- Flexible architecture: can be extended with custom file formats, data sources, etc., with support for deployment on OSGi platforms
- Data exchange based on Semantic Web standards (e.g. RDF, SPARQL, ...)
Full-text and metadata extraction is supported for the following file formats:
- Plain text
- HTML
- XML
- PDF (Portable Document Format)
- RTF (Rich Text Format)
- Microsoft Word 97+
- Microsoft Excel 97+
- Microsoft PowerPoint 97+
- Microsoft Works
- OpenOffice 1.x: Writer, Calc, Impress, Draw
- StarOffice 6.x - 7.x+: Writer, Calc, Impress, Draw
- OpenDocument (OpenOffice 2.x, StarOffice 8.x)
- WordPerfect
- Emails (.eml files)
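To give a feel for what handling many file formats involves, here is a minimal sketch in plain Java that maps a file name to a MIME type for a few of the formats listed above. It does not use Aperture's API: the class `FormatLookupSketch` and the method `guessMimeType` are hypothetical names made up for this illustration, and looking only at the file extension is a deliberately naive stand-in (a real identifier would normally also inspect the file content).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only: this is NOT part of Aperture.
public class FormatLookupSketch {

    // Extension -> MIME type for a handful of the formats in the list above.
    private static final Map<String, String> MIME_BY_EXTENSION = new HashMap<>();
    static {
        MIME_BY_EXTENSION.put("txt", "text/plain");
        MIME_BY_EXTENSION.put("html", "text/html");
        MIME_BY_EXTENSION.put("xml", "application/xml");
        MIME_BY_EXTENSION.put("pdf", "application/pdf");
        MIME_BY_EXTENSION.put("rtf", "application/rtf");
        MIME_BY_EXTENSION.put("doc", "application/msword");
        MIME_BY_EXTENSION.put("xls", "application/vnd.ms-excel");
        MIME_BY_EXTENSION.put("ppt", "application/vnd.ms-powerpoint");
        MIME_BY_EXTENSION.put("odt", "application/vnd.oasis.opendocument.text");
        MIME_BY_EXTENSION.put("eml", "message/rfc822");
    }

    // Guess the MIME type from the extension alone (case-insensitive).
    // Returns an empty Optional for unknown or missing extensions.
    static Optional<String> guessMimeType(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(MIME_BY_EXTENSION.get(ext));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"report.PDF", "mail.eml", "README"}) {
            System.out.println(name + " -> " + guessMimeType(name).orElse("unknown"));
        }
    }
}
```

Once the type is known, a format-specific extractor could be chosen for the file; that dispatch step is left out here.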
Crawlers support the extraction of information from heterogeneous data sources. At the moment we support the following source types:
- File Systems (local, remote, removable media)
- Websites and intranets
- IMAP e-mail servers
- Microsoft Outlook
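As a rough idea of what crawling the simplest of these sources (a local file system) means, here is a small sketch in plain Java. It is not Aperture's crawler API: `FileSystemCrawlSketch`, `CrawledFile` and `crawl` are hypothetical names invented for this example, and it only gathers the path, size and modification time of each file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only: this is NOT Aperture's crawler API.
public class FileSystemCrawlSketch {

    // Basic metadata collected for one crawled file.
    static final class CrawledFile {
        final Path path;
        final long sizeBytes;
        final long lastModifiedMillis;

        CrawledFile(Path path, long sizeBytes, long lastModifiedMillis) {
            this.path = path;
            this.sizeBytes = sizeBytes;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    // Walk the directory tree under root and collect metadata
    // for every regular file (directories themselves are skipped).
    static List<CrawledFile> crawl(Path root) throws IOException {
        List<Path> regularFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            regularFiles = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        List<CrawledFile> found = new ArrayList<>();
        for (Path p : regularFiles) {
            BasicFileAttributes attrs = Files.readAttributes(p, BasicFileAttributes.class);
            found.add(new CrawledFile(p, attrs.size(), attrs.lastModifiedTime().toMillis()));
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Crawl the directory given as first argument, or the current one.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (CrawledFile f : crawl(root)) {
            System.out.println(f.path + "\t" + f.sizeBytes + " bytes");
        }
    }
}
```

The other source types (websites, IMAP servers, Outlook) would need their own access logic, but the overall shape is the same: enumerate the items in a source and hand each one on with its metadata.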