System documentation of the GNU Image-Finding Tool

CAcIFFileSystem Class Reference

An accessor to an inverted file.

#include <CAcIFFileSystem.h>

Classes shown in the inheritance diagram for CAcIFFileSystem: CAcInvertedFile, CAcURL2FTS, CAccessor, CAccessorImplementation.


Public Member Functions

bool operator() () const
 For testing whether the inverted file is correctly constructed.
 CAcIFFileSystem (const CXMLElement &inCollectionElement)
 This opens an existing inverted file and then initializes this structure.
bool init (bool)
 Called by the constructors.
 ~CAcIFFileSystem ()
 Destructor.
string IDToURL (TID inID) const
 Translate a DocumentID to a URL (for output).
virtual pair< bool, TID > URLToID (const string &inURL) const
 Translate a URL to its document ID.
void getAllIDs (list< TID > &) const
 List of the IDs of all documents present in the inverted file.
void getAllAccessorElements (list< CAccessorElement > &) const
 List of triplets (ID,imageURL,thumbnailURL) of all the documents present in the inverted file.
void getRandomIDs (list< TID > &, list< TID >::size_type) const
 Get a given number of random document IDs.
void getRandomAccessorElements (list< CAccessorElement > &outResult, list< CAccessorElement >::size_type inSize) const
 For drawing random sets.
int size () const
 The number of images in this accessor.
TID getMaximumFeatureID () const
 Returns the maximum feature ID; this is useful for browsing.
list< TID > * getAllFeatureIDs () const
 Returns a list of all features contained in this inverted file.
virtual pair< bool, CAccessorElement > IDToAccessorElement (TID inID) const
 Translate a DocumentID to an accessor element.
 operator bool () const
 Is this accessor well constructed?
The proper inverted file access
CDocumentFrequencyList * FeatureToList (TFeatureID) const
 List of documents containing the feature.
CDocumentFrequencyList * URLToFeatureList (string inURL) const
 List of features contained in a document.
CDocumentFrequencyList * DIDToFeatureList (TID inDID) const
 List of features contained in the document with ID inDID.
Accessing information about features
double FeatureToCollectionFrequency (TFeatureID) const
 Collection frequency for a given feature.
unsigned int getFeatureDescription (TID inFeatureID) const
 What kind of feature is the feature with ID inFeatureID?
Accessing additional document information
double DIDToMaxDocumentFrequency (TID) const
 Returns the maximum document frequency for one document ID.
double DIDToDFSquareSum (TID) const
 Returns the document-frequency square sum for a given document ID.
double DIDToSquareDFLogICFSum (TID) const
 Returns the square-DF-log-ICF sum (the quantity named by this function) for a given document ID.
bool generateInvertedFile ()
 Generates an inverted file, if there is none (fast in-memory method).
bool newGenerateInvertedFile ()
 Generates an inverted file, if there is none (sort-based two-way merge method).
bool checkConsistency ()
 Check the consistency of the inverted file system accessed by this accessor.
bool findWithinStream (TID inFeatureID, TID inDocumentID, double inDocumentFrequency) const
 Is the Document with inDocumentID contained in the document frequency list of the feature inFeatureID and is the associated document frequency the same?

Protected Types

typedef HASH_MAP< TID, streampos > CIDToOffset
 Map from feature ID to the offset of this feature in the inverted file.

Protected Member Functions

void writeOffsetFileElement (TID inFeatureID, streampos inPosition, ostream &inOpenOffsetFile)
 Adds a (FeatureID, offset) pair to the open offset file (helper function for inverted file construction).
CDocumentFrequencyList * getFeatureFile (string inFileName) const
 Loads a *.fts file.

Protected Attributes

CMutex mMutex
 The mutex for multithreading.
CSelfDestroyPointer< CAcURL2FTS > mURL2FTS
 In order to have just one parent, I have to limit myself to single inheritance.
TID mMaximumFeatureID
 The maximum feature ID arising in this file.
string mInvertedFileBuffer
 A buffer, used if the inverted file is to be held in RAM.
string mTemporaryIndexingFileBase
 A location for temporary indexing data.
CSelfDestroyPointer< istream > mInvertedFile
 The inverted file.
ifstream mOffsetFile
 Feature -> Offset in inverted file.
ifstream mFeatureDescriptionFile
 File of feature descriptions.
string mInvertedFileName
 Name of the inverted file.
string mOffsetFileName
 Name of the offset file.
string mFeatureDescriptionFileName
 Name for the file with the feature description.
CIDToOffset mIDToOffset
 Map from feature ID to the offset of this feature in the inverted file.
HASH_MAP< TID, double > mFeatureToCollectionFrequency
 Map from feature to its collection frequency, for fast access.
HASH_MAP< TID, unsigned int > mFeatureDescription
 Map from the feature ID to the feature description.
CADIHash mDocumentInformation
 Additional information about the document, e.g. the Euclidean length of its feature list.


Detailed Description

An accessor to an inverted file.

This access is done "by hand".

For a long time we wanted to move to memory-mapped files (as SWISH++ does), but currently I think this is not the best idea.
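To make "by hand" concrete, here is a toy model in C++17: each feature's document frequency list is written into one stream, its starting position is remembered in an offset map (the role played by mIDToOffset and the offset file), and a lookup seeks to that position and decodes the list itself. This is a minimal sketch under stated assumptions, not GIFT's code: the class ToyInvertedFile, the binary layout (a count followed by (document ID, document frequency) records) and the in-memory std::stringstream standing in for the file are all hypothetical. The std::mutex mirrors mMutex, because even const lookups move the shared stream's read position.

```cpp
#include <cstdint>
#include <ios>
#include <mutex>
#include <sstream>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for GIFT's TID and document frequency list types.
using TID = std::uint32_t;
using Posting = std::pair<TID, double>;  // (document ID, document frequency)
using PostingList = std::vector<Posting>;

class ToyInvertedFile {
public:
    // Append one feature's list to the stream and remember where it starts
    // (the job writeOffsetFileElement does for the real offset file).
    void addFeature(TID featureID, const PostingList& list) {
        std::lock_guard<std::mutex> lock(mMutex);
        mIDToOffset[featureID] = mFile.tellp();
        const std::uint32_t count = static_cast<std::uint32_t>(list.size());
        mFile.write(reinterpret_cast<const char*>(&count), sizeof count);
        for (const Posting& p : list) {
            mFile.write(reinterpret_cast<const char*>(&p.first), sizeof p.first);
            mFile.write(reinterpret_cast<const char*>(&p.second), sizeof p.second);
        }
    }

    // Look up the offset, seek there and decode the list "by hand".
    // Unknown features yield an empty list.
    PostingList featureToList(TID featureID) const {
        std::lock_guard<std::mutex> lock(mMutex);  // one shared read position
        PostingList result;
        const auto it = mIDToOffset.find(featureID);
        if (it == mIDToOffset.end()) return result;
        mFile.clear();  // reset eof/fail bits left by an earlier read
        mFile.seekg(it->second);
        std::uint32_t count = 0;
        mFile.read(reinterpret_cast<char*>(&count), sizeof count);
        for (std::uint32_t i = 0; i < count; ++i) {
            Posting p;
            mFile.read(reinterpret_cast<char*>(&p.first), sizeof p.first);
            mFile.read(reinterpret_cast<char*>(&p.second), sizeof p.second);
            result.push_back(p);
        }
        return result;
    }

private:
    mutable std::stringstream mFile{std::ios::in | std::ios::out | std::ios::binary};
    std::unordered_map<TID, std::streampos> mIDToOffset;
    mutable std::mutex mMutex;
};
```

A lookup costs one hash-map probe, one seek and a read proportional to the length of that one list, independent of how many other features the file holds.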


Constructor & Destructor Documentation

CAcIFFileSystem::CAcIFFileSystem ( const CXMLElement &  inCollectionElement  ) 

This opens an existing inverted file and then initializes this structure.

After that it is fully usable.

As a parameter it takes an XMLElement which contains a "collection" element and its content.

If the attribute cui-generate-inverted-file is true, a new inverted file will be generated using the parameters given in inCollectionElement. You will NOT be able to use *this afterwards.

Like every accessor, this accessor takes a <collection> MRML element as input (see also CXMLElement for how to access the attributes of this element). Currently this accessor understands the following attributes:

cui-base-dir: the directory containing the following files
cui-inverted-file-location: the location of the inverted file
cui-offset-file-location: a file containing offsets into the inverted file
cui-feature-file-location: the location of the "url2fts" file, which translates URLs to feature file names
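A minimal sketch of how the attributes listed above might be collected, assuming the <collection> element's attributes are available as a plain std::map (the real class goes through CXMLElement instead). The struct CollectionConfig and the function readCollectionConfig are hypothetical, and treating cui-generate-inverted-file as enabled only when its value is exactly "true" is an assumption.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical: the <collection> element's attributes as name -> value.
using Attributes = std::map<std::string, std::string>;

struct CollectionConfig {
    std::string baseDir;          // cui-base-dir
    std::string invertedFile;     // cui-inverted-file-location
    std::string offsetFile;       // cui-offset-file-location
    std::string url2ftsFile;      // cui-feature-file-location
    bool generateInvertedFile = false;
    std::vector<std::string> missing;  // attributes that were not present
};

CollectionConfig readCollectionConfig(const Attributes& attrs) {
    CollectionConfig c;
    // Copy one attribute into `out`, or record its name as missing.
    auto fetch = [&](const std::string& name, std::string& out) {
        const auto it = attrs.find(name);
        if (it == attrs.end()) c.missing.push_back(name);
        else out = it->second;
    };
    fetch("cui-base-dir", c.baseDir);
    fetch("cui-inverted-file-location", c.invertedFile);
    fetch("cui-offset-file-location", c.offsetFile);
    fetch("cui-feature-file-location", c.url2ftsFile);
    // Assumption: only the literal value "true" requests generation.
    const auto gen = attrs.find("cui-generate-inverted-file");
    c.generateInvertedFile = (gen != attrs.end() && gen->second == "true");
    return c;
}
```

Collecting the missing names, rather than stopping at the first one, lets a caller report every configuration problem at once.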


Member Function Documentation

CDocumentFrequencyList* CAcIFFileSystem::getFeatureFile ( string  inFileName  )  const [protected]

Loads a *.fts file and returns the feature list.

Reimplemented from CAcInvertedFile.

bool CAcIFFileSystem::generateInvertedFile (  )  [virtual]

Generates an inverted file, if there is none.

Fast but stupid in-memory method. This method is very fast if the whole inverted file (and a bit more) can be kept in memory at runtime. If this is not the case, extensive swapping results, virtually halting the inverted file creation.

Implements CAcInvertedFile.
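The in-memory strategy amounts to building every document frequency list in one container before anything is written out, which is why memory use grows with the whole index. A sketch in C++17, assuming hypothetical types (TID as a 32-bit integer, lists of (ID, document frequency) pairs) and a hypothetical function invertInMemory; it is not GIFT's implementation.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using TID = std::uint32_t;
using FeatureList = std::vector<std::pair<TID, double>>;  // (feature ID, document frequency)
using PostingList = std::vector<std::pair<TID, double>>;  // (document ID, document frequency)

// Invert the whole collection in RAM: every posting of every feature stays
// in `inverted` until the end, so memory use grows with the index size.
std::map<TID, PostingList>
invertInMemory(const std::map<TID, FeatureList>& documents) {
    std::map<TID, PostingList> inverted;
    // Documents are visited in ascending ID order, so each posting list
    // comes out sorted by document ID without an extra sort.
    for (const auto& [documentID, features] : documents)
        for (const auto& [featureID, df] : features)
            inverted[featureID].emplace_back(documentID, df);
    return inverted;
}
```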

bool CAcIFFileSystem::newGenerateInvertedFile (  ) 

Generates an inverted file, if there is none.

Uses the two-way merge method described in "Managing Gigabytes", section 5.2, "Sort-based inversion" (page 181).

Reimplemented from CAcInvertedFile.
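Sort-based inversion emits one (feature, document, document frequency) triple per posting, sorts memory-sized runs of triples, and merges the runs into a single sequence ordered by feature and then document, which is the order of the final inverted file. The sketch below keeps the runs in vectors rather than in temporary files (where something like mTemporaryIndexingFileBase would come in) and merges them pairwise; the names Triple, makeSortedRuns and mergeRuns are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <tuple>
#include <utility>
#include <vector>

using TID = std::uint32_t;

// One posting; ordering by (feature, document) gives inverted-file order.
struct Triple {
    TID feature;
    TID document;
    double df;
    bool operator<(const Triple& o) const {
        return std::tie(feature, document) < std::tie(o.feature, o.document);
    }
};

using Run = std::vector<Triple>;

// Phase 1: cut the triple stream into runs of at most runSize and sort each.
std::vector<Run> makeSortedRuns(const std::vector<Triple>& triples, std::size_t runSize) {
    if (runSize == 0) runSize = 1;  // avoid an endless loop
    std::vector<Run> runs;
    for (std::size_t begin = 0; begin < triples.size(); begin += runSize) {
        const std::size_t end = std::min(triples.size(), begin + runSize);
        Run run(triples.begin() + static_cast<std::ptrdiff_t>(begin),
                triples.begin() + static_cast<std::ptrdiff_t>(end));
        std::sort(run.begin(), run.end());
        runs.push_back(std::move(run));
    }
    return runs;
}

// Phase 2: merge runs two at a time until one sorted run remains.
Run mergeRuns(std::vector<Run> runs) {
    if (runs.empty()) return {};
    while (runs.size() > 1) {
        std::vector<Run> next;
        for (std::size_t i = 0; i + 1 < runs.size(); i += 2) {
            Run merged;
            merged.reserve(runs[i].size() + runs[i + 1].size());
            std::merge(runs[i].begin(), runs[i].end(),
                       runs[i + 1].begin(), runs[i + 1].end(),
                       std::back_inserter(merged));
            next.push_back(std::move(merged));
        }
        if (runs.size() % 2 == 1) next.push_back(std::move(runs.back()));  // odd run carries over
        runs = std::move(next);
    }
    return std::move(runs.front());
}
```

Each pairwise pass halves the number of runs, so r initial runs need about log2(r) passes over the data, each of which only streams through its inputs.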

bool CAcIFFileSystem::checkConsistency (  )  [virtual]

Check the consistency of the inverted file system accessed by this accessor.

Implements CAcInvertedFile.

bool CAcIFFileSystem::findWithinStream ( TID  inFeatureID,
TID  inDocumentID,
double  inDocumentFrequency 
) const

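Is the document with inDocumentID contained in the document frequency list of the feature inFeatureID, and is the associated document frequency the same?

Parameters:
inFeatureID the feature whose document frequency list is searched
inDocumentID the document to look for
inDocumentFrequency the document frequency expected for this document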

Reimplemented from CAcInvertedFile.
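One plausible way a consistency check can use findWithinStream is to walk every document's feature list and confirm that each (feature, document frequency) pair is mirrored in the inverted file. The sketch uses in-memory maps in place of the streams; Index, findWithin and this free function checkConsistency are hypothetical. The document frequency is compared with ==, matching the "is the associated document frequency the same?" wording, which assumes both sides store bit-identical values.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using TID = std::uint32_t;
using Pairs = std::vector<std::pair<TID, double>>;

// Hypothetical: forward index (document -> (feature, df)) and inverted
// index (feature -> (document, df)) share this shape.
using Index = std::map<TID, Pairs>;

// Analogue of findWithinStream: is inDocumentID listed under inFeatureID
// with exactly inDocumentFrequency?
bool findWithin(const Index& inverted, TID inFeatureID, TID inDocumentID,
                double inDocumentFrequency) {
    const auto it = inverted.find(inFeatureID);
    if (it == inverted.end()) return false;
    for (const auto& [documentID, df] : it->second)
        if (documentID == inDocumentID) return df == inDocumentFrequency;
    return false;
}

// Every (document, feature, df) of the forward index must appear in the
// inverted index; the first mismatch makes the check fail.
bool checkConsistency(const Index& documents, const Index& inverted) {
    for (const auto& [documentID, features] : documents)
        for (const auto& [featureID, df] : features)
            if (!findWithin(inverted, featureID, documentID, df)) return false;
    return true;
}
```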

virtual pair<bool,TID> CAcIFFileSystem::URLToID ( const string &  inURL  )  const [virtual]

Translate a URL to its document ID.

Implements CAcInvertedFile.
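URLToID returns a pair< bool, TID > so that a failed lookup can be reported without reserving a special ID value. A minimal sketch of that convention, assuming a hypothetical URLTable class backed by two hash maps; returning an empty string from IDToURL for unknown IDs is an assumption of this sketch, not documented behaviour.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

using TID = std::uint32_t;

// Hypothetical two-way table between document IDs and URLs.
class URLTable {
public:
    void add(TID id, const std::string& url) {
        mIDToURL[id] = url;
        mURLToID[url] = id;
    }

    // first == false means "URL unknown"; second is then meaningless.
    std::pair<bool, TID> URLToID(const std::string& inURL) const {
        const auto it = mURLToID.find(inURL);
        if (it == mURLToID.end()) return {false, 0};
        return {true, it->second};
    }

    // Assumption: unknown IDs map to an empty string.
    std::string IDToURL(TID inID) const {
        const auto it = mIDToURL.find(inID);
        return it == mIDToURL.end() ? std::string() : it->second;
    }

private:
    std::unordered_map<TID, std::string> mIDToURL;
    std::unordered_map<std::string, TID> mURLToID;
};
```

A caller checks the flag before using the ID, e.g. `auto [found, id] = table.URLToID(url); if (found) { ... }`.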

void CAcIFFileSystem::getRandomIDs ( list< TID > &  ,
list< TID >::size_type   
) const [virtual]

Get a given number of random document IDs.

Parameters:
inoutResultList the list which will contain the result
inSize the desired size of the inoutResultList

Implements CAccessor.
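Drawing random IDs can be done with std::sample (C++17), which selects up to inSize distinct elements without replacement; if fewer IDs exist, all of them are returned. This is a sketch, assuming the complete ID list is held in RAM and that the caller supplies the random engine; the free function and its extra allIDs and rng parameters are hypothetical, and clearing the output list first is an assumption.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <list>
#include <random>

using TID = std::uint32_t;

// Draw up to inSize distinct IDs uniformly at random from allIDs
// (sampling without replacement, entirely in RAM).
void getRandomIDs(const std::list<TID>& allIDs, std::list<TID>& inoutResultList,
                  std::list<TID>::size_type inSize, std::mt19937& rng) {
    inoutResultList.clear();  // assumption: previous contents are discarded
    std::sample(allIDs.begin(), allIDs.end(),
                std::back_inserter(inoutResultList), inSize, rng);
}
```

Passing the engine in keeps the draw reproducible in tests (seed it once) while letting production code share one engine.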

void CAcIFFileSystem::getRandomAccessorElements ( list< CAccessorElement > &  outResult,
list< CAccessorElement >::size_type  inSize 
) const [virtual]

For drawing random sets.

Why is this part of a CAccessorImplementation? The way the accessor is organised might influence the way random sets can be drawn. At present everything happens in RAM, but we do not want to be tied to that.

Parameters:
outResult the list which will contain the result
inSize the desired size of outResult

Implements CAccessor.

list<TID>* CAcIFFileSystem::getAllFeatureIDs (  )  const [virtual]

Returns a list of all features contained in this inverted file.

This function is necessary because in the present system only about 50 percent of the features are actually used.

A feature is considered used if it arises in mIDToOffset.

Implements CAcInvertedFile.
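Since a feature counts as used exactly when it has an entry in mIDToOffset, the list of used features is the key set of that map. A sketch, with std::unordered_map standing in for HASH_MAP; it returns a std::unique_ptr instead of the raw list<TID>* of the real signature so that ownership is explicit, and the final sort (for deterministic output) is an assumption.

```cpp
#include <cstdint>
#include <ios>
#include <list>
#include <memory>
#include <unordered_map>

using TID = std::uint32_t;
using CIDToOffset = std::unordered_map<TID, std::streampos>;  // stand-in for HASH_MAP

// A feature is "used" exactly when it has an offset entry.
std::unique_ptr<std::list<TID>> getAllFeatureIDs(const CIDToOffset& idToOffset) {
    auto result = std::make_unique<std::list<TID>>();
    for (const auto& entry : idToOffset) result->push_back(entry.first);
    result->sort();  // hash-map order is unspecified; sort for stable output
    return result;
}
```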

virtual pair<bool,CAccessorElement> CAcIFFileSystem::IDToAccessorElement ( TID  inID  )  const [virtual]

Translate a DocumentID to an accessor Element.

Implements CAccessor.


Member Data Documentation

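CSelfDestroyPointer<CAcURL2FTS> CAcIFFileSystem::mURL2FTS [protected]

In order to have just one parent, I have to limit myself to single inheritance.

I cannot use virtual base classes, because then I cannot downcast.

CADIHash CAcIFFileSystem::mDocumentInformation [protected]

Additional information about the document, e.g. the Euclidean length of its feature list.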

Reimplemented from CAcInvertedFile.
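The Euclidean length of a feature list is the square root of the sum of its squared document frequencies. The sketch computes it together with the maximum document frequency; reading DIDToDFSquareSum as the plain sum of squared document frequencies is an assumption based on its name, and DocumentInformation and summarize are hypothetical names. It also assumes document frequencies are non-negative, since the maximum starts at 0.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <utility>
#include <vector>

using TID = std::uint32_t;
using FeatureList = std::vector<std::pair<TID, double>>;  // (feature ID, document frequency)

// Hypothetical per-document record in the spirit of mDocumentInformation.
struct DocumentInformation {
    double maxDocumentFrequency = 0;  // cf. DIDToMaxDocumentFrequency
    double dfSquareSum = 0;           // cf. DIDToDFSquareSum (assumed: sum of df^2)
    double euclideanLength = 0;       // sqrt(dfSquareSum)
};

DocumentInformation summarize(const FeatureList& features) {
    DocumentInformation info;
    for (const auto& entry : features) {
        const double df = entry.second;
        info.maxDocumentFrequency = std::max(info.maxDocumentFrequency, df);
        info.dfSquareSum += df * df;
    }
    info.euclideanLength = std::sqrt(info.dfSquareSum);
    return info;
}
```

Precomputing these values once per document avoids re-reading the feature list every time a query needs to normalise a score.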


The documentation for this class was generated from the following file:

Need for discussion? Want to contribute? Contact help-gift@gnu.org.
Generated using Doxygen.