#include <CAcInvertedFile.h>
Public Member Functions | |
virtual bool | operator() () const =0 |
for testing if the inverted file is correctly constructed | |
virtual string | IDToURL (TID inID) const =0 |
<HIER-WIRDS-INTERESSANT-> | |
virtual pair< bool, TID > | URLToID (const string &inURL) const =0 |
Translate a URL to its document ID. | |
virtual list< TID > * | getAllFeatureIDs () const =0 |
Getting a list of all features contained in this. | |
bool | operator() () const |
for testing if the inverted file is correctly constructed | |
CAcInvertedFile (const CXMLElement &inCollectionElement) | |
This opens an existing inverted file, and then initializes this structure. | |
bool | init (bool) |
called by constructors | |
~CAcInvertedFile () | |
Destructor. | |
string | IDToURL (TID inID) const |
Translate a DocumentID to a URL (for output). | |
TID | URLToID (const string &inURL) const |
Translate a URL to its document ID. | |
TID | getMaximumFeatureID () const |
Returns the maximum feature ID arising in this file; this is interesting for browsing. | |
list< TID > * | getAllFeatureIDs () const |
Getting a list of all features contained in this. | |
The proper inverted file access | |
virtual CDocumentFrequencyList * | FeatureToList (TFeatureID inFID) const =0 |
Give the List of documents containing the feature inFID. | |
virtual CDocumentFrequencyList * | URLToFeatureList (string inURL) const =0 |
List of features contained by a document with URL inURL. | |
virtual CDocumentFrequencyList * | DIDToFeatureList (TID inDID) const =0 |
List of features contained by a document with ID inDID. | |
Accessing information about features | |
</HIER-WIRDS-INTERESSANT-> For drawing random sets; mainly a translation of getRandomURLs. CORNELIA: FOR THE MOMENT WE CAN FORGET THIS FUNCTION
virtual void getRandomRLLs(list<string>&, list<string>::size_type)const; | |
virtual double | FeatureToCollectionFrequency (TFeatureID) const =0 |
Collection frequency for a given feature. | |
virtual unsigned int | getFeatureDescription (TID inFeatureID) const =0 |
What kind of feature is the feature with ID inFeatureID? | |
Accessing additional document information | |
virtual double | DIDToMaxDocumentFrequency (TID) const =0 |
Returns the maximum document frequency for one document ID. | |
virtual double | DIDToDFSquareSum (TID) const =0 |
Returns the document-frequency square sum for a given document ID. | |
virtual double | DIDToSquareDFLogICFSum (TID) const =0 |
Returns the square DF*log(ICF) sum (as the function name indicates) for a given document ID. | |
virtual bool | generateInvertedFile ()=0 |
Generate an inverted file if there is none. | |
virtual bool | checkConsistency ()=0 |
Check the consistency of the inverted file system accessed by this accessor. | |
The proper inverted file access | |
CDocumentFrequencyList * | FeatureToList (TFeatureID) const |
List of documents containing the feature. | |
CDocumentFrequencyList * | URLToFeatureList (string inURL) const |
List of features contained by a document. | |
CDocumentFrequencyList * | DIDToFeatureList (TID inDID) const |
List of features contained by a document with ID inDID. | |
Accessing information about features | |
double | FeatureToCollectionFrequency (TFeatureID) const |
Collection frequency for a given feature. | |
unsigned int | getFeatureDescription (TID inFeatureID) const |
What kind of feature is the feature with ID inFeatureID? | |
Accessing additional document information | |
double | DIDToMaxDocumentFrequency (TID) const |
Returns the maximum document frequency for one document ID. | |
double | DIDToDFSquareSum (TID) const |
Returns the document-frequency square sum for a given document ID. | |
double | DIDToSquareDFLogICFSum (TID) const |
Returns the square DF*log(ICF) sum (as the function name indicates) for a given document ID. | |
bool | generateInvertedFile () |
Generate an inverted file if there is none. | |
bool | newGenerateInvertedFile () |
Generate an inverted file if there is none. | |
bool | checkConsistency () |
Check the consistency of the inverted file system accessed by this accessor. | |
bool | findWithinStream (TID inFeatureID, TID inDocumentID, double inDocumentFrequency) const |
Checks whether the document with inDocumentID is contained in the document frequency list of the feature inFeatureID, and whether the associated document frequency equals inDocumentFrequency. | |
Protected Types | |
typedef hash_map< TID, unsigned int > | CIDToOffset |
map from feature id to the offset for this feature | |
Protected Member Functions | |
void | writeOffsetFileElement (TID inFeatureID, int inPosition, ostream &inOpenOffsetFile) |
add a pair of FeatureID,Offset to the open offset file (helper function for inverted file construction) | |
CDocumentFrequencyList * | getFeatureFile (string inFileName) const |
loads a *.fts file. | |
Protected Attributes | |
TID | mMaximumFeatureID |
the maximum feature ID arising in this file | |
CArraySelfDestroyPointer< char > | mInvertedFileBuffer |
A buffer, if the inverted file is to be held in ram. | |
CSelfDestroyPointer< istream > | mInvertedFile |
The inverted file. | |
ifstream | mOffsetFile |
Feature -> Offset in inverted file. | |
ifstream | mFeatureDescriptionFile |
File of feature descriptions. | |
string | mInvertedFileName |
Name of the inverted file. | |
string | mOffsetFileName |
Name of the Offset file. | |
string | mFeatureDescriptionFileName |
Name for the file with the feature description. | |
CIDToOffset | mIDToOffset |
map from feature id to the offset for this feature | |
hash_map< TID, double > | mFeatureToCollectionFrequency |
map from feature to the collection frequency | |
for fast access... | |
hash_map< TID, unsigned int > | mFeatureDescription |
map from the feature ID to the feature description | |
CADIHash | mDocumentInformation |
Additional information about the document, e.g. the Euclidean length of the feature list. |
This access is currently done "by hand", which is not really efficient; however, we plan to move to memory-mapped files.
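The "by hand" access pattern, in which a feature ID is mapped to an offset and the list is then read from the inverted file at that position, might look roughly like the sketch below. This is only an illustration under stated assumptions, not the class's actual code: the binary record layout (a count followed by ID/frequency pairs), the names `Posting`, `IDToOffset`, `writeList` and `readList`, and the use of `std::unordered_map` in place of `hash_map` are all hypothetical.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <unordered_map>
#include <vector>

// Hypothetical record: one (document ID, frequency) pair.
struct Posting {
    std::uint32_t docID;
    float frequency;
};

// Hypothetical stand-in for the feature ID -> offset map.
using IDToOffset = std::unordered_map<std::uint32_t, std::streamoff>;

// Append one feature's list to the stream and remember where it starts.
// Assumed layout: a 32-bit count, then count (docID, frequency) pairs.
void writeList(std::ostream& out, IDToOffset& offsets,
               std::uint32_t featureID, const std::vector<Posting>& list) {
    offsets[featureID] = static_cast<std::streamoff>(out.tellp());
    const std::uint32_t count = static_cast<std::uint32_t>(list.size());
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    for (const Posting& p : list) {
        out.write(reinterpret_cast<const char*>(&p.docID), sizeof p.docID);
        out.write(reinterpret_cast<const char*>(&p.frequency), sizeof p.frequency);
    }
}

// Look up the offset, seek there, and read the list back.
// A feature ID without an offset entry yields an empty list.
std::vector<Posting> readList(std::istream& in, const IDToOffset& offsets,
                              std::uint32_t featureID) {
    std::vector<Posting> result;
    const auto it = offsets.find(featureID);
    if (it == offsets.end()) return result;
    in.clear();                           // reset any earlier EOF state
    in.seekg(it->second, std::ios::beg);  // jump to the start of the list
    std::uint32_t count = 0;
    in.read(reinterpret_cast<char*>(&count), sizeof count);
    for (std::uint32_t i = 0; i < count && in; ++i) {
        Posting p{};
        in.read(reinterpret_cast<char*>(&p.docID), sizeof p.docID);
        in.read(reinterpret_cast<char*>(&p.frequency), sizeof p.frequency);
        if (in) result.push_back(p);
    }
    return result;
}
```

Every lookup costs one seek plus a sequential read, which is the overhead a memory-mapped file would avoid.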
CAcInvertedFile::CAcInvertedFile | ( | const CXMLElement & | inCollectionElement | ) |
This opens an existing inverted file, and then initializes this structure.
After that it is fully usable.
As a parameter it takes an XMLElement which contains a "collection" element and its content.
If the attribute vi-generate-inverted-file is true, then a new inverted file will be generated using the parameters given in inCollectionElement. In that case you will NOT be able to use *this afterwards.
The REAL constructor.
virtual string CAcInvertedFile::IDToURL | ( | TID | inID | ) | const [pure virtual] |
<HIER-WIRDS-INTERESSANT->
Translate a DocumentID to a URL (for output)
Implements CAccessor.
Implemented in CAcIFFileSystem.
virtual CDocumentFrequencyList* CAcInvertedFile::FeatureToList | ( | TFeatureID | inFID | ) | const [pure virtual] |
Give the List of documents containing the feature inFID.
CORNELIA: CDocumentFrequencyList is nothing more than a list of
int,float pairs:
struct{ int mID; float mFrequency; }
Implemented in CAcIFFileSystem.
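To make the note on the list layout concrete, here is a small sketch of such a list of (ID, frequency) pairs together with a lookup over it. The names `DocumentFrequencyElement`, `DocumentFrequencyList` and `containsPair` are hypothetical stand-ins chosen for this example; the real CDocumentFrequencyList may differ in container type and interface.

```cpp
#include <list>

// Hypothetical stand-in for one element of a CDocumentFrequencyList,
// following the (int, float) pair described in the note above.
struct DocumentFrequencyElement {
    int mID;           // document ID (or feature ID, depending on the list)
    float mFrequency;  // frequency associated with that ID
};

// Hypothetical stand-in for the list type itself.
using DocumentFrequencyList = std::list<DocumentFrequencyElement>;

// Return true if the list contains inID and its stored frequency
// is exactly inFrequency; false if the ID is missing or differs.
bool containsPair(const DocumentFrequencyList& inList, int inID,
                  float inFrequency) {
    for (const DocumentFrequencyElement& element : inList) {
        if (element.mID == inID) {
            return element.mFrequency == inFrequency;
        }
    }
    return false;
}
```

A check of this kind is similar in spirit to what findWithinStream is documented to do, although that function works on the inverted file rather than on an in-memory list.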
virtual bool CAcInvertedFile::checkConsistency | ( | ) | [pure virtual] |
Check the consistency of the inverted file system accessed by this accessor.
Implemented in CAcIFFileSystem.
virtual list<TID>* CAcInvertedFile::getAllFeatureIDs | ( | ) | const [pure virtual] |
Getting a list of all features contained in this.
This function is necessary, because in the present system only about 50 percent of the features are really used.
A feature is considered used if it occurs in at least one image.
Implemented in CAcIFFileSystem.
CDocumentFrequencyList* CAcInvertedFile::getFeatureFile | ( | string | inFileName | ) | const [protected] |
bool CAcInvertedFile::generateInvertedFile | ( | ) |
Generate an inverted file if there is none.
Fast but naive in-memory method. This method is very fast if the whole inverted file (and a bit more) can be kept in memory at runtime. If this is not the case, extensive swapping is the result, virtually halting the inverted file creation.
Reimplemented in CAcIFFileSystem.
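The in-memory approach can be pictured as follows: all per-document feature lists are read, and each (feature, frequency) entry is appended to the list of its feature. This is a minimal sketch under assumptions, not the code of generateInvertedFile: the type aliases, the function name `invertInMemory` and the use of `std::map` are invented for the example.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical types: a document's features and a feature's documents,
// both stored as (ID, frequency) pairs.
using FeatureList = std::vector<std::pair<int, float>>;  // (feature ID, frequency)
using PostingList = std::vector<std::pair<int, float>>;  // (document ID, frequency)

// Invert "document -> features" into "feature -> documents".
// The whole result lives in memory, so this only works while it fits in RAM.
std::map<int, PostingList> invertInMemory(
        const std::map<int, FeatureList>& inDocuments) {
    std::map<int, PostingList> inverted;
    for (const auto& document : inDocuments) {
        for (const auto& feature : document.second) {
            // feature.first is the feature ID, document.first the document ID
            inverted[feature.first].push_back({document.first, feature.second});
        }
    }
    return inverted;
}
```

Because documents are visited in ascending ID order, each resulting list is already sorted by document ID.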
bool CAcInvertedFile::newGenerateInvertedFile | ( | ) |
Generate an inverted file if there is none.
Employs the two-way merge method described in "Managing Gigabytes", chapter 5.2, Sort-based inversion (page 181).
Reimplemented in CAcIFFileSystem.
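The general idea of sort-based inversion with a two-way merge can be sketched as below: the (feature, document, frequency) triples are cut into runs small enough to sort in memory, and the sorted runs are then merged pairwise until one run remains. This is an illustration only, not the code of newGenerateInvertedFile. The names `Triple`, `tripleLess` and `sortBasedInversion` are hypothetical, and the runs are kept in vectors here, whereas a real implementation would presumably hold them in temporary files.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical record produced while scanning the documents.
struct Triple {
    int featureID;
    int docID;
    float frequency;
};

// Order by feature ID first, then by document ID.
bool tripleLess(const Triple& a, const Triple& b) {
    return std::tie(a.featureID, a.docID) < std::tie(b.featureID, b.docID);
}

using Run = std::vector<Triple>;

std::vector<Triple> sortBasedInversion(const std::vector<Triple>& input,
                                       std::size_t runSize) {
    if (runSize == 0) runSize = 1;  // guard against an endless loop

    // Phase 1: cut the input into runs and sort each run on its own.
    std::vector<Run> runs;
    for (std::size_t i = 0; i < input.size(); i += runSize) {
        const std::size_t end = std::min(input.size(), i + runSize);
        Run run(input.begin() + static_cast<std::ptrdiff_t>(i),
                input.begin() + static_cast<std::ptrdiff_t>(end));
        std::sort(run.begin(), run.end(), tripleLess);
        runs.push_back(std::move(run));
    }

    // Phase 2: merge neighbouring runs two at a time until one is left.
    while (runs.size() > 1) {
        std::vector<Run> next;
        for (std::size_t i = 0; i + 1 < runs.size(); i += 2) {
            Run merged;
            std::merge(runs[i].begin(), runs[i].end(),
                       runs[i + 1].begin(), runs[i + 1].end(),
                       std::back_inserter(merged), tripleLess);
            next.push_back(std::move(merged));
        }
        if (runs.size() % 2 == 1) next.push_back(std::move(runs.back()));
        runs = std::move(next);
    }
    return runs.empty() ? Run() : runs.front();
}
```

In the sorted result all triples of one feature are adjacent, so the per-feature lists and their offsets can be written out in a single sequential pass.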
bool CAcInvertedFile::checkConsistency | ( | ) |
Check the consistency of the inverted file system accessed by this accessor.
Reimplemented in CAcIFFileSystem.
list<TID>* CAcInvertedFile::getAllFeatureIDs | ( | ) | const |
Getting a list of all features contained in this.
This function is necessary, because in the present system only about 50 percent of the features are really used.
A feature is considered used if it occurs in mIDToOffset.
Reimplemented in CAcIFFileSystem.
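Since a feature counts as used when it has an entry in mIDToOffset, collecting the used feature IDs amounts to walking the keys of that map. The sketch below assumes `TID` is an unsigned integer type, uses `std::unordered_map` as a stand-in for `hash_map`, and returns the list by value rather than as a pointer; the name `collectUsedFeatureIDs` is hypothetical.

```cpp
#include <list>
#include <unordered_map>

using TID = unsigned int;  // assumption: the real TID typedef may differ

// Hypothetical stand-in for CIDToOffset (feature ID -> offset).
using IDToOffsetMap = std::unordered_map<TID, unsigned int>;

// Collect the IDs of all features that have an offset entry.
// The list is sorted so the result does not depend on hash order.
std::list<TID> collectUsedFeatureIDs(const IDToOffsetMap& inIDToOffset) {
    std::list<TID> ids;
    for (const auto& entry : inIDToOffset) {
        ids.push_back(entry.first);
    }
    ids.sort();
    return ids;
}
```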
CADIHash CAcInvertedFile::mDocumentInformation [protected] |
Additional information about the document, e.g. the Euclidean length of the feature list.
Reimplemented in CAcIFFileSystem.
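As an illustration of the kind of per-document value mentioned here, the Euclidean length of a feature list can be computed as the square root of the sum of squared frequencies. This sketch assumes that the length is taken over the raw frequencies of the list, which the documentation does not state explicitly; the function names and the pair-based list type are hypothetical.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical feature list: (feature ID, frequency) pairs of one document.
using FeatureFrequencyList = std::vector<std::pair<int, float>>;

// Sum of the squared frequencies of the list.
double frequencySquareSum(const FeatureFrequencyList& inList) {
    double sum = 0.0;
    for (const auto& element : inList) {
        sum += static_cast<double>(element.second) * element.second;
    }
    return sum;
}

// Euclidean length: square root of the sum of squared frequencies.
double euclideanLength(const FeatureFrequencyList& inList) {
    return std::sqrt(frequencySquareSum(inList));
}
```

Storing such values per document avoids recomputing them from the feature list for every query.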