API Reference

CacheEntry

class metaindex.CacheEntry(path, metadata=None, last_modified=None)

A cached metadata entry for an item in the filesystem

add(key, value)

Add metadata key:value to this entry

This does not update the underlying persistence layer.

You can also set the last_modified property through this method. The entry only ever holds a single last_modified value, and .add can only move it forward in time. To set last_modified to something earlier than the current value, assign to the property directly instead of calling .add.
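The forward-only behaviour described above can be sketched with a small stand-in class. EntrySketch is hypothetical and is not metaindex's implementation; in particular, silently ignoring an earlier timestamp in add is an assumption made for illustration.

```python
import datetime


class EntrySketch:
    """Hypothetical stand-in for CacheEntry; not metaindex's implementation."""

    def __init__(self):
        self.metadata = {}
        self.last_modified = None

    def add(self, key, value):
        if key == "last_modified":
            # Only a single value is kept, and add() never moves it back in time.
            # Assumption: an earlier timestamp is silently ignored.
            if self.last_modified is None or value > self.last_modified:
                self.last_modified = value
            return
        # Keys are stored in lower case, each mapping to a list of values.
        self.metadata.setdefault(key.lower(), []).append(value)


entry = EntrySketch()
newer = datetime.datetime(2022, 5, 1)
older = datetime.datetime(2020, 1, 1)

entry.add("last_modified", newer)
entry.add("last_modified", older)   # no effect: earlier than the current value
after_add = entry.last_modified

entry.last_modified = older         # direct assignment can go back in time
```

The last line mirrors the advice above: going back in time requires assigning to the property directly.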

delete(keyvalue)

Delete the metadata key/value or key

Parameters

keyvalue (str (the key) or a tuple/list of (key, value)) – either the key whose metadata values should all be deleted, or the single key/value pair to delete

ensure_last_modified(force=False)

Ensure that the last_modified value is set.

If it is not set, the last modified datetime will be obtained by querying the filesystem, which may be slow and may fail.

This function call will not fail though: in case of exceptions on the filesystem level the last_modified date will simply be set to datetime.datetime.min.

Parameters

force – Whether or not to enforce an update even if the entry has a valid value.

Returns

Updated last_modified value.
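A minimal sketch of the fallback described above, assuming the filesystem query is an os.stat call. The helper name last_modified_or_min is hypothetical and not part of metaindex.

```python
import datetime
import os


def last_modified_or_min(path):
    """Return the file's modification time, or datetime.min on any OS error."""
    try:
        return datetime.datetime.fromtimestamp(os.stat(path).st_mtime)
    except OSError:
        # Filesystem-level failure: fall back instead of raising.
        return datetime.datetime.min


missing = last_modified_or_min("/this/path/should/not/exist/metaindex-sketch")
```

Because datetime.datetime.min compares as older than any real timestamp, callers can keep comparing values without special-casing the failure.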

get(key)

Get the list of MetadataValue of key

Parameters

key – The key to look for

Returns

A list of MetadataValue, may be empty.

keys()

Return all metadata keys

last_modified

The time stamp of when the file was modified most recently

metadata

The dictionary of lower-case keys, mapping to a list of MetadataValue.

path

The path to the file object in the filesystem. Does not need to exist.

pop(key)

Remove all entries of key from the metadata and return them

After the operation self[key] will return only an empty list.

update(other, accept_duplicates=False)

Add metadata from other into this entry

Will not add duplicate key:value pairs unless accept_duplicates is set to True.

This call will also update the last_modified property to be the most recent of both self and other.
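The merge rules above can be sketched on plain dictionaries. This is only an illustration: merge_metadata and most_recent are hypothetical helpers, and plain strings stand in for MetadataValue objects.

```python
import datetime


def merge_metadata(own, other, accept_duplicates=False):
    """Merge other's key -> list-of-values mapping into own, in place."""
    for key, values in other.items():
        target = own.setdefault(key, [])
        for value in values:
            # Skip key:value pairs that are already present, unless told otherwise.
            if accept_duplicates or value not in target:
                target.append(value)
    return own


def most_recent(a, b):
    """Pick the later of two timestamps, tolerating None."""
    candidates = [ts for ts in (a, b) if ts is not None]
    return max(candidates) if candidates else None


mine = {"title": ["Report"], "tag": ["work"]}
theirs = {"tag": ["work", "2022"]}
merge_metadata(mine, theirs)
```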

Query

class metaindex.Query(root=None)

Represents a search query to obtain entries from the cache

as_sql()

Build an SQLite query

Returns a tuple (query, args) to be passed to execute of an sqlite database.
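As a rough illustration of how such a (query, args) tuple is consumed, the sketch below runs a hand-written tuple against an in-memory SQLite database. The SQL text, the files table and its columns are assumptions for this example and do not reflect metaindex's actual schema.

```python
import sqlite3

# Hypothetical stand-in for the tuple returned by Query.as_sql();
# the real table and column names in metaindex may differ.
query, args = "SELECT path FROM files WHERE name = ?", ("report.pdf",)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT, name TEXT)")
conn.execute("INSERT INTO files VALUES (?, ?)", ("/docs/report.pdf", "report.pdf"))

# The tuple unpacks directly into execute(sql, parameters).
rows = conn.execute(query, args).fetchall()
conn.close()
```

Passing the arguments separately lets SQLite bind them as parameters rather than splicing them into the query string.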

classmethod parse(text, synonyms=None)

Accepts a human-written search term and builds a Query from it

Cache

class metaindex.MemoryCache(config)

Version of the threaded cache that uses in-memory caching of the database

Upon initialisation this cache will obtain all entries from the database and try to keep the data in memory up to date.

BACKEND_TYPE

alias of metaindex.cache.ThreadedCache

bulk_insert(inserts)

Insert a whole set of files and their metadata

inserts is a list of (path, metadata, last_modified) tuples, just like the parameters that you would use for ‘insert’.

bulk_rename(renames)

Run the rename operation for all renames

See also below, rename.

renames must be a list of tuples (old_path, new_path). old_path can be a directory, in which case all cached subentries are moved accordingly, too.

You may add an is_dir boolean value to each tuple, i.e. (old_path, new_path, is_dir), to prevent the lookup of whether or not old_path is a directory. Use it if you rename the path on the filesystem first and update the cache afterwards.
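The two accepted tuple shapes can be sketched as follows. normalize_rename is a hypothetical helper written for this example; how metaindex handles the tuples internally is not documented here.

```python
import pathlib


def normalize_rename(rename):
    """Expand (old, new) or (old, new, is_dir) into a full three-tuple."""
    if len(rename) == 3:
        old_path, new_path, is_dir = rename
    else:
        old_path, new_path = rename
        # No flag given: fall back to asking the filesystem.
        is_dir = pathlib.Path(old_path).is_dir()
    return pathlib.Path(old_path), pathlib.Path(new_path), bool(is_dir)


# The directory was already renamed on disk, so the flag is passed explicitly;
# a filesystem lookup of the old path would no longer find a directory.
already_moved = normalize_rename(("/data/old", "/data/new", True))
```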

check_dirty()

Check whether or not the database is more recent than this cache

This call will return immediately.

A return value of False does not guarantee that the cache is clean: the call returns False whenever it cannot tell for sure that the cache is dirty.

However, if it returns True, you can be sure that the cache is dirty (i.e. no longer valid).

cleanup()

Find and remove all entries in the cache that refer to no longer existing files

clear()

Remove everything from the cache

do_reload()

A blocking reload function.

Do not call this directly, but call ‘invalidate’ or ‘reload’ instead.

do_rename(renames)

A blocking rename function.

Do not call this directly, but call ‘rename’ instead.

expire_metadata(paths)

Remove all metadata associated to these paths

But keep the paths in the database.

find(query)

Find all entries that match this query

find_indexable_files(paths, recursive=True)

Find all files that can be indexed in the given paths

Parameters
  • paths – A list of paths to search through

  • recursive – Whether or not any given directories should be indexed recursively

forget(paths)

Remove all paths from the cache.

Parameters

paths – The list of paths to remove from cache.

get(paths)

Get all entries for these paths (recursively)

insert(item)

Insert the CacheEntry item into the cache.

This operation will not modify the item in the filesystem nor update any other form of metadata persistency for the item. This function really only affects the cache.

invalidate()

Invalidate the cached entries and reload

is_busy()

Whether or not the cache is busy reading from the database

keys()

Returns a set of all known metadata keys.

last_modified()

Return the date and time of the entry that was most recently updated in the database.

Return type

datetime.datetime

quit()

End the cache thread

refresh(paths, recursive=True, processes=None)

(Re-)index these paths (recursively by default)

reload()

Reload the data from the cache

rename(path, new_path, is_dir=None)

Move all metadata entries for ‘path’ to ‘new_path’.

Convenience wrapper for bulk_rename.

start()

Call this before attempting any queries.

wait_for_reload()

Wait until any currently pending reload operation is completed

wait_for_write()

Wait until any currently pending write operation is completed

Indexers

class metaindex.indexer.IndexerBase(cache)

Base class for all file indexers

When adding an indexer to metaindex, you should subclass from this.

Make sure to define the class properties NAME, ACCEPT, and PREFIX.

You can control when the indexer should be run (compared to others for the same file type), by defining the ORDER class property.

ACCEPT = []

Specify what suffixes or mimetypes are handled by this indexer, e.g. ACCEPT = ['.rst', '.md', 'text/html', 'image/']. Anything starting with a . is assumed to be a suffix; everything else is assumed to be a mimetype.

If the mimetype ends with /, it is matched against the first part of the file’s mimetype.

If your indexer should run for all files, use ACCEPT = '*'.

You must declare the prefix that is used for tag names of this indexer (see PREFIX). This may also be a tuple (or otherwise iterable) of prefixes.
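The ACCEPT matching rules can be sketched like this. The accepts function is hypothetical and uses the standard library's mimetypes module to guess the mimetype from the file name; how metaindex itself determines mimetypes is not stated here.

```python
import mimetypes
import pathlib


def accepts(accept, path):
    """Check a path against an ACCEPT-style list of suffixes and mimetypes."""
    if accept == "*":
        return True
    path = pathlib.Path(path)
    mimetype, _ = mimetypes.guess_type(path.name)
    for rule in accept:
        if rule.startswith("."):
            # Rules starting with a dot are file suffixes.
            if path.suffix == rule:
                return True
        elif mimetype is not None:
            if rule.endswith("/"):
                # A trailing slash matches the first part of the mimetype.
                if mimetype.startswith(rule):
                    return True
            elif mimetype == rule:
                return True
    return False


rules = [".rst", ".md", "text/html", "image/"]
```

With these rules, a file such as photo.png is accepted through the 'image/' entry, while notes.md is accepted through its suffix.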

NAME = None

The name by which this indexer is registered

ORDER = 500

When to execute this indexer in the order of indexers

PREFIX = None

What prefix a tag created by this indexer should receive.

static cached_dt(last_cached)

Return the datetime when this cache entry was created

changed_since_cached(path, last_cached)

Return True if the file at path has changed since it was cached last (according to last_cached)

reuse_cached(metadata, last_cached)

Updates metadata with the entries from last_cached for entries of this indexer

The idea is that you can call return self.reuse_cached(metadata, last_cached) from run(...) if you are skipping the execution of the indexer because nothing changed since the last time it was run.

This function is used, for example, from @only_if_changed.

run(path, metadata, last_cached)

Execute this Indexer to run on the file at path.

Parameters
  • path – will be of type pathlib.Path.

  • metadata – is the accumulated metadata already collected by other indexers that ran before this one. You are expected to add your metadata keys and values in here.

  • last_cached – is the metadata from when this file was entered into the cache the last time. You could use this information to skip indexing, e.g. when the last_modified of the file is older than the cached entry.

Consider using the @only_if_changed decorator if you want this indexer to only be run if the file has changed since the last run of any indexer.

Be aware that each subprocess will create its own instance of your Indexer; if your indexer keeps a cache of its own, that cache will not be shared between the processes.

Humanizers

metaindex.humanizer.register_humanizer(tags, priority=Priority.NORMAL)

Decorator to register a function as a humanizer

tags can be a single tag, or a set of tags to which this humanizer should be applied.

A tag may be the exact tag, like ‘general.size’, or ignoring the prefix with ‘.size’. You may also set the tag to be ‘’, in case you want to translate rather by type; consider setting the priority to LOW in that case though.

priority specifies how early in the process this humanizer should be called.

The humanize function must accept a value and return either the human-readable version in the form of a string, or None if the value cannot be translated.
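A humanize function that follows this contract might look like the sketch below. The function name and the formatting are illustrative, and it is an assumption that the humanizer receives a plain number. The registration decorator is only mentioned in a comment so that the example runs without metaindex installed.

```python
def humanize_size(value):
    """Turn a byte count into a short human-readable string, or return None."""
    # In metaindex this would be registered with, e.g.,
    # @register_humanizer('.size'); the decorator is left out here so that
    # the sketch runs on its own.
    if isinstance(value, bool) or not isinstance(value, (int, float)) or value < 0:
        return None  # signal that this value cannot be translated
    size = float(value)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"
```

Returning None rather than raising leaves room for another humanizer to handle the value.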

class metaindex.humanizer.Priority(value)

Priority of humanizers

HIGH = 20

High priority

HIGHEST = 10

Highest priority, humanizers with this priority will be run first

LOW = 80

Low priority

LOWEST = 90

Lowest priority, humanizers with this priority will be run last

NORMAL = 50

Normal priority. This is the default for custom humanizers
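Since HIGHEST has the smallest value and runs first, the execution order corresponds to an ascending sort by priority value. The sketch below re-creates the documented values in a local enum; PrioritySketch and the humanizer names are hypothetical, and whether the real Priority class is an IntEnum is an assumption.

```python
import enum


class PrioritySketch(enum.IntEnum):
    """Local copy of the documented values; not imported from metaindex."""
    HIGHEST = 10
    HIGH = 20
    NORMAL = 50
    LOW = 80
    LOWEST = 90


registered = [
    ("fallback_by_type", PrioritySketch.LOW),
    ("exact_tag", PrioritySketch.HIGHEST),
    ("default", PrioritySketch.NORMAL),
]

# Smaller values sort first, so HIGHEST (10) runs before LOWEST (90).
run_order = [name for name, prio in sorted(registered, key=lambda item: item[1])]
```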

SqlAccess

class metaindex.sql.SqlAccess(uri)

Basic SQLite wrapper

expire_metadata(paths)

Mark the metadata of these paths as expired

Expiring does not remove the data from the database, only marks it as very, very old.

If a path points to a directory instead of a file, all metadata of all items in all the subdirectories will be expired, too.

If paths is None, all metadata will be expired.

Parameters

paths – the paths to expire, None will expire all metadata

files()

Get all paths of files in the database

Returns

List of all file paths cached in the database

Return type

list[pathlib.Path]

find(query)

Find and return all entries that match the query

Parameters

query (metaindex.query.Query) – the search query to run

Return type

list[shared.CacheEntry]

flush()

Flush the entire database

get(paths)

Get the metadata of these paths

Parameters

paths – list of paths to query

Return type

list[shared.CacheEntry]

insert(items)

Insert items in the database

This will overwrite all existing entries (by path) in the database.

Parameters

items – A list of CacheEntry to insert

Returns

Number of inserted entries

keys()

Returns a set of all known metadata keys.

last_modified()

Get the timestamp of the most recently modified file in the database

Return type

datetime.datetime

purge(paths)

Remove all entries with these paths from the database

If a path points to a directory instead of a file, all files in all subdirectories will be removed, too.

Parameters

paths – List of paths
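The directory rule can be pictured as a path-prefix match on whole path components. The sketch below works on plain strings with a hypothetical helper, affected_paths; the actual implementation operates on the SQLite database and may differ.

```python
import pathlib


def affected_paths(known, targets):
    """Return the known paths equal to, or located below, any target path."""
    targets = [pathlib.PurePosixPath(t) for t in targets]
    hits = []
    for candidate in known:
        candidate = pathlib.PurePosixPath(candidate)
        # Compare whole path components, so /docs does not match /docs-old.
        if any(candidate == t or t in candidate.parents for t in targets):
            hits.append(str(candidate))
    return hits


known = ["/docs/a.txt", "/docs/sub/b.txt", "/docs-old/c.txt", "/music/d.mp3"]
to_purge = affected_paths(known, ["/docs", "/music/d.mp3"])
```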

rename_dir(old_path, new_path)

Rename a directory from old_path to new_path

Only affects the database, no directories on the filesystem are renamed

rename_file(old_path, new_path)

Rename a file from old_path to new_path

Only affects the database, no files on the filesystem are renamed