Onegov File API

Models

class onegov.file.models.file.UploadedFileField(filters=(), upload_type=<class 'depot.fields.upload.UploadedFile'>, upload_storage=None, *args, **kw)[source]

A customized version of Depot’s uploaded file field. This version stores its data in a JSONB field, instead of using text.

load_dialect_impl(dialect)[source]

Return a TypeEngine object corresponding to a dialect.

This is an end-user override hook that can be used to provide differing types depending on the given dialect. It is used by the TypeDecorator implementation of type_engine() to help determine what type should ultimately be returned for a given TypeDecorator.

By default returns self.impl.

process_bind_param(value, dialect)[source]

Receive a bound parameter value to be converted.

Subclasses override this method to return the value that should be passed along to the underlying TypeEngine object, and from there to the DBAPI execute() method.

The operation could be anything desired to perform custom behavior, such as transforming or serializing data. This could also be used as a hook for validating logic.

This operation should be designed with the reverse operation in mind, which would be the process_result_value method of this class.

Parameters
  • value – Data to operate upon, of any type expected by this method in the subclass. Can be None.

  • dialect – the Dialect in use.

process_result_value(value, dialect)[source]

Receive a result-row column value to be converted.

Subclasses should implement this method to operate on data fetched from the database.

Subclasses override this method to return the value that should be passed back to the application, given a value that is already processed by the underlying TypeEngine object, originally from the DBAPI cursor method fetchone() or similar.

The operation could be anything desired to perform custom behavior, such as transforming or serializing data. This could also be used as a hook for validating logic.

Parameters
  • value – Data to operate upon, of any type expected by this method in the subclass. Can be None.

  • dialect – the Dialect in use.

This operation should be designed to be reversible by the “process_bind_param” method of this class.
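
The reversible pairing of the two hooks can be illustrated with a minimal sketch. The class below is a hypothetical stand-in written with the standard library only. It is not SQLAlchemy's TypeDecorator and not the actual UploadedFileField (which stores JSONB rather than text); it only shows the round trip the two methods are expected to form.

```python
import json


class JsonTextType:
    """Hypothetical stand-in for a type decorator (not SQLAlchemy's)."""

    def process_bind_param(self, value, dialect=None):
        # Python value -> database representation; None passes through.
        if value is None:
            return None
        return json.dumps(value, sort_keys=True)

    def process_result_value(self, value, dialect=None):
        # Database representation -> Python value (reverse of the above).
        if value is None:
            return None
        return json.loads(value)


field = JsonTextType()
original = {"file_id": "abc123", "content_type": "application/pdf"}
stored = field.process_bind_param(original)
restored = field.process_result_value(stored)
```

Whatever process_bind_param produces, process_result_value should be able to turn back into an equal Python value, including None.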

class onegov.file.models.file.SearchableFile[source]

Files are not made available for elasticsearch by default. This is for security reasons - files are public by default but one has to know the url (a very long id).

Search might lead to a disclosure of all files, which is why files can only be searched if they are of a different polymorphic subclass and use this mixin.

property es_suggestion

Returns suggest-as-you-type value of the document. The field used for this property should also be indexed, or the suggestion will lead to nowhere.

If a single string is returned, the completion input equals the completion output. (My Title -> My Title)

If an array of strings is returned, all values are possible inputs and the first value is the output. (My Title/Title My -> My Title)
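
The two return shapes can be sketched as follows. The helper name completion_entry is hypothetical and not part of onegov; it only normalizes a suggestion into its possible inputs and its single output as described above.

```python
def completion_entry(suggestion):
    """Normalize a suggestion into (inputs, output).

    Hypothetical helper: a single string is both input and output,
    while a sequence offers all values as inputs and the first value
    as the output.
    """
    if isinstance(suggestion, str):
        return [suggestion], suggestion
    values = list(suggestion)
    if not values:
        raise ValueError("at least one suggestion is required")
    return values, values[0]
```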

property es_public

Returns True if the model is available to be found by the public. If false, only editors/admins will see this object in the search results.

class onegov.file.models.file.File(**kwargs)[source]

A general file (image, document, pdf, etc), referenced in the database.

Thanks to the use of Depot, files can seemingly be stored in the database (with transaction guarantees), without actually storing them in the database.

id

the unique, public id of the file

name

the name of the file, incl. extension (not used for public links)

note

a short note about the file (for captions, other information)

order

the default order of files

published

true if published

publish_date

the date after which this file will be made public - this controls the visibility of the object through the access property which in turn is enforced by onegov.core.security.rules.

To get a file published, be sure to call onegov.file.collection.FileCollection.publish_files() once an hour through a cronjob (see onegov.core.cronjobs)!

signed

true if the file was digitally signed in the onegov cloud

(the file could be signed without this being true, but that would amount to a signature created outside of our platform, which is something we ignore)

signature_metadata

the metadata of the signature - this should include the following data:

- old_digest: The sha-256 hash before the file was signed
- new_digest: The sha-256 hash after the file was signed
- signee: The username of the user that signed the document
- timestamp: The time the document was signed in UTC
- request_id: A unique identifier by the signing service

type

the type of the file, this can be used to create custom polymorphic subclasses. See http://docs.sqlalchemy.org/en/improve_toc/orm/extensions/declarative/inheritance.html.

not to be confused with the actual filetype, which is stored on the reference!

reference

the reference to the actual file, uses depot to point to a file on the local file system or somewhere else (e.g. S3)

checksum

the md5 checksum of the file before it was processed by us. That is, if the file was very large and we in turn made it smaller, it’s the checksum of the file before it was changed by us. This is useful to check if an uploaded file was already uploaded before.

note, this is not meant to be cryptographically secure - this is strictly a check of file duplicates, not protection against tampering
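
A duplicate check of this kind might look like the sketch below. The function names and the known_checksums set are assumptions made for illustration; this is not the code onegov.file uses, only the general idea of hashing the unprocessed content and comparing it against stored checksums.

```python
import hashlib
from io import BytesIO


def md5_checksum(content, chunk_size=64 * 1024):
    """Return the md5 hex digest of bytes or of a binary file object."""
    if isinstance(content, bytes):
        content = BytesIO(content)
    digest = hashlib.md5()
    # Read in chunks so large files are not loaded into memory at once.
    for chunk in iter(lambda: content.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_duplicate(content, known_checksums):
    """True if the content's checksum is already known (hypothetical)."""
    return md5_checksum(content) in known_checksums
```

As the note above says, md5 is adequate for spotting identical uploads but offers no protection against deliberate tampering.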

extract

the content of the given file as text, if it can be extracted (it is important that this column be loaded deferred by default, lest we load massive amounts of text on simple queries)

stats

statistics around the extract (number of pages, words, etc.). Those are usually set during file upload (as some information is lost afterwards).

property file_id

The file_id of the contained reference.

If virtual_file_id is not None, it is returned instead.

property claimed_extension

Returns the extension as defined by the file name or by the content type (whatever is found first in this order).

Note that this extension could therefore not be correct. It is mainly meant for display purposes.

If you need to know the type of a file you should use the content type stored on the reference.
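
The lookup order described above (file name first, content type second) can be sketched with the standard library. This is an illustrative reimplementation under assumptions, not the property's actual code: the function signature, the lower-casing and the returned format without a leading dot are all choices made for this example.

```python
import mimetypes
from pathlib import PurePath


def claimed_extension(name, content_type):
    """Guess a display extension from the name, then the content type."""
    suffix = PurePath(name).suffix  # '' if the name has no extension
    if suffix:
        return suffix.lstrip(".").lower()
    # Fall back to the content type; guess_extension may return None.
    guessed = mimetypes.guess_extension(content_type or "")
    return guessed.lstrip(".") if guessed else ""
```

Because the name is consulted first, a file called report.pdf that actually contains plain text would still be shown as a pdf, which is why the result is only suitable for display.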

get_thumbnail_id(size)[source]

Returns the thumbnail id with the given size (e.g. ‘small’).

class onegov.file.models.fileset.FileSet(**kwargs)[source]

A set of files that belong together. Each file may be part of none, one or many sets. Each set may contain none, one or many files.

The fileset uses uuids for public urls instead of a readable url-safe name, because files are meant to be always public with an unguessable url, and so the filesets containing files must also have the same property.

Otherwise we might not be able to guess the name of the file, but we will be able to guess the name of a fileset containing files.

id

the unique, public id of the fileset

title

the title of the fileset (not usable in url)

type

the type of the fileset, this can be used to create custom polymorphic subclasses. See http://docs.sqlalchemy.org/en/improve_toc/orm/extensions/declarative/inheritance.html.

this is independent from the onegov.file.models.File.type attribute on the File.

Collection

class onegov.file.collection.FileCollection(session, type='*', allow_duplicates=True)[source]

Manages files.

Parameters
  • session – The SQLAlchemy db session to use.

  • type – The polymorphic type to use and to filter for, or ‘*’ for all.

  • allow_duplicates

    Prevents duplicates if set to false. Duplicates are detected before pre-processing, so already stored files may be downloaded and added again, as they might have changed during the upload.

    Note that this does not change existing files. It only prevents new duplicates from being added.

add(filename, content, note=None, published=True, publish_date=None)[source]

Adds a file with the given filename. The content may be either in bytes or a file object.

replace(file, content)[source]

Replaces the content of the given file with the new content.

publishable_files(horizon=None)[source]

Yields files which may be published.

publish_files(horizon=None)[source]

Publishes unpublished files with a publish date older than the given horizon.
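
The relationship between publishable_files and publish_files can be sketched without a database. FileRecord is a hypothetical in-memory stand-in for the File model, and treating a publish date equal to the horizon as due is an assumption of this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class FileRecord:
    """Hypothetical in-memory stand-in for the File model."""
    name: str
    published: bool = True
    publish_date: Optional[datetime] = None


def publishable(files, horizon=None):
    """Yield unpublished files whose publish date has been reached."""
    horizon = horizon or datetime.now(timezone.utc)
    for f in files:
        if f.published or f.publish_date is None:
            continue
        if f.publish_date <= horizon:  # assumption: equal counts as due
            yield f


def publish(files, horizon=None):
    """Mark every publishable file as published."""
    for f in list(publishable(files, horizon)):
        f.published = True
```

A cronjob that runs such a function once an hour publishes each file at most an hour after its publish date.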

by_id(file_id)[source]

Returns the file with the given id or None.

by_filename(filename)[source]

Returns a query that matches the files with the given filename.

Be aware that there may be multiple files with the same filename!

by_checksum(checksum)[source]

Returns a query that matches the given checksum (may be more than one record).

by_content(content)[source]

Returns a query that matches the given content (may be more than one record).

by_content_type(content_type)[source]

Returns a query that matches the given MIME content type (may be more than one record).

by_signature_digest(digest)[source]

Returns a query that matches the given digest in the signature metadata. In other words, given a digest this function will find signed files that match the digest - either before or after signing.

Unsigned files are ignored.

The digest is expected to be a SHA256 hex.
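
A SHA-256 hex digest of this form can be produced with hashlib. The sketch below also shows how a digest might be compared against the old_digest and new_digest entries of a signature metadata dictionary. The function names are hypothetical and the comparison is done in plain Python here, whereas the collection method returns a database query.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(content).hexdigest()


def matches_signature(metadata, digest):
    """True if the digest equals the digest before or after signing."""
    if not metadata or not digest:
        return False  # unsigned files are ignored
    return digest in (metadata.get("old_digest"), metadata.get("new_digest"))
```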

locate_signature_metadata(digest)[source]

Looks for the given digest in the files table - if that doesn’t work it will go through the audit trail (i.e. the chat messages) and see if the digest can be found there.

If this database was ever used to sign a file with the given digest, or if a file that was signed had the given digest, this function will find it - barring manual database manipulation in the messages log.

class onegov.file.collection.FileSetCollection(session, type='*')[source]

Manages filesets.

by_id(fileset_id)[source]

Returns the fileset with the given id or None.

Integration

class onegov.file.integration.DepotApp[source]

Provides Depot integration for onegov.core.framework.Framework based applications.

configure_files(**cfg)[source]

Configures the file/depot integration. The following configuration options are accepted:

depot_backend

The depot backend to use. Supported values:

  • depot.io.local.LocalFileStorage

  • depot.io.memory.MemoryFileStorage

depot_storage_path

The storage path used by the local file storage.

Note that the actual files are stored under a subdirectory specific to each application id. This is mainly to keep a handle on which file belongs to which application. Additionally it ensures that we aren’t accidentally opening another application’s files.

frontend_cache_buster

A script able to bust the frontend cache.

Our frontend (nginx) caches the files we store in the backend and serves them mostly without bothering us. This can be problematic when the file is deleted or if it is made private. The cache needs to be busted in this case.

With this configuration a script/command can be specified that receives the url that needs to be busted and in turn busts the content of this url from the cache. This pretty much depends on the platform this is run on and on the frontend in use.

For example, let’s say our script is called ‘bust-cache’, this is the command that will be run when the cache is busted:

sleep 5
bust-cache id-of-the-file

As you can see, the command is invoked with a five second delay. This avoids premature cache busting (before the end of the transaction). The command is non-blocking, so those 5 seconds are not counted towards the request-time.
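
A delayed, non-blocking invocation of this kind could be sketched as follows. The function names are hypothetical, the sketch assumes a POSIX shell (sh) is available, and it is not the integration's actual implementation.

```python
import shlex
import subprocess


def bust_command(script, file_id, delay=5):
    """Build the shell command: wait, then call the script with the id."""
    return f"sleep {int(delay)}; {shlex.quote(script)} {shlex.quote(file_id)}"


def bust_cache(script, file_id, delay=5):
    """Start the command in the background and return immediately."""
    if not script:
        return None  # no script configured: cache busting is a noop
    # Popen does not wait for the child, so the request is not delayed.
    return subprocess.Popen(["sh", "-c", bust_command(script, file_id, delay)])
```

Quoting the script and the id with shlex.quote keeps unexpected characters from being interpreted by the shell.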

Frontend caches might use the domain and the full path to cache a file, but since we can technically have multiple domains/paths point to the same file we simply pass the id and let the cache figure out what urls need to be busted as a result.

The script is invoked with the permissions of the user running the backend. If other permissions are required, use suid.

Note that this script is optional. If omitted, the cache busting turns into a noop.

signing_services

Contains signing service configs.

Each application gets exactly one signing service.

This integration class will take care of instantiating the signing service and offer it through self.signing_service.

The signing service can be used with any file, though there is first-class support for signing onegov.file models.

Signing services are implemented using subclasses of onegov.file.sign.SigningService. Each signature service class is configured using a single yaml file which is stored in the signature config path.

By default we use the ‘__default__.yml’ config. Alternatively we can create separate configs for various application ids.

For example, we might create a onegov_town-govikon.yml, which would take precedence over the default config, if the application with the id onegov_town-govikon would use the signing service.
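
The precedence rule can be sketched as a small lookup. The function name and the idea of passing in the available file names (for example from os.listdir on the config path) are assumptions made for this example.

```python
def pick_signing_config(available, application_id, default="__default__.yml"):
    """Pick the config file name for an application id.

    A config named after the application id wins over the default one.
    Returns None if neither file is available.
    """
    specific = f"{application_id}.yml"
    if specific in available:
        return specific
    return default if default in available else None


configs = {"__default__.yml", "onegov_town-govikon.yml"}
chosen = pick_signing_config(configs, "onegov_town-govikon")
```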

sign_file(file, signee, token, token_type='yubikey')[source]

Signs the given file and stores metadata about that process.

During signing the stored file is replaced with the signed version.

For example:

pdf = app.sign_file(pdf, 'info@example.org', 'foo')

Parameters
  • file – The onegov.file.File instance to sign.

  • signee – The name of the signee (should be a username).

  • token

    The (yubikey) token used to sign the file.

    WARNING: It is the job of the caller to ensure that the yubikey has the right to sign the document (i.e. that it is the right yubikey).

  • token_type – The type of the passed token. Currently only ‘yubikey’.

temporary_depot(depot_id, **configuration)[source]

Temporarily use another depot.

onegov.file.integration.delete_file(self, request)[source]

Deletes the given file. By default the permission is Private. An application using the framework can override this though.

Since a DELETE can only be sent through AJAX it is protected by the same-origin policy. That means that we don’t need to use any CSRF protection here.

That being said, browser bugs and future changes in the HTML standard could make a cross-origin DELETE possible one day. Therefore, a time-limited token must be passed as a query parameter to this function.

New tokens can be acquired through request.new_csrf_token.

Attachments

class onegov.file.attachments.ProcessedUploadedFile(content, depot_name=None)[source]

process_content(content, filename=None, content_type=None)[source]

Standard implementation of DepotFileInfo.process_content()

This is the standard depot implementation of file uploads. It will store the file on the default depot and will provide the standard attributes.

Subclasses will need to call this method to ensure the standard set of attributes is provided.

Filters

class onegov.file.filters.ConditionalFilter(filter)[source]

A depot filter that’s only run if a condition is met. The condition is defined by overriding the meets_condition method so that it returns True when the filter should run.

on_save(uploaded_file)[source]

Filters are required to provide their own implementation

class onegov.file.filters.OnlyIfImage(filter)[source]

A conditional filter that runs the passed filter only if the uploaded file is an image.

class onegov.file.filters.OnlyIfPDF(filter)[source]

A conditional filter that runs the passed filter only if the uploaded file is a pdf.
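
The conditional filter pattern can be sketched with plain classes. All class names below are hypothetical stand-ins (they do not derive from Depot's filter base class), the uploaded file is modelled as a simple dictionary, and checking the content_type key is an assumption about how the condition might be decided.

```python
class ConditionalFilterSketch:
    """Wraps another filter and only runs it if a condition is met."""

    def __init__(self, filter):
        self.filter = filter

    def meets_condition(self, uploaded_file):
        raise NotImplementedError

    def on_save(self, uploaded_file):
        if self.meets_condition(uploaded_file):
            self.filter.on_save(uploaded_file)


class OnlyIfPDFSketch(ConditionalFilterSketch):
    def meets_condition(self, uploaded_file):
        # Assumption: the content type is available on the uploaded file.
        return uploaded_file.get("content_type") == "application/pdf"


class RecordingFilter:
    """Example inner filter that records the files it was run on."""

    def __init__(self):
        self.seen = []

    def on_save(self, uploaded_file):
        self.seen.append(uploaded_file["filename"])


inner = RecordingFilter()
pdf_only = OnlyIfPDFSketch(inner)
pdf_only.on_save({"filename": "a.pdf", "content_type": "application/pdf"})
pdf_only.on_save({"filename": "b.png", "content_type": "image/png"})
```

After both calls, only a.pdf has reached the inner filter.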

class onegov.file.filters.WithThumbnailFilter(name, size, format)[source]

Uploads a thumbnail together with the file.

Takes for granted that the file is an image.

The resulting uploaded file will provide an additional property thumbnail_name, which will contain the id and the path to the thumbnail. The name is replaced with the name given to the filter.

Warning

Requires Pillow library

Note: This has been copied from Depot and adjusted for our use. Changes include a different storage format, no storage of the url and replacement of thumbnails instead of recreation (if possible).

on_save(uploaded_file)[source]

Filters are required to provide their own implementation

class onegov.file.filters.WithPDFThumbnailFilter(name, size, format)[source]

Uploads a preview thumbnail as PNG together with the file.

This is basically the PDF implementation for WithThumbnailFilter.

Warning

Requires the presence of ghostscript (gs binary) on the PATH.
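
Whether the binary can be found on the PATH may be checked up front, for example during application startup. The helper below is a hypothetical sketch using the standard library and is not part of onegov.file.

```python
import shutil


def ghostscript_available(binary="gs"):
    """Return True if the given binary can be found on the PATH."""
    return shutil.which(binary) is not None
```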