Web application file-reference methods

Faculty of Engineering and TechnologySchool of Computer ScienceDatabase Department


Type classification: this is an article resource.
Subject classification: this is a technology resource.

This research project examines the different methods employed by web applications to refer, from within the database, to non-application files in the filesystem. We aim to determine what attributes (of database schema, filesystem, file, user, etc.) are used to determine the location of the files in the filesystem.

Introduction

edit

Most database-backed web applications store non-application files, such as those uploaded by users, in the filesystem rather than as objects in the database.[1] These files relate to tables and rows in the databases in various ways, and there are many different ways of recording these relationships.

The 'non-application' files are those that are considered part of the application's data, rather than 'resource' files such as icons and other files used as part of, and distributed with, the application. Generally, these are files that are uploaded by users, and there will often be ancillary versions of them created by the application (such as, in the case of images for example, lower-resolution versions for quicker web display).

Here we investigate the methods used by a number of web applications to store file reference information in their databases. We

The web applications examined here all: use a database; and store attached files in the filesystem.

For each application, we aim to find out what attributes are stored about files, and how they are used when accessing files. Ultimately, we are trying to determine what information is required, given an arbitrary database schema, to determine whether files exist that relate to any given row, and where and how to access such files.

Applications

edit

WordPress

edit
 
WordPress ERD

WordPress is a MySQL-backed blogging platform in which uploaded files are saved to a user-configurable uploads directory. Files in this directory can be organised into a number of different subdirectory structures (by default, named by year and month) under wp-content/uploads/ in the installation directory. Files are accessed directly by the web server (and not read/written via the application). Each uploaded file has a single row in the posts table (which may or may not be linked to an actual post row via the post_parent key). The post_type column of this row has a value of 'attachment'. Rows in the postmeta table are linked to this row. Where the meta_key column of postmeta is '_wp_attached_file', the meta_value column is a filename, relative to the WP upload directory.

Summary of WordPress' method of filesystem reference.
Table: postmeta
Column: meta_value (only for those rows where meta_key equals '_wp_attached_file')
Directory: Under wp-content/uploads
Format: Above directory concatenated with the value of meta_value
Cardinality: One file per row

Drupal

edit

Drupal is a general-purpose Content Management System, again using MySQL for its database. For the purposes of this reseach, Drupal 7.4 was installed, using the 'minimal-7.4' installation profile. File information is stored in the file_managed table, with a custom-URI in the uri column. Files can be stored in either the public or private hierarchies, depending on the prefix of the uri value: public:// or private://.

Summary of Drupal's method of filesystem reference.
Table: file_managed
Column: uri
Directory: Under sites/{site_name}/{public or private directory} (these values must be retrieved from elsewhere in the DB)
Format: Above directory concatenated with the value of the uri column after the prefixed custom quasi-protocol has been stripped; i.e. the captured section in the regular expression (public|private)://(.*)
Cardinality: One file per row

Bespoke Application 'A'

edit

This is a bespoke database application built to manage personnel training records for a medium-sized organisation. The source code is not freely available, but the method used is explained here and intended to stand alone, as it could be taken as a hyphothetical example of how this type of file reference could be done and so there is no need to be able to examine the original source.

Table: completed_qualifications
Column: id
Directory: Under data/dbname/completed_qualifications/
Format: Above directory concatenated with the value of the id column
Cardinality: Multiple files per row; the above format results in a directory, all files in which are related to the row in question

MediaWiki

edit

MediaWiki keeps its files in directories whose names are constructed from the first two characters of the MD5 hash of the file's name.

Table: image
Column: img_name
Directory: Under images/
Format: Above directory with two levels of subdirectories named from substrings of the MD5 hash of the img_name column: the first character, and then the first two characters, of the hash; within this lower directory resides the file, with the name given in image.img_name
Cardinality: One file per row

Conclusion

edit

Given an arbitrary database schema and the absolute path to the highest directory in which files are stored, it is possible to construct an SQL statement that will provide the latter part of the path to the file or files.

References

edit
  1. http://stackoverflow.com/questions/3748/storing-images-in-db-yea-or-nay