File references

File references are byte strings found in the file_reference fields of document and photo objects.

They must be cached by the client, along with the origin where the document/photo object was found, in order to be refetched when the file reference expires.

Example implementations of a reference database: MadelineProto, Android, Telegram Desktop, TDLib.

Automatic generation

Implementation and maintenance of the file reference database may be fully automated by using the following file reference origin definition file.

Latest file reference origin definition file for the current layer »

First, some definitions:

  • A file reference path is a deserialization path where a file_reference field may appear, for example updateNewMessage.message -> message.media -> messageMediaDocument.document -> document.file_reference.
  • A file reference origin contains information that the client may use to re-fetch the document (and the new file reference), for example for the above path it's getMessage{peer: updateNewMessage.message.peer, id: updateNewMessage.message.id}, which can be used to refetch the document using either messages.getMessages or channels.getMessages depending on the type of the Peer.

The definition file contains all possible origins, for all possible file reference paths.

It is automatically generated and validated by the file reference origin generator using a set of manually-specified rules (specifying the actual origins) and the latest API schema, to make sure the following rules are followed:

  • All possible and valid places where a file_reference appears have at least one valid associated origin: this is checked by recursively checking all possible deserialized object graphs, starting from:

    • All constructors of type Update
    • All constructors which are directly returned by at least one method.
    • All constructors which are directly returned by at least one method inside of a vector.

    This covers all possible TL payloads received from the API, which can only return method call responses, or Update constructors contained inside Updates.

    For example, when checking the graph of the updateNewMessage constructor, the following file reference paths are found:

    updateNewMessage.message -> message.media -> messageMediaDocument.document -> document.file_reference
    updateNewMessage.message -> message.reply_to -> messageReplyHeader.reply_media -> messageMediaDocument.document -> document.file_reference
    updateNewMessage.message -> message.media -> messageMediaInvoice.extended_media -> messageExtendedMedia.media -> messageMediaDocument.document -> document.file_reference
    ... and many others

    ... and for all paths, the system which generates the definition file makes sure that at least one origin covers the path.

  • All origins must be used in at least one path.

  • All paths covered by origins which make use of flag fields (which may be absent, leading to an orphan, context-less file reference) must be covered by at least one non-flagged origin (or the flagged origin must be non-flagged in the specified path).

    For example, the file reference path updateStory.story -> storyItem.media -> messageMediaDocument.document -> document.file_reference would rely on the origin getStory{peer: storyItem.from_id, story_id: storyItem.id} with stories.getStoriesByID.

    However, the from_id field of storyItem is actually a flag and in this specific path it is not set (it's only set when a storyItem is returned by stories.getAllStories).

    The validator (the code that generated the definition file; you DO NOT have to implement a validator yourself) detected this, which forced the manual addition of the valid fallback origin getStory{peer: updateStory.peer, story_id: updateStory.story -> storyItem.id}, which is present in the final origin definition file.

    Note that the definition file is already pre-validated: no additional validation is needed to implement it. The above is just an example of a case that is successfully covered by the validator.
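To illustrate the recursive check described in the rules above, here is a minimal sketch of how file reference paths could be enumerated from a schema. The schema below is a tiny hand-written subset used only for illustration, and the names CONSTRUCTORS, TYPES and find_paths are assumptions of this sketch, not part of the actual generator.

```python
# Toy subset of a TL schema (an assumption for illustration, not the real schema).
# constructor -> list of (field name, field type)
CONSTRUCTORS = {
    "updateNewMessage": [("message", "Message")],
    "message": [("id", "int"), ("media", "MessageMedia")],
    "messageMediaDocument": [("document", "Document")],
    "messageMediaPhoto": [("photo", "Photo")],
    "document": [("id", "long"), ("file_reference", "bytes")],
    "documentEmpty": [("id", "long")],
    "photo": [("id", "long"), ("file_reference", "bytes")],
}
# type -> constructors of that type
TYPES = {
    "Message": ["message"],
    "MessageMedia": ["messageMediaDocument", "messageMediaPhoto"],
    "Document": ["document", "documentEmpty"],
    "Photo": ["photo"],
}


def find_paths(constructor, prefix=(), visiting=frozenset()):
    """Yield every path from `constructor` down to a file_reference field."""
    if constructor in visiting:  # do not loop forever on recursive types
        return
    visiting = visiting | {constructor}
    for field, field_type in CONSTRUCTORS[constructor]:
        step = f"{constructor}.{field}"
        if field == "file_reference":
            yield prefix + (step,)
        for child in TYPES.get(field_type, []):
            yield from find_paths(child, prefix + (step,), visiting)


paths = [" -> ".join(p) for p in find_paths("updateNewMessage")]
for p in paths:
    print(p)
```

A validator would then check that each printed path is covered by at least one origin; that step is omitted here.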

Implementation

Implementation of a file reference database based on the origin definition file can be done as follows:

  • Within the TL parser (or within the codegen that generates the TL parser deserialization code), add support for a list of stack

The definition file uses the following TL schema:

// Root
fileReferenceOrigins ctxs:Vector<Origin> = FileReferenceOrigins;

origin flags:# predicate:string is_constructor:flags.0?true action:flags.1?ActionOp noop:flags.2?string needs_parent:flags.3?string parent_is_constructor:flags.4?true = Origin;

// For string => TypedOp dictionaries
typedOpArg key:string value:TypedOp = TypedOpArg;

// Actions
callOp method:string args:Vector<TypedOpArg> = ActionOp;
getMessageOp flags:# peer:TypedOp id:TypedOp from_scheduled:flags.0?TypedOp = ActionOp;

// Field extraction path
paramNotFlag = ParamFlag;
paramIsFlagAbortIfEmpty = ParamFlag;
paramIsFlagFallback fallback:TypedOp = ParamFlag;
paramIsFlagPassthrough = ParamFlag;

pathPart flags:# constructor:string param:string flag:ParamFlag = PathPart;

path parts:Vector<PathPart> = Path;
pathParent parts:Vector<PathPart> = Path;

// Typed constructors, the type is specified to simplify codegen,
// but isn't strictly necessary as it can be inferred from the TypedOpOp.
// It is fully pre-validated during the generation of the definition file.
typedOp type:string op:TypedOpOp = TypedOp;

copyOp path:Path = TypedOpOp;
copyFromParentOp path:Path = TypedOpOp;

getInputChannelByIdOp path:Path = TypedOpOp;
getInputUserByIdOp path:Path = TypedOpOp;

getInputPeerOp path:Path = TypedOpOp;
getInputUserOp path:Path = TypedOpOp;
getInputChannelOp path:Path = TypedOpOp;

getStickerSetFromDocumentAttributesOp path:Path = TypedOpOp;

// Literals & constructors (methods not allowed or needed here)
constructorOp constructor:string args:Vector<TypedOpArg> = TypedOpOp;

vectorOp values:Vector<TypedOp> = TypedOpOp;

intLiteralOp value:int = TypedOpOp;
longLiteralOp value:long = TypedOpOp;
stringLiteralOp value:string = TypedOpOp;
boolLiteralOp value:Bool = TypedOpOp;
doubleLiteralOp value:double = TypedOpOp;
themeFormatLiteralOp = TypedOpOp;

Here's a detailed description of the constructors.

Note: The definition file assumes that all Updates constructors have already been converted to a vector of Update constructors, including short variants like updateShortMessage, updateShortSentMessage, updateShortChatMessage, which must be pre-converted to Update constructors by the client using information extracted from the method call.
While this operation could be done within the file reference origin definition file, it would needlessly increase the number of paths: given that most clients already convert short constructors to Update constructors, the file reference origin definition file only considers paths starting from the Update constructors.

Root

fileReferenceOrigins ctxs:Vector<Origin> = FileReferenceOrigins;

origin flags:# predicate:string is_constructor:flags.0?true action:flags.1?ActionOp noop:flags.2?string needs_parent:flags.3?string parent_is_constructor:flags.4?true = Origin;

The definition file is composed of a single fileReferenceOrigins constructor, which contains a list of origin constructors.

Each origin represents a constructor or method file reference origin:

  • predicate - Indicates the name of the constructor or of the method where extraction of the origin fields must start.
  • is_constructor - If set, predicate points to a constructor, otherwise it points to a method.
  • needs_parent - If set, contains the name of a constructor which needs to appear as a parent in the deserialized object or a method whose response we're deserializing, as it will be used by one or more of the paths in the action.
  • parent_is_constructor - If set, needs_parent points to a constructor; otherwise it points to a method.
  • action - Optional: contains the method that needs to be invoked to refresh the reference, when needed.
  • noop - Optional: contains a human-readable description of why this origin should be ignored.

Exactly one of the mutually exclusive action and noop flags must be set.

If the noop flag is set, this origin should be ignored completely (including during codegen): noop origins are used internally to make sure all file reference paths are still covered in some way during validation, including paths for ephemeral media like inline results, or for media without any associated origin (for example, media uploaded using messages.uploadMedia but not yet sent anywhere obviously does not have any associated origin).
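As a rough sketch of the rules above, an origin could be modelled in client code as follows. The Origin dataclass mirrors the fields of the origin constructor, while the actionable helper and the sample data are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Origin:
    predicate: str
    is_constructor: bool = False
    action: Optional[Any] = None  # an ActionOp, left opaque in this sketch
    noop: Optional[str] = None
    needs_parent: Optional[str] = None
    parent_is_constructor: bool = False


def actionable(origins: List[Origin]) -> List[Origin]:
    """Drop noop origins, rejecting origins that break the action/noop rule."""
    kept = []
    for origin in origins:
        # Exactly one of action and noop must be set.
        if (origin.action is None) == (origin.noop is None):
            raise ValueError(f"{origin.predicate}: exactly one of action/noop must be set")
        if origin.noop is None:
            kept.append(origin)
    return kept


# Made-up sample data, not taken from the real definition file.
origins = [
    Origin("message", is_constructor=True, action={"_": "getMessageOp"}),
    Origin("messages.uploadMedia", noop="media not yet sent anywhere"),
]
kept = actionable(origins)
print([o.predicate for o in kept])
```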

If the action flag is set and the is_constructor flag is set, when deserializing a constructor with predicate equal to predicate, contained in an incoming Update or at any depth in a method call response, do the following:

  • Before beginning deserialization of the constructor, push a new origin to the origin stack.
    Skip pushing the origin if needs_parent is set but we don't have a constructor or method of the appropriate type in our parents.
  • During deserialization, add all encountered file_references to all origins of all types currently on the stack.
  • After deserialization of the constructor, pop the pushed origin, make sure ActionOp can be evaluated (all required flag fields are set, all paths can be correctly extracted; the check can be done during evaluation, aborting the evaluation on error), and if yes, evaluate and commit the origin to the database.
    Skip this step if needs_parent is set but we don't have a constructor or method of the appropriate type in our parents.

If the action flag is set and the is_constructor flag is not set, when deserializing the response of the method with name equal to predicate, do the following:

  • Before beginning deserialization of the method response, push a new origin of type originName to the origin stack.
  • During deserialization, add all encountered file_references to all origins of all types currently on the stack.
  • After deserialization of the method response, pop the pushed origin, make sure ActionOp can be evaluated (all required flag fields are set, all paths can be correctly extracted; the check can be done during evaluation, aborting the evaluation on error), and if yes, evaluate and commit the origin to the database.

Note that method origins cannot make use of needs_parent.
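The push/collect/pop procedure described above can be sketched as follows, assuming already-deserialized objects are represented as nested dicts whose "_" key holds the predicate. This representation, the collect function and the choice of origin predicates are assumptions of the sketch; needs_parent handling and the evaluation of the ActionOp are left out, and a frame is simply recorded when it saw at least one file reference.

```python
def collect(obj, origin_predicates, stack=None, committed=None):
    """Walk `obj`, attaching each file_reference to every origin on the stack."""
    stack = [] if stack is None else stack
    committed = [] if committed is None else committed
    if isinstance(obj, dict):
        pushed = obj.get("_") in origin_predicates
        if pushed:  # push a new origin before descending
            stack.append({"predicate": obj["_"], "refs": []})
        for key, value in obj.items():
            if key == "file_reference" and isinstance(value, bytes):
                for frame in stack:  # add to ALL origins currently on the stack
                    frame["refs"].append(value)
            else:
                collect(value, origin_predicates, stack, committed)
        if pushed:  # pop after the constructor is fully processed
            frame = stack.pop()
            if frame["refs"]:
                committed.append((frame["predicate"], frame["refs"]))
    elif isinstance(obj, list):
        for item in obj:
            collect(item, origin_predicates, stack, committed)
    return committed


update = {
    "_": "updateNewMessage",
    "message": {
        "_": "message",
        "id": 7,
        "media": {
            "_": "messageMediaDocument",
            "document": {"_": "document", "id": 1, "file_reference": b"\x01\x02"},
        },
    },
}
committed = collect(update, {"message", "updateNewMessage"})
print(committed)
```

Inner origins are popped, and therefore committed, before outer ones.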

Actions

// Actions
callOp method:string args:Vector<TypedOpArg> = ActionOp;
getMessageOp flags:# peer:TypedOp id:TypedOp from_scheduled:flags.0?TypedOp = ActionOp;

// For string => TypedOp dictionaries
typedOpArg key:string value:TypedOp = TypedOpArg;

// Action parameters

// Typed constructors, the type is specified to simplify codegen,
// but isn't strictly necessary as it can be inferred from the TypedOpOp.
// It is fully pre-validated during the generation of the definition file.
typedOp type:string op:TypedOpOp = TypedOp;

Actions are stored (associated with one or more file references) after the deserialization of a method or constructor (as specified above »), and executed when one of the file references expires (i.e. a FILE_REFERENCE_EXPIRED RPC error is returned when using it).
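A minimal sketch of this store-then-execute flow is shown below. RpcError stands in for whatever error type the client library exposes, and use_with_refresh, fake_download and the stored action are hypothetical names; a real action would invoke the API method described by the ActionOp and extract the new file reference from the response.

```python
class RpcError(Exception):
    """Stand-in for the client's RPC error type (hypothetical)."""

    def __init__(self, message):
        super().__init__(message)
        self.message = message


def use_with_refresh(use, reference, stored_action):
    """Call use(reference); on expiry, run the stored action and retry once."""
    try:
        return use(reference)
    except RpcError as error:
        if error.message != "FILE_REFERENCE_EXPIRED":
            raise  # unrelated errors are not handled here
        fresh_reference = stored_action()
        return use(fresh_reference)


# Fake download function: rejects the old reference, accepts any other one.
def fake_download(reference):
    if reference == b"old":
        raise RpcError("FILE_REFERENCE_EXPIRED")
    return b"file bytes fetched with " + reference


result = use_with_refresh(fake_download, b"old", lambda: b"new")
print(result)
```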

The arguments are composed of a set of typedOp constructors.

typedOp » is a wrapper for a TypedOpOp constructor which also contains the TL type of the associated TypedOpOp; this isn't strictly necessary for evaluation, but it can be useful during automatic code generation from the definition file.

callOp

callOp is a generic action which invokes the method specified in method with the arguments specified in args.

callOp.args will always contain at least all of the required parameters, and possibly some flagged parameters as well.

getMessageOp

getMessageOp is a specialized action which invokes either messages.getMessages or channels.getMessages depending on the type of the peer, passing the id as the only element to the vector id parameter.
If the from_scheduled flag is present and set, and the flag to which the path points is also set, messages.getScheduledMessages should be executed instead.
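The method selection described above might be sketched like this. Peers are represented as dicts with a "_" predicate key, the check on inputPeerChannel is this sketch's reading of "depending on the type of the peer", and the argument dictionaries are simplified placeholders rather than exact TL parameters (for instance, a real client would pass an InputChannel to channels.getMessages).

```python
def plan_get_message(peer, msg_id, from_scheduled=False):
    """Return (method name, simplified arguments) for refetching one message."""
    if from_scheduled:
        return "messages.getScheduledMessages", {"peer": peer, "id": [msg_id]}
    if peer["_"] == "inputPeerChannel":
        # Simplification: the channel is passed through unchanged.
        return "channels.getMessages", {"channel": peer, "id": [msg_id]}
    return "messages.getMessages", {"id": [msg_id]}


channel = {"_": "inputPeerChannel", "channel_id": 1, "access_hash": 2}
user = {"_": "inputPeerUser", "user_id": 3, "access_hash": 4}

print(plan_get_message(channel, 10)[0])
print(plan_get_message(user, 10)[0])
print(plan_get_message(user, 10, from_scheduled=True)[0])
```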

Action parameters

copyOp path:Path = TypedOpOp;
copyFromParentOp path:Path = TypedOpOp;

getInputChannelByIdOp path:Path = TypedOpOp;
getInputUserByIdOp path:Path = TypedOpOp;

getInputPeerOp path:Path = TypedOpOp;
getInputUserOp path:Path = TypedOpOp;
getInputChannelOp path:Path = TypedOpOp;

getStickerSetFromDocumentAttributesOp path:Path = TypedOpOp;

// Literals & constructors (methods not allowed or needed here)
constructorOp constructor:string args:Vector<TypedOpArg> = TypedOpOp;

vectorOp values:Vector<TypedOp> = TypedOpOp;

intLiteralOp value:int = TypedOpOp;
longLiteralOp value:long = TypedOpOp;
stringLiteralOp value:string = TypedOpOp;
boolLiteralOp value:Bool = TypedOpOp;
doubleLiteralOp value:double = TypedOpOp;
themeFormatLiteralOp = TypedOpOp;

Action parameters are represented by TypedOpOp constructors.

copyOp

The most commonly used type: copies the value(s) at the specified path ».

copyFromParentOp

Copies the value(s) at the specified path », starting from the constructor/method specified in origin.needs_parent.

getInputChannelByIdOp

Returns an InputChannel constructor from the client's peer database, based on the channel ID of type long specified in path.

getInputUserByIdOp

Returns an InputUser constructor from the client's peer database, based on the user ID of type long specified in path.

getInputPeerOp

Transforms and returns the Peer constructor to which path points into an InputPeer constructor.
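As an illustration, the Peer to InputPeer transformation could look roughly like this, assuming peers are dicts with a "_" predicate key and that the client keeps access hashes in a lookup table keyed by (kind, id). The to_input_peer function and the access_hashes table are assumptions of this sketch; a missing entry simply raises KeyError here.

```python
def to_input_peer(peer, access_hashes):
    """Convert a Peer dict into an InputPeer dict using cached access hashes."""
    kind = peer["_"]
    if kind == "peerUser":
        user_id = peer["user_id"]
        return {"_": "inputPeerUser", "user_id": user_id,
                "access_hash": access_hashes[("user", user_id)]}
    if kind == "peerChat":
        # Basic groups do not need an access hash.
        return {"_": "inputPeerChat", "chat_id": peer["chat_id"]}
    if kind == "peerChannel":
        channel_id = peer["channel_id"]
        return {"_": "inputPeerChannel", "channel_id": channel_id,
                "access_hash": access_hashes[("channel", channel_id)]}
    raise ValueError(f"unsupported peer constructor: {kind}")


hashes = {("user", 5): 111, ("channel", 9): 222}
input_peer = to_input_peer({"_": "peerChannel", "channel_id": 9}, hashes)
print(input_peer)
```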

getInputUserOp

Transforms and returns the User constructor to which path points into an InputUser constructor.

getInputChannelOp

Transforms and returns the Channel constructor to which path points into an InputChannel constructor.

getStickerSetFromDocumentAttributesOp

Takes the Vector<DocumentAttribute> to which path points, looks for a documentAttributeSticker, and returns the InputStickerSet contained in documentAttributeSticker.stickerset; aborts if there is no attribute of type documentAttributeSticker in the passed vector.
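A minimal sketch of this lookup, assuming attributes are dicts with a "_" predicate key; AbortExtraction and sticker_set_from_attributes are hypothetical names, and the sample sticker set value is made up.

```python
class AbortExtraction(Exception):
    """Raised when the origin cannot be evaluated for this object."""


def sticker_set_from_attributes(attributes):
    """Return the stickerset of the first documentAttributeSticker, or abort."""
    for attribute in attributes:
        if attribute.get("_") == "documentAttributeSticker":
            return attribute["stickerset"]
    raise AbortExtraction("no documentAttributeSticker in the attribute vector")


attributes = [
    {"_": "documentAttributeImageSize", "w": 512, "h": 512},
    {"_": "documentAttributeSticker", "alt": ":)",
     "stickerset": {"_": "inputStickerSetID", "id": 1, "access_hash": 2}},
]
sticker_set = sticker_set_from_attributes(attributes)
print(sticker_set)
```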

constructorOp

Constructs the constructor of type (predicate) constructor using the arguments specified in args.

vectorOp

Constructs a vector of the constructors passed in values.

intLiteralOp

Constructs a literal int with the value passed in value.

longLiteralOp

Constructs a literal long with the value passed in value.

stringLiteralOp

Constructs a literal string with the value passed in value.

boolLiteralOp

Constructs a literal Bool with the value passed in value.

doubleLiteralOp

Constructs a literal double with the value passed in value.

themeFormatLiteralOp

Constructs a string, indicating the theming engines supported by the client (used when working with theme-related media, can be an empty string if the client doesn't support themes).
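The literal, vector and constructor operations above lend themselves to a small recursive evaluator. The sketch below assumes the definition file has been parsed into dicts with a "_" predicate key; the evaluate function, its theme_format parameter and the sample data are assumptions, and path-based operations such as copyOp are deliberately not handled.

```python
LITERALS = {"intLiteralOp", "longLiteralOp", "stringLiteralOp",
            "boolLiteralOp", "doubleLiteralOp"}


def evaluate(op, theme_format=""):
    """Evaluate a TypedOpOp dict that contains no path-based operations."""
    kind = op["_"]
    if kind in LITERALS:
        return op["value"]
    if kind == "themeFormatLiteralOp":
        return theme_format  # chosen by the client, may be empty
    if kind == "vectorOp":
        # values is a vector of typedOp wrappers: unwrap each via its op field
        return [evaluate(typed["op"], theme_format) for typed in op["values"]]
    if kind == "constructorOp":
        result = {"_": op["constructor"]}
        for arg in op["args"]:  # typedOpArg: key + typedOp value
            result[arg["key"]] = evaluate(arg["value"]["op"], theme_format)
        return result
    raise NotImplementedError(kind)


# Made-up sample data, not taken from the real definition file.
op = {
    "_": "constructorOp",
    "constructor": "inputStickerSetShortName",
    "args": [{"key": "short_name",
              "value": {"type": "string",
                        "op": {"_": "stringLiteralOp", "value": "example"}}}],
}
built = evaluate(op)
print(built)
```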

Paths

paramNotFlag = ParamFlag;
paramIsFlagAbortIfEmpty = ParamFlag;
paramIsFlagFallback fallback:TypedOp = ParamFlag;
paramIsFlagPassthrough = ParamFlag;

pathPart flags:# constructor:string param:string flag:ParamFlag = PathPart;

path parts:Vector<PathPart> = Path;
pathParent parts:Vector<PathPart> = Path;

Paths are used by action parameters to extract a parameter from one or more constructors, e.g. updateStory.story -> storyItem.media -> messageMediaDocument.document -> document.file_reference.

The first part of the path always points to:

  • The constructor/method of the origin (origin.predicate), for path
  • The constructor/method of the parent contained in origin.needs_parent, for pathParent

A path is composed of multiple pathParts.

Each pathPart contains the following fields, which describe how to extract the field.

  • constructor - Indicates the required constructor/method predicate. If a different constructor type is encountered (e.g. documentEmpty instead of document), abort extraction.
    By definition, if a method name is passed, it will always be equal to the method of the associated origin.
  • param - Indicates the required parameter; if it's an empty string, it indicates the return value of a method.
  • flag - Contains exactly one of the following constructors:
    • paramNotFlag - The current parameter is not a flag
    • paramIsFlagAbortIfEmpty - The current parameter is a flag, and if it's not set, abort extraction.
    • paramIsFlagFallback - The current parameter is a flag, and if it's not set, use the specified TypedOp as fallback value.
    • paramIsFlagPassthrough - The current parameter is a flag, and its value should be copied/returned verbatim; can only be used on the last element of a path and within the arguments of a constructorOp/callOp/getMessageOp only if the argument that uses this path is a flag of the same type.
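The extraction rules for pathPart can be sketched as a loop over the parts. The sketch assumes deserialized objects and path parts are dicts with a "_" predicate key, treats an unset flag as a missing or None key, and does not handle vectors or method return values (empty param). The extract and part helpers and the evaluate callback used for fallbacks are hypothetical names.

```python
class AbortExtraction(Exception):
    """Raised when a path cannot be extracted from the given object."""


def extract(obj, parts, evaluate=None):
    """Follow `parts` through `obj`, honouring the ParamFlag of each part."""
    current = obj
    for part in parts:
        if not isinstance(current, dict) or current.get("_") != part["constructor"]:
            raise AbortExtraction(f"expected constructor {part['constructor']}")
        value = current.get(part["param"])
        flag = part["flag"]
        if value is None:
            if flag["_"] == "paramIsFlagFallback":
                value = evaluate(flag["fallback"])  # caller evaluates the TypedOp
            elif flag["_"] == "paramIsFlagPassthrough":
                return None  # the unset flag is passed along verbatim
            else:
                raise AbortExtraction(f"{part['constructor']}.{part['param']} is not set")
        current = value
    return current


def part(constructor, param, flag="paramNotFlag"):
    return {"constructor": constructor, "param": param, "flag": {"_": flag}}


# storyItem.from_id is a flag and is not set in this update.
update = {"_": "updateStory", "peer": {"_": "peerUser", "user_id": 5},
          "story": {"_": "storyItem", "id": 9}}

story_id = extract(update, [part("updateStory", "story"), part("storyItem", "id")])
try:
    extract(update, [part("updateStory", "story"),
                     part("storyItem", "from_id", "paramIsFlagAbortIfEmpty")])
    aborted = False
except AbortExtraction:
    aborted = True
print(story_id, aborted)
```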

Another example:

Assume you receive a message from your friend: that message contains a messageMediaPhoto with a photo.

Your client has to cache not only the file_reference field of the photo, but also the context in which the file reference was seen (in this case, a message coming from a specific user).

In this case, the context info is an origin of type message, containing the message ID and the peer ID of the chat/channel/user where the message was seen.

The context info has to be associated with the file reference: when downloading a file using upload.getFile or resending it using messages.sendMedia, a FILE_REFERENCE_EXPIRED error may be returned.
messages.sendMultiMedia returns a variation of the same error, as a FILE_REFERENCE_%d_EXPIRED error (where %d is the index of the media with the expired file reference in the passed media array).
If this happens, the context info must be used to refetch the object that contained the file reference: in this example, the peer info and the message ID have to be used with channels.getMessages or messages.getMessages to refetch the message, recache the file reference and use it in a new file download request.
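Both error variants mentioned above can be recognized with a single regular expression. This is a small sketch; the parse_expired function name and its return shape are assumptions, not part of any client library.

```python
import re

# Matches FILE_REFERENCE_EXPIRED and FILE_REFERENCE_<index>_EXPIRED.
_EXPIRED = re.compile(r"^FILE_REFERENCE_(?:(\d+)_)?EXPIRED$")


def parse_expired(error_message):
    """Return (is_expired, media_index); media_index is None for the plain error."""
    match = _EXPIRED.match(error_message)
    if match is None:
        return False, None
    index = match.group(1)
    return True, (int(index) if index is not None else None)


print(parse_expired("FILE_REFERENCE_EXPIRED"))
print(parse_expired("FILE_REFERENCE_2_EXPIRED"))
print(parse_expired("FLOOD_WAIT_3"))
```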

More than one origin can be associated with one file reference, for greater resilience (if a message was deleted in one chat but was also forwarded to another chat, the file reference can be refetched from the second chat instead).
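A minimal in-memory sketch of such a database follows, trying each stored origin in turn until one yields a fresh reference. Keying origins by the document or photo ID, the FileReferenceDatabase class, and the tuple-shaped origins are all assumptions of this sketch; a real client would persist the data and evaluate the stored actions instead of calling a fake function.

```python
class FileReferenceDatabase:
    def __init__(self):
        self._origins = {}  # file ID -> list of origins, oldest first

    def add_origin(self, file_id, origin):
        known = self._origins.setdefault(file_id, [])
        if origin not in known:  # avoid storing duplicates
            known.append(origin)

    def refresh(self, file_id, refetch):
        """Try every known origin until one returns a fresh file reference."""
        for origin in self._origins.get(file_id, []):
            reference = refetch(origin)
            if reference is not None:
                return reference
        return None  # no origin could provide a new reference


db = FileReferenceDatabase()
db.add_origin(1, ("message", "chat A", 10))
db.add_origin(1, ("message", "chat B", 42))


def fake_refetch(origin):
    # Pretend the message in chat A was deleted.
    return None if origin[1] == "chat A" else b"fresh"


fresh = db.refresh(1, fake_refetch)
print(fresh)
```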

Origins for objects returned by method calls with certain parameters can be considered, too (for example, in the case of favorited sticker sets returned by messages.getFavedStickers).