What is MP4 Format Video?
The fifth-generation iPod (iPod Video) can play videos, and
the videos must be in MP4 format.
Here is the definition of MP4 video format: (from
Within the ISO/IEC 14496 MPEG-4 standard there are several parts
that define file formats for the storage of time-based media
(such as audio, video etc.). They are all based on and derived
from the ISO Base Media File Format, which is a structural,
media-independent definition that is also published as part of
the JPEG 2000 family of standards.
The MP4 file format defines the storage of MPEG-4 audio,
scenes and multimedia content using the ISO Base Media File
Format. The AVC File Format defines the storage for the
Advanced Video Coding (ISO/IEC 14496-10/AVC) standard data
within files of the ISO Base Media File Format family.
There are also related file formats that use the structural
definition of a box-structured file as defined in the ISO Base
Media File Format, but do not use the definitions for time-based
media. The MPEG-21 File Format is one such standard and
defines the storage of an MPEG-21 digital item, with some or all
of its ancillary data (such as images, movies, or other non-XML
data) within the same file. Other files using this structure
include the standard file formats for JPEG 2000 images, such as
A diagrammatic overview of the relationship between the various
file formats and the ISO Base Media File Format is shown in
Figure 1.
Figure 1: Relationship between the ISO, MP4, AVC, and
MPEG-21 File Formats
This family of storage file formats is based on the concept of
box-structured files. A box-structured file consists of a series
of boxes (sometimes called atoms), each of which has a size and a
type. The type field is usually four printable characters.
Box-structured files are used in a number of applications, and it
is possible to form ‘multi-purpose’ files which contain the boxes
required by more than one specification. Examples include not
only the ISO Base Media File Format family described here, but
also the JPEG 2000 file format family, which for the most part is
a still-image file format.
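The size-and-type layout described above can be illustrated with a minimal Python sketch that steps through the top-level boxes of a byte string. It assumes a 32-bit big-endian size followed by a four-character type. The handling of size 1 (a 64-bit size follows) and size 0 (box runs to the end) reflects my reading of ISO/IEC 14496-12 and is not stated in the text above. The `make_box` helper and the sample bytes are hypothetical test data, not a valid media file.

```python
import struct

def iter_boxes(data, offset=0, end=None):
    """Yield (type, payload_start, box_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        # Assumed header layout: 32-bit big-endian size, then a 4-byte type.
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # Assumption from ISO/IEC 14496-12: a 64-bit size follows the type.
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:
            # Assumption from ISO/IEC 14496-12: box extends to the end of the data.
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError("malformed box at offset %d" % offset)
        yield box_type.decode("latin-1"), offset + header, offset + size
        offset += size

def make_box(box_type, payload=b""):
    """Hypothetical helper: build a box with a plain 32-bit size field."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload

# Synthetic data for illustration only.
sample = (make_box(b"ftyp", b"isom" + bytes(4))
          + make_box(b"free")
          + make_box(b"mdat", b"\x01\x02\x03"))

# List each box type with its payload length.
boxes = [(name, stop - start) for name, start, stop in iter_boxes(sample)]
print(boxes)
```

Because every box carries its own size, a reader can skip any box whose type it does not recognize, which is what makes the multi-purpose files mentioned above practical.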
The ISO Base Media File
Format additionally contains structural and media data
information for timed presentations of media data such as audio,
video, etc. This structure is intentionally general, so that by
structuring files in different ways the same base specification
can be used for files for exchange and download, including
incremental download; editing, composition, and lay-up; and
streaming from streaming servers.
Specialized uses include the storage of a partial or
complete MPEG-4 scene and associated object descriptions. This
general structure has been adopted not only for the MP4 file
format, but also by a number of other standards bodies, trade
associations, and companies.
Box Structured Files
The file structure
is object-oriented; that is, a file can be decomposed into its
constituent objects very simply, and the structure of the
objects can be inferred directly from their type and position.
The types are 32-bit values and usually chosen to be four
printable characters, for ease of inspection and editing. There
is provision for using extension boxes with a Universally Unique
Identifier (UUID) type, and specification text is provided
on how to convert all box types into UUIDs.
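As a rough illustration of the conversion from a four-character box type to a UUID, the sketch below appends a fixed 12-byte suffix to the 4-byte type. The suffix value is my recollection of the reserved value in ISO/IEC 14496-12 and is an assumption here, so it should be checked against the specification before use. The function name `box_type_to_uuid` is hypothetical.

```python
import uuid

# Assumed ISO-reserved suffix (recalled from ISO/IEC 14496-12; verify before use).
ISO_UUID_SUFFIX = bytes.fromhex("00110010800000AA00389B71")

def box_type_to_uuid(fourcc):
    """Map a 4-byte box type to a 16-byte UUID: type bytes + fixed suffix."""
    if len(fourcc) != 4:
        raise ValueError("box type must be exactly four bytes")
    return uuid.UUID(bytes=fourcc + ISO_UUID_SUFFIX)

moov_uuid = box_type_to_uuid(b"moov")
print(moov_uuid)
```

The four type bytes occupy the first 32 bits of the UUID, so the original box type can be recovered from the UUID by reading its first four bytes.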
All box-structured files start with a file-type box (possibly
after a box-structured signature) that defines the best use of
the file, and the specifications to which the file complies.
These are documented as ‘brands’. Brands identify a
specification. The presence of a brand in this box indicates
both a claim and a permission; a claim by the file writer that
the file complies with the specification, and a permission for a
reader, possibly implementing only that specification, to read
and interpret the file.
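The claim-and-permission role of brands can be sketched as follows. The payload layout assumed here (a four-character major brand, a 32-bit minor version, then a list of four-character compatible brands) is my understanding of the file-type box in ISO/IEC 14496-12, not something the text above specifies. The function names and the example payload are hypothetical.

```python
import struct

def parse_ftyp(payload):
    """Split a file-type box payload into (major_brand, minor_version, compatible_brands)."""
    # Assumed layout: 4-byte major brand, 32-bit minor version, then 4-byte brands.
    major, minor = struct.unpack_from(">4sI", payload, 0)
    compatible = [payload[i:i + 4].decode("latin-1")
                  for i in range(8, len(payload) - 3, 4)]
    return major.decode("latin-1"), minor, compatible

def reader_may_open(payload, supported_brands):
    """A reader has permission if it implements any brand the writer claims."""
    major, _, compatible = parse_ftyp(payload)
    return bool(set(supported_brands) & ({major} | set(compatible)))

# Hypothetical payload: major brand 'mp42', compatible with 'mp42' and 'isom'.
ftyp_payload = b"mp42" + struct.pack(">I", 0) + b"mp42" + b"isom"

print(parse_ftyp(ftyp_payload))
print(reader_may_open(ftyp_payload, {"isom"}))
print(reader_may_open(ftyp_payload, {"3gp4"}))
```

A reader that implements only the base specification could open this file through the `isom` brand even though it knows nothing about `mp42`.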
ISO Base Media File Format
The ISO Base Media File Format is designed to contain timed
media information for a presentation in a flexible, extensible
format that facilitates interchange, management, editing, and
presentation of the media. This presentation may be ‘local’ to
the system containing the presentation, or may be via a network
or other stream delivery mechanism.
The files have a logical structure, a time structure, and a
physical structure, and these structures are not required to be
coupled. The logical structure of the file is of a movie that in
turn contains a set of time-parallel tracks. The time structure
of the file is that the tracks contain sequences of samples in
time, and those sequences are mapped into the timeline of the
overall movie by optional edit lists.
The physical structure of the file separates the data needed for
logical, time, and structural de-composition, from the media
data samples themselves. This structural information is
concentrated in a movie box, possibly extended in time by movie
fragment boxes. The movie box documents the logical and timing
relationships of the samples, and also contains pointers to
where they are located. Those pointers may be into the same file
or another one, referenced by a URL.
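To make the separation between structural information and media data concrete, the sketch below prints an outline of a file by descending into container boxes. The set of container types and the names `moov`, `trak`, `mvhd`, `tkhd`, and `mdat` come from my knowledge of ISO/IEC 14496-12 rather than from the text above, and the byte string is synthetic, so treat this as an assumption-laden illustration rather than a parser.

```python
import struct

# Assumed container box types (from ISO/IEC 14496-12; not an exhaustive list).
CONTAINERS = {"moov", "trak", "mdia", "minf", "stbl", "edts", "dinf",
              "mvex", "moof", "traf"}

def walk(data, start=0, end=None, depth=0):
    """Yield (depth, type) for every box, descending into known containers."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, raw = struct.unpack_from(">I4s", data, pos)
        if size < 8 or pos + size > end:
            break  # 64-bit and to-end-of-file sizes are not handled in this sketch
        name = raw.decode("latin-1")
        yield depth, name
        if name in CONTAINERS:
            yield from walk(data, pos + 8, pos + size, depth + 1)
        pos += size

def box(name, payload=b""):
    """Hypothetical helper: build a box with a plain 32-bit size field."""
    return struct.pack(">I4s", 8 + len(payload), name) + payload

# Synthetic movie: a movie box with two tracks, followed by a media data box.
movie = (box(b"ftyp", b"isom" + bytes(4))
         + box(b"moov", box(b"mvhd", bytes(4))
               + box(b"trak", box(b"tkhd"))
               + box(b"trak", box(b"tkhd")))
         + box(b"mdat", b"..."))

outline = list(walk(movie))
for depth, name in outline:
    print("  " * depth + name)
```

In the outline, the descriptive structures sit under the movie box while the samples themselves live in the separate media data box.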
Each media stream is contained in a track specialized for that
media type (audio, video etc.), and is further parameterized by
a sample entry. The sample entry contains the ‘name’ of the
exact media type (i.e., the type of the decoder needed to decode
the stream) and any parameterization of that decoder needed. The
name also takes the form of a four-character code. There are
defined sample entry formats not only for MPEG-4 media, but also
for the media types used by other organizations using this file
format family. They are registered at the MP4 registration
authority.
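As an illustration of how a reader might discover which decoder a track needs, the sketch below reads the four-character codes from a sample description payload. It assumes the layout I recall from ISO/IEC 14496-12: four bytes of version and flags, a 32-bit entry count, then one box-like entry per sample entry. The example entry (`avc1` with a data reference index of 1) is synthetic and truncated to its common header.

```python
import struct

def sample_entry_codes(stsd_payload):
    """Return the four-character code of each sample entry in the payload."""
    # Assumed layout: 4 bytes version/flags, 32-bit entry count, then entries.
    entry_count = struct.unpack_from(">I", stsd_payload, 4)[0]
    codes, pos = [], 8
    for _ in range(entry_count):
        size, code = struct.unpack_from(">I4s", stsd_payload, pos)
        if size < 8:
            raise ValueError("malformed sample entry at offset %d" % pos)
        codes.append(code.decode("latin-1"))
        pos += size  # skip the codec-specific parameters
    return codes

# Synthetic entry: size, code, 6 reserved bytes, 16-bit data reference index.
entry = struct.pack(">I4s", 16, b"avc1") + bytes(6) + struct.pack(">H", 1)
stsd = bytes(4) + struct.pack(">I", 1) + entry

codes = sample_entry_codes(stsd)
print(codes)
```

A player would compare each returned code against the decoders it has available before attempting to play the track.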
Protected streams are also supported by the file format (e.g.
streams encrypted for use in a digital rights management (DRM)
system). There is a general structure for protected streams,
which documents the underlying format, and also documents the
protection system applied and any parameters it needs.
Support for meta-data takes two forms. First, timed meta-data
may be stored in an appropriate track, synchronized as desired
with the media data it is describing. Secondly, there is general
support for non-timed meta-data attached to the movie or to an
individual track. The structural support is general, and allows,
as in the media-data, the storage of meta-data resources
elsewhere in the file or in another file. In addition, these
resources may be named, and may be protected.
These generalized meta-data structures may also be used at the
file level, above or parallel with or in the absence of the
movie box. In this case, the meta-data box is the primary entry
into the presentation. This structure is used for MPEG-21 files,
and other bodies are using it to wrap together other integration
specifications (e.g. SMIL) with the media integrated.
Sometimes the samples within a track have different
characteristics or need to be specially identified. One of the
most common and important characteristics is the synchronization
point (often a video I-frame). These points are identified by a
special table in each track. More generally, the nature of
dependencies between track samples can also be documented.
Finally, there is a concept of named, parameterized sample
groups. These permit the documentation of arbitrary
characteristics that are shared by some of the samples in a
track. In the AVC file format, sample groups are used to support
the concept of layering and sub-sequences.
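The synchronization-point table mentioned above is what a player consults when seeking. The sketch below assumes the table layout I recall from ISO/IEC 14496-12: four bytes of version and flags, a 32-bit entry count, then that many 32-bit sample numbers in increasing order, counted from 1. The function names and the example table are hypothetical.

```python
import struct
from bisect import bisect_right

def parse_sync_table(payload):
    """Return the list of sync sample numbers (1-based, ascending)."""
    # Assumed layout: 4 bytes version/flags, 32-bit count, then 32-bit entries.
    count = struct.unpack_from(">I", payload, 4)[0]
    return list(struct.unpack_from(">%dI" % count, payload, 8))

def nearest_sync_at_or_before(sync_samples, sample_number):
    """Find the sync sample from which decoding must start to reach sample_number."""
    i = bisect_right(sync_samples, sample_number)
    return sync_samples[i - 1] if i else None

# Synthetic table: sync points at samples 1, 31 and 61.
table = bytes(4) + struct.pack(">I3I", 3, 1, 31, 61)
sync = parse_sync_table(table)

print(sync)
print(nearest_sync_at_or_before(sync, 45))
```

To display sample 45, a decoder would start at sample 31 and decode forward, since the samples in between may depend on it.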
MP4 files are generally used to
contain MPEG-4 media, including not only MPEG-4 audio and/or
video, but also MPEG-4 presentations. When a complete or partial
presentation is stored in an MP4 file, there are specific
structures that document that presentation.
MPEG-4 presentations are scenes, described by the scene language
MPEG-4 BIFS. Within those scenes media objects can be placed;
these media objects might be audio, video, or entire sub-scenes.
Each object is described by an object descriptor, and within the
object descriptor the streams that make up that object are
described. The entire scene is described by an initial object
descriptor (IOD). This is stored in a special box within the
movie atom in MP4 files. The scene and the object descriptors it
uses are stored in tracks — a scene track, and an object
descriptor track; for files that comprise a full MPEG-4
presentation this IOD and these two tracks are required.
Each stream is described by an elementary stream descriptor.
When a complete scene is delivered, these are delivered as part
of the object descriptor stream. However, for ease of
composition, and to manage files that contain only media
streams, these elementary stream descriptors are stored with the
media streams themselves — in the descriptive track structures —
in MP4 files.
MPEG-21 File Format
As described above, the
general meta-box can be used at the file level to contain a
description and its associated or included data. This structure
is used for MPEG-21 files. A file-level meta-box is used to hold
an MPEG-21 Digital Item Declaration (DID). The meta-box
also contains a list of attached resources, which may have local
names, and may be located within the same file or in another
file.
When media is delivered over a streaming protocol it often must be
transformed from the way it is represented in the file. The most
obvious example of this is the way media is transmitted over the
Real-time Transport Protocol (RTP). In the file, for example, each
frame of video is stored contiguously as a file-format sample.
In RTP, packetization rules specific to the codec used must be
obeyed to place these frames in RTP packets.
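To show why a stored sample cannot simply be sent as-is, the sketch below splits one frame into several RTP packets, each with the 12-byte fixed header defined in RFC 3550. The naive fixed-size split is a deliberate simplification and an assumption of this example: real payload formats define their own codec-specific fragmentation rules. The payload type 96, the 1400-byte limit, and the function name `packetize` are all hypothetical choices.

```python
import struct

def packetize(frame, seq, timestamp, ssrc, payload_type=96, max_payload=1400):
    """Split one stored frame into RTP packets (naive, codec-agnostic split)."""
    chunks = [frame[i:i + max_payload]
              for i in range(0, len(frame), max_payload)] or [b""]
    packets = []
    for n, chunk in enumerate(chunks):
        # Marker bit set on the last packet of the frame (a common convention).
        marker = 1 if n == len(chunks) - 1 else 0
        header = struct.pack(
            ">BBHII",
            0x80,                           # version 2, no padding/extension/CSRC
            (marker << 7) | payload_type,   # marker bit plus 7-bit payload type
            (seq + n) & 0xFFFF,             # 16-bit sequence number, wraps around
            timestamp,                      # same timestamp for the whole frame
            ssrc,                           # synchronization source identifier
        )
        packets.append(header + chunk)
    return packets

# Synthetic 3000-byte frame, starting at the highest sequence number.
packets = packetize(bytes(3000), seq=65535, timestamp=90000, ssrc=0x1234)
print([len(p) for p in packets])
```

All packets of one frame share a timestamp while the sequence number advances per packet, which is how a receiver reassembles the frame and detects loss.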
A streaming server may calculate such packetization at run-time
if it wishes. However, there is support for the assistance of
the streaming servers. Special tracks called hint tracks may be
placed in the files. Hint tracks contain general instructions
for streaming servers as to how to form packet streams, from
media tracks, for a specific protocol. Because the form of these
instructions is media-independent, servers do not have to be
revised when new codecs are introduced. In addition, the
encoding and editing software can be unaware of streaming
servers. Once editing is finished on a file, then a piece of
software called a hinter may be used that adds hint tracks to
the file, before placing it on a streaming server. There is a
defined hint track format for RTP streams in the MP4 file format
The following table contains a summary of some of the common
file types in the ISO Base Media File Format Family. The formal
registration authorities (e.g. the MP4 registration authority
for brands, or the Internet Assigned Numbers Authority
for MIME types) and the appropriate specifications should be
consulted for definitive information.
MIME types: video/mp4, audio/mp4, application/mp4
Brands: various, e.g. 3gp4, 3gp5
File type: Motion JPEG 2000
There is a registration authority which registers and documents
the four-character-code code-points used in this file-format
family, as well as some other code-points related to MPEG-4
systems. The database is publicly viewable and registration is
 ISO/IEC 14496-12, ISO Base Media File Format; technically
identical to ISO/IEC 15444-12
 ISO/IEC 14496-14, MP4 File Format
 ISO/IEC 14496-15, Advanced Video Coding (AVC) file format
 ISO/IEC 14496-10, Advanced Video Coding
 ISO/IEC 21000-9, MPEG-21 File Format
 ISO/IEC 15444-1, JPEG 2000 Image Coding System
 The MP4 Registration Authority, http://www.mp4ra.org/
 ISO/IEC 9834-8:2004 Information Technology, "Procedures for
the operation of OSI Registration of Universally Unique
Identifiers (UUIDs) and their use as ASN.1 Object Identifier
components" ITU-T Rec. X.667, 2004
 SMIL: Synchronized Multimedia Integration Language;
World-Wide Web Consortium (W3C) http://www.w3.org/TR/SMIL2/
 ISO/IEC 21000-2 Digital Item Declaration
 RTP: A Transport Protocol for Real-Time Applications; IETF
RFC 3550, http://www.ietf.org/rfc/rfc3550.txt
 The Internet Assigned Numbers Authority