Technical guide to AFP NewsML-G2 documents


Introduction

AFP delivers information in a number of ways, tailored to its clients' needs. One delivery vector is NewsML-G2, an industry-driven format and processing model allowing rich machine-readable representation of news content.

This document is your technical guide to AFP NewsML-G2 documents. You'll make use of it when implementing systems that receive and process AFP NewsML-G2 documents. It describes how building blocks defined by NewsML-G2 are combined in AFP documents to convey news content and associated metadata (titles, genres, subjects, embargo, etc.). It should be read alongside the NewsML-G2 documentation provided by IPTC [G2Spec], knowledge of which is assumed.

AFP NewsML-G2 documents build upon the NewsML-G2 format and processing model defined by IPTC (International Press Telecommunications Council) in the context of the NAR (News Architecture). NewsML-G2 is itself an application of XML and makes use of XML Schema. AFP NewsML-G2 documents also make use of XHTML to represent textual content along with rich structural information, as chunks of XHTML can be embedded right into NewsML-G2 content. In order to deal with AFP NewsML-G2 documents, you will make use of all these technologies. Figure 1 shows this technology stack and provides links to relevant documentation.

Technology stack and learning material

The following section provides an overview of the structure of AFP documents. Further sections describe the various data elements a document conveys, and how to use them.

Overview

AFP NewsML-G2 documents come in the following types: text, picture, video, still graphic, animated graphic and multimedia.

All AFP documents have the same top-level structure: a NewsML-G2 element called "news message". A news message is an envelope that contains one or more "news items". Each news item represents some news content, which can be a news story in textual form, a photo, a video, a still graphic or an animated graphic.

In documents of type text, picture, video, still graphic and animated graphic, there is only one news item. Multimedia documents on the other hand may contain multiple news items: a main item with the main textual content and additional items for photos, videos, etc.

Section "Type of document" describes how to determine the type of a document. The following sections provide an overview of the structure of documents.

Text documents

Text documents have only one news item. This item contains metadata and textual news content. The content is represented by some XHTML embedded right into the news item.

Top-level structure of AFP text documents

Picture and still graphic documents

Picture and still graphic documents have only one news item, which conveys a single piece of logical visual content (e.g., one photo). However, this content may be available in different renditions (e.g., different formats, resolutions, etc.). In addition to metadata about the picture or still graphic, the news item contains links to the actual visual content (e.g., JPEG resources) for each rendition. The visual content for each rendition isn't provided in the NewsML-G2 document itself, but by external resources (e.g., accompanying files, Web resources, etc.).

Top level structure of AFP picture and still graphic documents (example)

Video and animated graphic documents

Video and animated graphic documents have only one news item, which conveys a single piece of logical visual content (e.g., one video, one animated graphic). However, this content may be available in different renditions (e.g., different formats, resolutions, etc.). In addition to metadata about the video or animated graphic, the news item contains links to the actual visual content (e.g., MPEG resources) for each rendition. The visual content for each rendition isn't provided in the NewsML-G2 document itself, but by external resources (e.g., accompanying files, Web resources, etc.).

The news item may also contain links to renditions of an icon (aka "illustration" or "preview image"). The renditions of the icon aren't provided in the NewsML-G2 document itself, but in external resources (e.g., accompanying files, Web resources, etc.).

Top-level structure of AFP video and animated graphic documents (example)

Multimedia documents

Multimedia documents have one or multiple news items. One of these items is the "main news item". It is always present and provides the main textual content as well as metadata about the document, much like the news item of a text document. It also contains links to other items of the document. These additional items convey information about visual content: pictures, videos or graphics. They are much like the items found in picture, video or graphic documents.

The main news item is identified by the presence of a specific element in its item metadata section: a link element whose rel attribute, a QCode, resolves to http://cv.iptc.org/newscodes/conceptrelation/isA and whose href attribute, a URI, is equal to http://cv.afp.com/itemnatures/mmdMainComp.

<link rel="QCode resolving to http://cv.iptc.org/newscodes/conceptrelation/isA"
      href="http://cv.afp.com/itemnatures/mmdMainComp"/>

You'll find more information about QCodes in section "Controlled vocabularies and qualified codes".

The figure below provides an example of a multimedia document with one main item, a picture item and a video item.

Top-level structure of AFP multimedia documents (example)
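
The main-item test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scheme alias crel, the guid values and the function names are made up for the example, and the alias-to-scheme map is assumed to have been built from the item's catalog information beforehand.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
IS_A = "http://cv.iptc.org/newscodes/conceptrelation/isA"
MAIN_COMP = "http://cv.afp.com/itemnatures/mmdMainComp"


def resolve_qcode(qcode, schemes):
    """Resolve 'alias:code' to a concept URI, or None if the alias is unknown."""
    alias, colon, code = qcode.partition(":")
    scheme = schemes.get(alias)
    return scheme + code if colon and scheme else None


def find_main_item(news_message, schemes):
    """Return the news item carrying the 'main component' link, or None."""
    for item in news_message.iter(NAR + "newsItem"):
        for link in item.findall(NAR + "itemMeta/" + NAR + "link"):
            rel = link.get("rel", "")
            if (resolve_qcode(rel, schemes) == IS_A
                    and link.get("href") == MAIN_COMP):
                return item
    return None


# Hypothetical multimedia document: a picture item followed by the main item.
SAMPLE = """
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemSet>
    <newsItem guid="urn:example:picture">
      <itemMeta/>
    </newsItem>
    <newsItem guid="urn:example:main">
      <itemMeta>
        <link rel="crel:isA" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
      </itemMeta>
    </newsItem>
  </itemSet>
</newsMessage>
"""

# Assumed alias -> scheme URI map (normally derived from the catalog).
SCHEMES = {"crel": "http://cv.iptc.org/newscodes/conceptrelation/"}

main = find_main_item(ET.fromstring(SAMPLE), SCHEMES)
print(main.get("guid"))  # urn:example:main
```

Note that the sketch does not rely on the position of the main item in the item set; it only looks at the link element.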

Document walkthrough

Below is an example of a simple text document, with just a few metadata elements and some textual content. Using this example, we will walk through some structural elements that are common to every type of AFP NewsML-G2 document.

Note that while the XML in this example is formatted to ease reading, the actual documents you receive will usually be in a compact form (e.g., all XML on one line).

<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" 
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
             xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/ 
                                 NewsML-G2_2.10-spec-All-Power.xsd">
   <header>
      <sent>2009-02-23T20:44:07+02:00</sent>
   </header>
   <itemSet>
      <newsItem standard="NewsML-G2" standardversion="2.10" conformance="power" 
                guid="urn:newsml:afp.com:20111219:6df15bca-ae12-4380-bd7c-e98b3e426457"
                version="3">
         <catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_24.xml"/>
         <itemMeta>
            <itemClass qcode="ninat:text"/>
            <provider qcode="nprov:AFP">
               <name>AFP </name>
            </provider>
            <versionCreated>2009-02-23T20:43:00+02:00</versionCreated>
            <pubStatus qcode="stat:usable"/>
         </itemMeta>
         <contentMeta>
            <headline>
               YSL-Bergé collection sets new world record at auction 
               for a private collection
            </headline>
            <subject qcode="medtop:20000031" type="cpnat:abstract">
               <name>visual art</name>
            </subject>
            <subject qcode="medtop:20000011" type="cpnat:abstract">
               <name>fashion</name>
            </subject>
         </contentMeta>    
         <contentSet>
            <inlineXML contenttype="application/xhtml+xml" wordcount="70">
               <html xmlns="http://www.w3.org/1999/xhtml">
                  <head>
                     <title>
                        YSL-Bergé collection sets new world record at auction 
                        for a private collection
                     </title>
                  </head>
                  <body>
                     <p>The Yves Saint Laurent and Pierre Bergé collection sets 
                     new world record at auction for a private collection. 
                     Hundreds of art treasures amassed by late fashion designer
                     Yves Saint Laurent and his companion Pierre Berge over half
                     a century are being auctioned.</p>
                     <p>Bids hit 206 million euros (261 million dollars) on February
                     23, 2009 making it the biggest private collection ever 
                     auctioned with two days of sales still left to run.</p>
                  </body>
               </html>
            </inlineXML>
         </contentSet>
      </newsItem>
   </itemSet>
</newsMessage>

Some notes about this structure:

Controlled vocabularies and qualified codes

Documents make use of a number of controlled vocabularies (aka taxonomies) to convey information. In this section, we focus on a specific set of controlled vocabularies called "NewsML-G2 schemes".

A NewsML-G2 scheme associates unambiguous identifiers with "concepts". These identifiers take the form of URIs (Uniform Resource Identifiers [RFC3986]).

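For example, in NewsML-G2 a document is usable, withheld or canceled; this is known as the "publishing status". Each of these three statuses has its own identifier:

usable: http://cv.iptc.org/newscodes/pubstatusg2/usable
withheld: http://cv.iptc.org/newscodes/pubstatusg2/withheld
canceled: http://cv.iptc.org/newscodes/pubstatusg2/canceled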

These identifiers are called "concept URIs". Together, they form a controlled vocabulary. While they may look like dereferenceable HTTP URLs, they do not need to be. Their main purpose is to unambiguously identify various concepts.

A document can contain a pubStatus element that conveys the concept URI identifying its publishing status. Therefore, when you receive a document, you can process this concept URI (e.g., compare it to the three possible values given above) to determine the publishing status of the document.

However, in NewsML-G2 documents, some concept URIs are not directly expressed using the URI syntax. Instead, they are conveyed as "QCodes" (short for "Qualified Codes"). In some ways, a QCode can be seen as a compressed form of concept URI (actually it is a bit more than that, as it also identifies the controlled vocabulary the concept URI is part of, but this is an advanced topic that we won't develop further in this documentation). Determining the concept URI a QCode stands for is called resolving the QCode.

Unlike concept URIs, QCodes aren't unambiguous identifiers. Consequently, they can't be used directly and must be resolved to their corresponding concept URIs first. For example, in a given document the status "usable" may be expressed by the following QCode: stat:usable (see it in situ in section "Document walkthrough"). However, in another document the same status might be expressed by the QCode pst:usable. These two QCodes are different but resolve to the same concept URI: http://cv.iptc.org/newscodes/pubstatusg2/usable.

A QCode is made of two parts separated by a colon. The leftmost part (before the leftmost colon) is called the scheme alias. The part on the right of the leftmost colon is called the code.

QCode structure

The resolution process is described precisely in the NewsML-G2 documentation ([G2Impg][G2Spec]). In short, it consists of resolving the scheme alias to a scheme URI using the catalog information provided in the document at the item level, and then appending the code to that scheme URI. In our example, the QCode stat:usable is resolved to http://cv.iptc.org/newscodes/pubstatusg2/usable, because the catalog information of the enclosing news item contains the following element:

<scheme alias="stat" uri="http://cv.iptc.org/newscodes/pubstatusg2/"/>

This catalog information can appear inline in the item inside catalog elements, or in an external resource referenced by the item through a catalogRef element, as in the following example borrowed from the section Document walkthrough:

<catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_24.xml"/>

Resolving a QCode yields a concept URI that unambiguously identifies a given concept on a global scale. In our example, the concept identified by http://cv.iptc.org/newscodes/pubstatusg2/usable is: the publishing status "usable". In the context of NewsML-G2 schemes, two logically different concepts are never given the same concept URI, even in different systems managed by different organizations.
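
The resolution steps described in this section can be sketched as follows. This is a simplified illustration, not a complete implementation of the NewsML-G2 processing model: it only reads catalog information given inline in catalog elements and does not fetch external catalogs referenced through catalogRef. The function names and the two sample items are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"


def scheme_map(news_item):
    """Map scheme aliases to scheme URIs using the item's inline catalogs."""
    return {
        scheme.get("alias"): scheme.get("uri")
        for scheme in news_item.iterfind(f"{NAR}catalog/{NAR}scheme")
    }


def resolve_qcode(qcode, schemes):
    """Resolve 'alias:code' to a concept URI; raise if it cannot be resolved."""
    # The scheme alias is everything before the leftmost colon.
    alias, colon, code = qcode.partition(":")
    if not colon or alias not in schemes:
        raise ValueError(f"cannot resolve QCode {qcode!r}")
    return schemes[alias] + code


def pub_status_uri(item_xml):
    """Return the concept URI of the publishing status of a news item."""
    item = ET.fromstring(item_xml)
    status = item.find(f"{NAR}itemMeta/{NAR}pubStatus")
    return resolve_qcode(status.get("qcode"), scheme_map(item))


# Hypothetical item template; the alias is filled in below.
ITEM = """
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <catalog>
    <scheme alias="{alias}" uri="http://cv.iptc.org/newscodes/pubstatusg2/"/>
  </catalog>
  <itemMeta><pubStatus qcode="{alias}:usable"/></itemMeta>
</newsItem>
"""

uri_a = pub_status_uri(ITEM.format(alias="stat"))
uri_b = pub_status_uri(ITEM.format(alias="pst"))
print(uri_a)           # http://cv.iptc.org/newscodes/pubstatusg2/usable
print(uri_a == uri_b)  # True
```

The two items use different QCodes (stat:usable and pst:usable), yet both resolve to the same concept URI, which is why comparisons should be made on concept URIs rather than on QCodes.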

How to read the examples

The following sections of this document are dedicated to answering questions of the form "Where is data X in an AFP NewsML-G2 document (and how can I make use of it)?". For example: "Where is the title of the document?", "Where is the textual content?", "Where is the caption?", "Where is the visual content?" etc.

For each data element, XML examples are provided. These examples aren't complete documents, though: they are high-level representations of the format, omitting many aspects and focusing on the data in question.

For instance, here is the example we provide for the "word count" metadata in text documents (the word count gives an estimate of the size of the textual content):

<newsMessage>
    <itemSet>
        <newsItem>
            <contentSet>
                <inlineXML wordcount="450">
                </inlineXML>
            </contentSet>
        </newsItem>
    </itemSet>
</newsMessage>

As you can see, this example omits many elements: contrast it with the example of a complete document provided in section Document walkthrough. What you get from it, however, is a sense of where the word count information can be found and what it looks like.
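
To show how such a simplified example maps onto a real document, here is a minimal Python sketch that reads the word count. Unlike the simplified examples, real documents declare the NewsML-G2 namespace on the news message (see the complete document in section Document walkthrough), so the lookup must be namespace-aware. The function name and the sample document are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace URI map used by the path expression below.
NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}


def word_count(news_message):
    """Return the word count of the first news item, or None if absent."""
    inline = news_message.find(
        "nar:itemSet/nar:newsItem/nar:contentSet/nar:inlineXML", NS
    )
    if inline is None or inline.get("wordcount") is None:
        return None
    return int(inline.get("wordcount"))


# Hypothetical document, reduced to the elements on the path of interest.
SAMPLE = """
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemSet>
    <newsItem>
      <contentSet>
        <inlineXML wordcount="450"/>
      </contentSet>
    </newsItem>
  </itemSet>
</newsMessage>
"""

count = word_count(ET.fromstring(SAMPLE))
print(count)  # 450
```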

Some examples contain XML comments. For example:

<!-- A subject represented by a QCode  -->
<subject qcode="medtop:20000273"/>

These comments won't appear in real documents; they are annotations specific to this documentation.

Common data

Some data may be present in multiple types of documents. For example, a creation date or an embargo instruction can appear in any document (text, picture, video, multimedia, animated graphic, still graphic). This section details these common data elements.

Creators & Contributors

Non-multimedia documents: creators and contributors may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <creator role="afpcrrol:writer">
                    <name>
                        John Doe
                    </name>
                </creator>
                <contributor role="afpctrol:editor afpctrol:validator">
                    <name>
                        Jeanne Dupont
                    </name>
                </contributor>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: creators and contributors to the multimedia document as a whole may be provided in the content metadata section of the main news item. Creators and contributors specific to an individual item may be provided in the content metadata section of that item.

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="QCode resolving to http://cv.iptc.org/newscodes/conceptrelation/isA"
                         href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
            <contentMeta>
                <!-- The creators and contributors to the multimedia document as a whole -->
                <creator role="afpcrrol:writer afpctrol:forbyline">
                    <name>
                        John Doe
                    </name>
                </creator>
                <contributor role="afpctrol:forbyline">
                    <name>
                        Jeanne Dupont
                    </name>
                </contributor>
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- The creators and contributors specific to this item -->
                <creator role="afpcrrol:photographer afpctrol:forbyline">
                    <name>
                        Paul Tergeist
                    </name>
                </creator>
                <contributor>
                    <name>
                        Annie Mall
                    </name>
                </contributor>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Creators and contributors may be provided by creator and contributor elements. Creators are persons who created the document or parts of the document. Contributors are persons who modified or enhanced the document or parts of the document. There might be any number of creators and contributors per news item.

For each creator and contributor we provide a name in the name element and optionally a list of roles, in the form of a QCode list, in the role attribute. The table below presents some roles often used in AFP documents.

Creator and contributor roles

Role              Concept URI
Writer            http://cv.afp.com/creatorroles/writer
Photographer      http://cv.afp.com/creatorroles/photographer
Graphic designer  http://cv.afp.com/creatorroles/graphicDesigner
For byline        http://cv.afp.com/creatorroles/forbyline

Important: The "for byline" role has a special meaning: the names of creators and contributors without this role must not be published. You may use them for internal purposes such as contacting the journalist for questions, but you must not display them publicly in association with the content of the document.
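
The "for byline" rule can be sketched as a filter that keeps only the names that may be published. This is an illustration under stated assumptions: the scheme alias crrol and its mapping are hypothetical (in a real document, aliases must be resolved through the catalog information of the item), and the function names are made up for the example.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
FOR_BYLINE = "http://cv.afp.com/creatorroles/forbyline"


def resolve_qcode(qcode, schemes):
    """Resolve 'alias:code' to a concept URI, or None if the alias is unknown."""
    alias, colon, code = qcode.partition(":")
    scheme = schemes.get(alias)
    return scheme + code if colon and scheme else None


def publishable_names(content_meta, schemes):
    """Return names of creators and contributors that have the 'for byline' role."""
    names = []
    for person in content_meta:
        if person.tag not in (NAR + "creator", NAR + "contributor"):
            continue
        # The role attribute holds a whitespace-separated list of QCodes.
        roles = person.get("role", "").split()
        if any(resolve_qcode(role, schemes) == FOR_BYLINE for role in roles):
            name = person.findtext(NAR + "name", default="")
            names.append(" ".join(name.split()))
    return names


# Hypothetical content metadata section.
SAMPLE = """
<contentMeta xmlns="http://iptc.org/std/nar/2006-10-01/">
  <creator role="crrol:writer crrol:forbyline">
    <name>
      John Doe
    </name>
  </creator>
  <contributor role="crrol:writer">
    <name>Jeanne Dupont</name>
  </contributor>
  <contributor>
    <name>Annie Mall</name>
  </contributor>
</contentMeta>
"""

# Assumed alias -> scheme URI map (normally derived from the catalog).
SCHEMES = {"crrol": "http://cv.afp.com/creatorroles/"}

names = publishable_names(ET.fromstring(SAMPLE), SCHEMES)
print(names)  # ['John Doe']
```

Names that are filtered out here remain available in the document for internal use; the sketch only decides what may appear in a public byline.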

Content warning

All documents: a content warning may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <signal qcode="QCode resolving to http://cv.iptc.org/newscodes/signal/cwarn"/>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

A document may include a warning about its content when it might be perceived as offensive. In such a case, you'll typically want to review the content of the document in order to decide how to use it. This warning takes the form of a signal element with a QCode resolving to http://cv.iptc.org/newscodes/signal/cwarn.

When a content warning is present, we often provide a set of exclAudience elements that convey the reason(s) for the content warning. For example, in a document whose content contains potentially offensive violence and language:

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <signal qcode="QCode resolving to http://cv.iptc.org/newscodes/signal/cwarn"/>
            </itemMeta>
            <contentMeta>
                <exclAudience qcode="QCode resolving to 
                                     http://cv.iptc.org/newscodes/contentwarning/violence"/>
                <exclAudience qcode="QCode resolving to 
                                     http://cv.iptc.org/newscodes/contentwarning/language"/>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Used in this way, each exclAudience element identifies a characteristic of the content that might limit its audience (e.g. "violence"). We use the IPTC's content warnings vocabulary [IPTCCWarn], whose scheme is http://cv.iptc.org/newscodes/contentwarning/, to specify these limiting characteristics. At the time of writing, these include: death, language, nudity, sexuality, violence.
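
A receiving system might detect the content warning and collect its reasons along the following lines. This is a sketch under stated assumptions: the scheme aliases sig and cw are hypothetical, the alias-to-scheme map is assumed to come from the item's catalog information, and the function names are made up for the example.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
CWARN = "http://cv.iptc.org/newscodes/signal/cwarn"
CW_SCHEME = "http://cv.iptc.org/newscodes/contentwarning/"


def resolve_qcode(qcode, schemes):
    """Resolve 'alias:code' to a concept URI, or None if the alias is unknown."""
    alias, colon, code = qcode.partition(":")
    scheme = schemes.get(alias)
    return scheme + code if colon and scheme else None


def content_warning_reasons(news_item, schemes):
    """Return None if there is no content warning, else the list of reason codes."""
    signals = news_item.iterfind(f"{NAR}itemMeta/{NAR}signal")
    if not any(resolve_qcode(s.get("qcode", ""), schemes) == CWARN for s in signals):
        return None
    reasons = []
    for excl in news_item.iterfind(f"{NAR}contentMeta/{NAR}exclAudience"):
        uri = resolve_qcode(excl.get("qcode", ""), schemes)
        # Keep only values taken from the content warnings vocabulary.
        if uri and uri.startswith(CW_SCHEME):
            reasons.append(uri[len(CW_SCHEME):])
    return reasons


# Hypothetical news item with a warning and two reasons.
SAMPLE = """
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemMeta>
    <signal qcode="sig:cwarn"/>
  </itemMeta>
  <contentMeta>
    <exclAudience qcode="cw:violence"/>
    <exclAudience qcode="cw:language"/>
  </contentMeta>
</newsItem>
"""

# Assumed alias -> scheme URI map (normally derived from the catalog).
SCHEMES = {
    "sig": "http://cv.iptc.org/newscodes/signal/",
    "cw": "http://cv.iptc.org/newscodes/contentwarning/",
}

reasons = content_warning_reasons(ET.fromstring(SAMPLE), SCHEMES)
print(reasons)  # ['violence', 'language']
```

An empty list means a warning is present without stated reasons, which is distinct from None (no warning at all).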

Correction signal

All documents: a correction signal may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
   <itemSet>
      <newsItem>
         <itemMeta>
            <signal qcode="QCode resolving to http://cv.iptc.org/newscodes/signal/correction"/>
         </itemMeta>
      </newsItem>
   </itemSet>
</newsMessage>

One particular type of update that can occur on a document is a correction. A correction occurs when an error has been found in a document and a corrected version is published. In such a case, you receive a new version of the document (i.e., a document with the same guid and a new version number) that contains a correction signal. This signal takes the form of a signal element with a qcode attribute resolving to http://cv.iptc.org/newscodes/signal/correction.

Common practice at AFP is to use this mechanism only for corrections of great significance. For instance the correction of a typo that doesn't change the meaning of the news story shall not be marked as a correction but might be issued as a mere update.

When a serious error is found in a key piece of information in a document, rendering it unusable as such, the document will usually be canceled instead of corrected. A document is canceled by issuing a version with the "Canceled" publishing status, as discussed in section Publishing Status.

The correction signal doesn't provide details about the correction (e.g., what or where was the error, how it has been corrected). Such details will usually be provided in the general editorial note, which is given by an edNote element with a role attribute resolving to http://cv.afp.com/ednoteroles/client (see the section on the general editorial note). For example:

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <signal qcode="QCode resolving to http://cv.iptc.org/newscodes/signal/correction"/>
                <edNote role="QCode resolving to http://cv.afp.com/ednoteroles/client">
                   In previous versions of this document, the first sentence of the answer of the
                   auctioneer was incorrectly translated. This has been corrected in this version.
                </edNote>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Handling a correction properly is of paramount importance and can be a complex process (you probably have it in place already). For example, you may want to have someone review the document and, possibly with the help of the general editorial note and the availability of previous versions, identify and understand what the error was and what the correction consists of. You may then want to ensure that this correction is applied to any published material that may have carried the original error. This may include making sure that recipients of such material are notified and provided with the corrected information.
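
As a starting point for such a process, an ingest system could flag corrections and surface the general editorial note to a reviewer. The sketch below is only an illustration: the scheme aliases sig and ednote are hypothetical, the alias-to-scheme map is assumed to come from the item's catalog information, and the function names are made up for the example.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
CORRECTION = "http://cv.iptc.org/newscodes/signal/correction"
CLIENT_NOTE = "http://cv.afp.com/ednoteroles/client"


def resolve_qcode(qcode, schemes):
    """Resolve 'alias:code' to a concept URI, or None if the alias is unknown."""
    alias, colon, code = qcode.partition(":")
    scheme = schemes.get(alias)
    return scheme + code if colon and scheme else None


def correction_info(news_item, schemes):
    """Return (is_correction, general_note_text_or_None) for a news item."""
    meta = news_item.find(NAR + "itemMeta")
    if meta is None:
        return False, None
    is_correction = any(
        resolve_qcode(signal.get("qcode", ""), schemes) == CORRECTION
        for signal in meta.iterfind(NAR + "signal")
    )
    note = None
    for ed_note in meta.iterfind(NAR + "edNote"):
        roles = ed_note.get("role", "").split()
        if any(resolve_qcode(role, schemes) == CLIENT_NOTE for role in roles):
            # Collapse the line breaks and indentation of the XML source.
            note = " ".join("".join(ed_note.itertext()).split())
    return is_correction, note


# Hypothetical corrected version of a document.
SAMPLE = """
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"
          guid="urn:example:doc" version="4">
  <itemMeta>
    <signal qcode="sig:correction"/>
    <edNote role="ednote:client">
      In previous versions of this document, the first sentence of the answer
      of the auctioneer was incorrectly translated.
    </edNote>
  </itemMeta>
</newsItem>
"""

# Assumed alias -> scheme URI map (normally derived from the catalog).
SCHEMES = {
    "sig": "http://cv.iptc.org/newscodes/signal/",
    "ednote": "http://cv.afp.com/ednoteroles/",
}

is_correction, note = correction_info(ET.fromstring(SAMPLE), SCHEMES)
print(is_correction)  # True
print(note)
```

The guid and version attributes of the news item can then be used to locate the earlier versions that the correction supersedes.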

Dates

Two date formats are used in this specification: the full date and time format and the truncated date and time format.

In addition to the description provided below, you should refer to the NewsML-G2 specification for information on the processing model for these dates.

Document transmission date

All documents: the transmission date of the document is provided in the header of the news message.

<newsMessage>
    <header>
        <sent>2009-02-23T20:44:07+02:00</sent>
    </header>
</newsMessage>

The transmission date is provided by the sent element. It is always present and uses the full date and time format. The transmission date indicates when the document was transmitted from AFP to your system.

Document creation date

All documents: the creation date of the document may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <firstCreated>2009-02-23T18:22:08+02:00</firstCreated>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

If present, the creation date of the document is provided by a firstCreated element in the full date and time format. The creation date of a document specifies when the NewsML-G2 document was created (contrast this with the content creation date, which specifies when some content was created; e.g., when a given photo was shot). When a new version of the document is emitted, the creation date of the document isn't modified, but the version creation date is.

Document version creation date

All documents: the creation date of this version of the document is provided in the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <versionCreated>2009-02-23T20:43:00+02:00</versionCreated>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

The creation date of this version of the document is provided by a versionCreated element in the full date and time format. This date information is always present in documents.

Content creation date

Picture, video, still graphic, animated graphic documents: the creation date of the content may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <contentCreated>2009-02-23T17:31:00+02:00</contentCreated>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: the creation date of the content of a specific picture, video, still graphic or animated graphic item may be provided in the content metadata section of this item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <!-- This is the content creation date of this item -->
                <contentCreated>2009-02-23T17:31:00+02:00</contentCreated>
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- This is the content creation date of this item -->
                <contentCreated>2009-02-22</contentCreated>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

The creation date of a picture, video, still graphic or animated graphic content may be provided by a contentCreated element in the news item for this content. There is no content creation date provided for text content. In picture, video, still graphic and animated graphic documents, there is a single news item, which, consequently, is the one that may provide this information. For multimedia documents, the content creation date of each picture, video and graphic may appear in each corresponding news item, as shown in the examples above.

For a photo, the content creation date is the date of the shooting. Likewise, for live video footage, this is the date on which the covered event was occurring. For other types of content (e.g., video report, graphic) this is typically the date on which the content was produced.

Note that the content creation date uses the truncated date and time format.
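
The date values shown in this section can be parsed with the Python standard library, as in the sketch below. It handles only the two forms that appear in the examples above: a full date and time with a UTC offset, and a plain date. Any other truncated form would raise ValueError here and would need extra handling; refer to the NewsML-G2 specification for the complete definition of both formats. The function name is an assumption made for the illustration.

```python
from datetime import date, datetime, timezone


def parse_g2_date(text):
    """Parse a value such as 2009-02-23T20:44:07+02:00 or 2009-02-22.

    Returns a timezone-aware datetime for the full form and a date for the
    date-only form.
    """
    value = text.strip()
    if "T" in value:
        return datetime.fromisoformat(value)
    return date.fromisoformat(value)


sent = parse_g2_date("2009-02-23T20:44:07+02:00")
content_created = parse_g2_date("2009-02-22")

# Normalising to UTC makes values with different offsets comparable.
sent_utc = sent.astimezone(timezone.utc).isoformat()
print(sent_utc)         # 2009-02-23T18:44:07+00:00
print(content_created)  # 2009-02-22
```

Because the truncated form may carry no time of day, code that compares a content creation date with a full date and time has to decide explicitly how to treat the missing part.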

Embargo

All documents: embargo information is provided in the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <embargoed/>
                <edNote role="QCode resolving to http://cv.afp.com/ednoteroles/embargo">
                    Embargoed until end of first auction day
                </edNote>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Embargo information is specified through the embargoed element, which can be completed by an edNote element with a role attribute resolving to http://cv.afp.com/ednoteroles/embargo.

Embargo-wise, a document can have one of the four statuses described in the table below.

Embargo statuses

Embargoed: No
Representation: No embargoed element.
Example: N/A

Embargoed: Until given date and time
Representation: An embargoed element providing the date and time at which the embargo ends. This can be completed by an editorial note providing additional comment but no additional embargo condition.
Example:
<embargoed>
    2009-02-23T21:00:00+02:00
</embargoed>
<edNote role="QCode resolving to 
              http://cv.afp.com/ednoteroles/embargo">
    Embargoed at the request of Christie's
</edNote>

Embargoed: Under other provided conditions
Representation: An empty embargoed element and an embargo editorial note specifying the embargo conditions. This form is used when the precise date and time at which the embargo expires is not known. Note that if the conditions are made of a date and time and additional conditions, all these conditions are expressed in the editorial note (i.e., the date and time aren't provided inside the embargoed element, but as part of the editorial note too).
Example:
<embargoed/>
<edNote role="QCode resolving to 
              http://cv.afp.com/ednoteroles/embargo">
    Embargoed until end of first auction day
</edNote>

Embargoed: Indefinitely or specified by custom agreement
Representation: An empty embargoed element. The document must be considered indefinitely embargoed unless a custom agreement between you and AFP is in effect.
Example:
<embargoed/>

See the NewsML-G2 specification for more information on the representation and processing model of embargo information.

For multimedia documents, the way embargo information is conveyed differs from standard NewsML-G2. In NewsML-G2 each G2 item carries its own embargo information, and a G2 item without an embargoed element is defined as not embargoed. In AFP's multimedia documents, the only embargoed element to consider is that of the main item. The embargoed elements of non-main items must be ignored. You must process multimedia documents in a way that applies embargo directives provided in the main news item to the entire content of the document (i.e., to all items in the document).
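
The four embargo statuses can be told apart with a small classifier such as the sketch below. It works on one item metadata section; for a multimedia document you would pass the item metadata of the main news item and apply the result to every item. The status labels, the scheme alias ednote and the helper names are assumptions made for the illustration, and the date parsing covers only the full date and time form shown in the examples.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
EMBARGO_NOTE = "http://cv.afp.com/ednoteroles/embargo"


def resolve_qcode(qcode, schemes):
    """Resolve 'alias:code' to a concept URI, or None if the alias is unknown."""
    alias, colon, code = qcode.partition(":")
    scheme = schemes.get(alias)
    return scheme + code if colon and scheme else None


def embargo_status(item_meta, schemes):
    """Return (status, detail) where status is one of four hypothetical labels."""
    embargoed = item_meta.find(NAR + "embargoed")
    if embargoed is None:
        return "not embargoed", None
    end = (embargoed.text or "").strip()
    if end:
        return "until", datetime.fromisoformat(end)
    # Empty embargoed element: look for an embargo editorial note.
    for note in item_meta.iterfind(NAR + "edNote"):
        roles = note.get("role", "").split()
        if any(resolve_qcode(role, schemes) == EMBARGO_NOTE for role in roles):
            return "conditions", " ".join("".join(note.itertext()).split())
    return "indefinite", None


def meta(body):
    """Build a hypothetical item metadata section around the given XML body."""
    return ET.fromstring(
        f'<itemMeta xmlns="http://iptc.org/std/nar/2006-10-01/">{body}</itemMeta>'
    )


# Assumed alias -> scheme URI map (normally derived from the catalog).
SCHEMES = {"ednote": "http://cv.afp.com/ednoteroles/"}

statuses = [
    embargo_status(meta(""), SCHEMES),
    embargo_status(meta("<embargoed>2009-02-23T21:00:00+02:00</embargoed>"), SCHEMES),
    embargo_status(
        meta('<embargoed/><edNote role="ednote:embargo">'
             "Embargoed until end of first auction day</edNote>"),
        SCHEMES,
    ),
    embargo_status(meta("<embargoed/>"), SCHEMES),
]
for status, detail in statuses:
    print(status, detail)
```

Treating an empty embargoed element without a note as "indefinite" errs on the side of caution; a custom agreement with AFP, if any, would be applied on top of this result.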

Event identifier

All documents: one event identifier may be provided by a subject element in the content metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <subject qcode="QCode identifying the event" 
                         type="QCode resolving to http://cv.iptc.org/newscodes/cpnature/event">
                </subject>                
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

The news coverage of an event often spans multiple NewsML-G2 documents. For example, the auction for the Yves Saint Laurent and Pierre Bergé collection may be covered by two news stories (one announcing the event and one reporting on the event later on), two interview transcripts (one with Pierre Bergé and one with a Christie's representative), a multimedia document, a video report and a number of pictures of the event. It might be interesting for you to know that all these documents are about the same event. For example, it might help your editorial team to access all the documents available about the event. Another example: if you operate a Web site publishing news, you could use this knowledge to automatically provide links to related content.

To let you know that multiple NewsML-G2 documents relate to the same event, AFP creates unique event identifiers and inserts them into documents. For example, a unique event identifier is assigned to the auction for the Yves Saint Laurent and Pierre Bergé collection, and each related document contains this identifier.


Different NewsML-G2 documents covering the same event

If present, the unique event identifier is the concept URI of the subject element whose type attribute, a QCode, resolves to http://cv.iptc.org/newscodes/cpnature/event. You compute this concept URI by resolving the QCode provided by the qcode attribute.

See the section on subjects for more information about the subject element.

Why is the event identifier provided using a <subject> element?

This is because the event covered by a document is a subject matter of the document: something the document is about. Hence, conveying its identifier using the NewsML-G2 <subject> element, along with other subjects of the document, is appropriate. This allows it to be generically processed like any other subject when that makes sense, or to be processed specifically as the event identifier when needed, thanks to the type attribute which marks it as such.
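
Extracting the event identifier could look like the sketch below, which picks the subject typed as an event and resolves its QCode. The scheme aliases, the event scheme URI http://cv.example.com/events/ and the event code are purely hypothetical values chosen for the illustration; real values come from the document and its catalog information.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
EVENT_TYPE = "http://cv.iptc.org/newscodes/cpnature/event"


def resolve_qcode(qcode, schemes):
    """Resolve 'alias:code' to a concept URI, or None if the alias is unknown."""
    alias, colon, code = qcode.partition(":")
    scheme = schemes.get(alias)
    return scheme + code if colon and scheme else None


def event_uri(content_meta, schemes):
    """Return the concept URI of the event subject, or None if there is none."""
    for subject in content_meta.iterfind(NAR + "subject"):
        if resolve_qcode(subject.get("type", ""), schemes) == EVENT_TYPE:
            return resolve_qcode(subject.get("qcode", ""), schemes)
    return None


# Hypothetical content metadata: one ordinary subject and one event subject.
SAMPLE = """
<contentMeta xmlns="http://iptc.org/std/nar/2006-10-01/">
  <subject qcode="medtop:20000031" type="cpnat:abstract"/>
  <subject qcode="evt:12345" type="cpnat:event"/>
</contentMeta>
"""

# Assumed alias -> scheme URI map (normally derived from the catalog).
SCHEMES = {
    "cpnat": "http://cv.iptc.org/newscodes/cpnature/",
    "medtop": "http://cv.iptc.org/newscodes/mediatopic/",
    "evt": "http://cv.example.com/events/",
}

uri = event_uri(ET.fromstring(SAMPLE), SCHEMES)
print(uri)  # http://cv.example.com/events/12345
```

Documents can then be grouped by this concept URI, for instance in a dictionary keyed by event, to link related coverage together.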

General editorial note

All documents: a general editorial note may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <edNote role="QCode resolving to http://cv.afp.com/ednoteroles/client">
                    Original source is unknown and unverified. This photo was posted on twitter.
                    Following an official ban in San Theodoros on foreign media outlets covering
                    demonstrations, AFP is using pictures from other sources.
                </edNote>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

The general editorial note provides some text in natural language addressed to the editorial staff in your team who receive and process the item. It can provide instructions or hints on how to handle the document, information about the nature of a correction (see example in the section on correction signal), excluded audience/usage, additional information about the content, etc. It is not intended for publication.

There is at most one general editorial note in a document. If present, it is provided by an edNote element whose role attribute, a QCode, resolves to http://cv.afp.com/ednoteroles/client. Note that while NewsML-G2 allows for rich text by using some markup in the content of an editorial note, AFP's systems only output simple textual content not interspersed with markup.

The general editorial note is often used to express usage restrictions, as in the following example:

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <edNote role="QCode resolving to http://cv.afp.com/ednoteroles/client">
                    EDITORIAL USE ONLY
                    NO MARKETING NO ADVERTISING CAMPAIGNS
                    NO ARCHIVE
                </edNote>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

The following table provides examples of common usage restrictions you might find in picture documents.

Examples of usage restrictions conveyed by the general editorial note
Phrase inside the general editorial note | Comment
RESTRICTED TO EDITORIAL USE | The picture can be used only by media outlets for news purposes (newspapers, magazines, radios, TVs, news websites and mobile news services...)
NO MARKETING NO ADVERTISING CAMPAIGNS | The picture cannot be used for advertising or marketing.
NO INTERNET | The picture cannot be published on Internet websites.
NO MOBILE | The picture cannot be used by mobile services.
NO ARCHIVE | The picture cannot be archived.
MANDATORY USE WITH AFP STORY | The handout picture shall be published with the corresponding AFP story only (this mention is only available for handouts).
TO BE USED WITHIN XX DAYS FROM XX/XX/XXXX | The picture cannot be used outside of the specified timeframe.
NO VIDEO EMULATION | The picture cannot be used in a sequence of pictures to simulate a video.
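
Because the general editorial note is free text written for people, any automated reading of it is only a heuristic. The Python sketch below shows one possible way to extract the note and flag a few of the phrases listed in the table above, so that a document can be routed for closer editorial review. The alias afpednoterole and the flag names are hypothetical, namespaces are ignored, and the phrase list is deliberately incomplete.

```python
import xml.etree.ElementTree as ET

CLIENT_ROLE = "http://cv.afp.com/ednoteroles/client"

# Phrases copied from the table above; the flag names are hypothetical.
PHRASE_FLAGS = {
    "NO MARKETING NO ADVERTISING CAMPAIGNS": "no_marketing",
    "NO INTERNET": "no_internet",
    "NO MOBILE": "no_mobile",
    "NO ARCHIVE": "no_archive",
    "NO VIDEO EMULATION": "no_video_emulation",
}

def resolve_qcode(qcode, aliases):
    """Resolve 'alias:code' to a concept URI; None if the alias is unknown."""
    alias, sep, code = qcode.partition(":")
    if not sep or alias not in aliases:
        return None
    return aliases[alias] + code

def general_editorial_note(news_item, aliases):
    """Return the whitespace-normalised general editorial note, or None."""
    for note in news_item.iter("edNote"):
        if resolve_qcode(note.get("role", ""), aliases) == CLIENT_ROLE:
            return " ".join("".join(note.itertext()).split())
    return None

def restriction_flags(note_text):
    """Return the sorted flags whose phrase occurs in the note (heuristic)."""
    upper = (note_text or "").upper()
    return sorted(flag for phrase, flag in PHRASE_FLAGS.items() if phrase in upper)

ALIASES = {"afpednoterole": "http://cv.afp.com/ednoteroles/"}  # hypothetical alias
SAMPLE = """
<newsItem>
  <itemMeta>
    <edNote role="afpednoterole:client">
      EDITORIAL USE ONLY
      NO MARKETING NO ADVERTISING CAMPAIGNS
      NO ARCHIVE
    </edNote>
  </itemMeta>
</newsItem>
"""
note = general_editorial_note(ET.fromstring(SAMPLE), ALIASES)
flags = restriction_flags(note)
print(note)
print(flags)
```

A match only signals that a restriction is probably present; the full note should still be shown to the editorial team, since wording may vary from one document to another.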

Genres

Non-multimedia documents: genres of the document may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <!-- A genre represented by a QCode and associated with a rank -->
                <genre rank="1" qcode="genre:Interview"/>
                
                <!-- A genre represented by a QCode and a name and associated with a rank -->
                <genre rank="2" qcode="afpedtype:VideoWithTitling">
                    <name>Titling</name>
                </genre>  
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: genres of the document as a whole may be provided in the content metadata section of the main news item. Genres specific to a non-main item may be provided by the content metadata section of this item.

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="QCode resolving to http://cv.iptc.org/newscodes/conceptrelation/isA"
                      href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
            <contentMeta>
                <!-- This genre is in the main news item:
                     it applies to the document as a whole -->
                <genre rank="1" qcode="genre:Interview">
                    <name>Interview</name>
                </genre> 
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- This genre only qualifies this item -->
                <genre rank="1" qcode="genre:Profile">
                    <name>Profile</name>
                </genre> 
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Genres of a document, and of individual items in the case of multimedia documents, may be provided by genre elements. Each genre element describes the style of the content (i.e., the intellectual or journalistic form). There may be multiple genre elements per item, as a given item may be at the intersection of multiple genres.

In AFP documents, a genre is specified by a URI expressed as a QCode, optionally complemented by a natural language name and/or a rank.

The URI space used to specify genres through QCodes is open and can evolve over time. Often used in AFP documents are URIs identifying IPTC genres [IPTCGenres], a standard taxonomy for categorizing news content. They are provided through QCodes, as shown above where "genre" is, in these specific examples, an alias for the scheme URI http://cv.iptc.org/newscodes/genre/. AFP also uses an additional set of genres, provided by the schemes http://ref.afp.com/attributes/ and http://ref.afp.com/editorialtypes/. Elements of these schemes are described in dedicated documents.

The name child element, if present, provides a natural language name for the genre.

Genres pertaining to a given news item may be ranked together by relative importance with regard to this particular item's content. This ranking is specified by a rank attribute, whose value is a nonnegative integer. In a given item, genres with a lower value for this attribute have a higher importance (i.e., editorial significance) than genres with a higher value of this attribute, and genres without a rank attribute have a lower importance than genres with a rank attribute. The NewsML-G2 specification provides additional information on ranks and their processing model.

In AFP documents there is at most one genre of rank 1 per news item. We call this genre of rank 1 the "attribute". It is usually either an IPTC genre [IPTCGenres] or a genre defined in the additional AFP scheme http://ref.afp.com/attributes/ (this specific vocabulary is described in a dedicated document). Examples of attributes include: interview, curtain raiser, anniversary, verbatim, etc.

Genres of rank 2 are often present in AFP documents. We call such genres "editorial types". Unlike genres of rank 1, multiple editorial types may be associated with a given news item. Editorial types are usually part of the controlled vocabulary defined by the scheme http://ref.afp.com/editorialtypes/. This vocabulary is described in a dedicated document. Examples of editorial types include: prev, exclusive, archive, video with titling, lead, second lead, third lead, etc.
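
The ranking rules above can be sketched in Python as follows. This is a minimal illustration that assumes the simplified, namespace-free XML used in the examples of this guide; the function names are hypothetical. It orders genres by importance (unranked genres last) and then picks the "attribute" (rank 1) and the "editorial types" (rank 2).

```python
import xml.etree.ElementTree as ET

def ranked_genres(news_item):
    """Return the genres of an item, most important first, unranked last."""
    entries = []
    for genre in news_item.findall("contentMeta/genre"):
        rank = genre.get("rank")
        entries.append({
            "rank": int(rank) if rank is not None else None,
            "qcode": genre.get("qcode"),
            "name": genre.findtext("name"),
        })
    # Ranked genres sort before unranked ones; lower rank means more important.
    return sorted(entries, key=lambda e: (e["rank"] is None, e["rank"] or 0))

def attribute_and_editorial_types(news_item):
    """Return (genre of rank 1 or None, list of genres of rank 2)."""
    genres = ranked_genres(news_item)
    attribute = next((g for g in genres if g["rank"] == 1), None)
    editorial_types = [g for g in genres if g["rank"] == 2]
    return attribute, editorial_types

SAMPLE = """
<newsItem>
  <contentMeta>
    <genre qcode="genre:Profile"/>
    <genre rank="2" qcode="afpedtype:VideoWithTitling">
      <name>Titling</name>
    </genre>
    <genre rank="1" qcode="genre:Interview"/>
  </contentMeta>
</newsItem>
"""
item = ET.fromstring(SAMPLE)
attribute, editorial_types = attribute_and_editorial_types(item)
print(attribute, editorial_types)
```

Comparing QCodes as raw strings, as done here, is only safe within a single document; to compare genres across documents, resolve each QCode to its concept URI first.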

Identifier and version number

All documents: the document identifier is provided in the news item (for multimedia documents: in the main news item). A version number may be present too.

<newsMessage>
    <itemSet>
        <newsItem guid="urn:newsml:afp.com:20100101:ca81bfb7-80be-4b79-b017-09cfe1f293ab" 
                  version="5">
        </newsItem>
    </itemSet>
</newsMessage>

A document is a set of information carrying some journalistic content and associated metadata. As news stories develop or corrections are made, new versions of the document are published.

Each NewsML-G2 document has a globally unique identifier (guid), which is provided by the guid attribute of a newsItem element. It is designed to be globally unique among all NewsML-G2 documents for all time and all NewsML-G2 providers. This identifier makes it possible to identify a document as it moves through the news workflow and is transferred/duplicated from place to place and from system to system. It is also the basis for document updates: an update is carried out by sending you a new version of the document identified by a given guid (i.e., the original and the new version share the same guid).

From a technical point of view, given two representations of some journalistic content in NewsML-G2, the guid is what tells whether these two representations are those of the same document (possibly different versions of it): same guids means same document, different guids means different documents.

In NewsML-G2 documents, guids are in the form of IRIs [RFC3987]. In current AFP NewsML-G2 documents, those IRIs are either URNs in the namespace "newsml" [RFC3085bis] or URIs in the http scheme [HTTPURI]. Section 5.3 of [RFC3987] provides directions on how to determine whether two guids are the same.

A version number may be provided by a version attribute in the form of an XML Schema positive integer. It identifies the version of the document. The first time you receive a given document (i.e., a document identified by a given guid), this document isn't necessarily in its first version. That is, the version number of a document you receive for the first time may be greater than 1. The version number is incremented by 1 or more each time the document is updated. If no version attribute is present, you must assume that the document is in version 1 (i.e., first version).

How should a new version of a document be dealt with?

The answer is given by the NewsML-G2 documentation:
In the absence of any specific instructions from the provider, a "usable" item [cf. section on publishing status] should be regarded as replacing any previous version of the item with the same GUID. In practice, a provider is likely to provide some supplementary information in the form of a human-readable <edNote> [cf. section on general editorial note] which can be displayed to inform recipients of the reason for the update.
Often, new versions are issued to enrich previous ones with additional information, especially as stories develop in real time. Sometimes, however, a new version is meant to correct some error found in a previous version. In such a case you may want to take some additional actions, as erroneous material may already have been published. Such correction-conveying versions are specifically tagged using a correction <signal>. For more information on this topic see the section on correction signal.
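
As an illustration of the versioning rules above, here is a minimal Python sketch of a receiver that keeps track of the latest version seen for each guid. It rests on several assumptions that are not part of this guide: guids are compared by simple string equality, a version that is not greater than the stored one is ignored, and the publishing status and correction signal are not examined. The function names and the in-memory store are hypothetical.

```python
import xml.etree.ElementTree as ET

def guid_and_version(news_item):
    """Return (guid, version); a missing version attribute means version 1."""
    return news_item.get("guid"), int(news_item.get("version", "1"))

def accept(store, news_item):
    """Record the item if it is new or newer than the stored version.

    Returns True when the caller should replace its stored copy.
    """
    guid, version = guid_and_version(news_item)
    known = store.get(guid)
    if known is not None and version <= known:
        return False  # assumption: late or duplicate deliveries are ignored
    store[guid] = version
    return True

GUID = "urn:newsml:afp.com:20100101:ca81bfb7-80be-4b79-b017-09cfe1f293ab"
store = {}
first = accept(store, ET.fromstring('<newsItem guid="%s" version="5"/>' % GUID))
stale = accept(store, ET.fromstring('<newsItem guid="%s"/>' % GUID))
newer = accept(store, ET.fromstring('<newsItem guid="%s" version="7"/>' % GUID))
print(first, stale, newer, store[GUID])
```

Note that the first version received here is 5, not 1, and that the update jumps from 5 to 7: both situations are allowed by the rules described above.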

Information sources

Non-multimedia documents: information sources may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <!-- An information source represented by a name  -->
                <infoSource>
                    <name>Governmental source</name>
                </infoSource> 
                
                <!-- An information source represented by a name and a role  -->
                <infoSource role="isrol:originfo">
                    <name>Governmental source</name>
                </infoSource> 
            
                <!-- An information source represented by a QCode, a name and a role -->
                <infoSource qcode="nprov:BW" role="isrol:origcont">
                    <name>Business Wire</name>
                </infoSource>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: information sources may be provided in the content metadata section of each news item. When an information source appears in a news item which is not the main one, it describes an information source for the content of this item. When an information source appears in the main news item, it should be considered as an information source of the "document", with no indication of the specific part of the content it is associated with (if any).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="QCode resolving to http://cv.iptc.org/newscodes/conceptrelation/isA"
                      href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
            <contentMeta>
                <!-- This information source is in the main news item: 
                      it is an information source of the document -->
                <infoSource>
                    <name>Governmental source</name>
                </infoSource> 
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- This information source is specific to this item -->
                <infoSource qcode="nprov:BW" role="isrol:origcont">
                    <name>Business Wire</name>
                </infoSource> 
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

As defined by NewsML-G2, an information source is "a party (person or organization) which originated, distributed, aggregated or supplied the content or provided some information used to create or enhance the content."

Information sources of a document, and of individual items in multimedia documents, may be provided by infoSource elements.

In AFP documents, an information source is specified by either:

The URI space used to specify information sources through QCodes is open and can evolve over time.

The name child element, if present, provides a natural language name for the information source.

The role attribute, if present, carries a QCode that specifies the role of the information source. AFP's documents use roles described in the table below, where the "Concept URI" column gives the URI the QCode resolves to.

Information source roles
Role | Description | Concept URI
Content originator | A party which originated the content. For example, in a document created by AFP but reusing content provided by Business Wire, this source (i.e., Business Wire) will appear with this role. | http://cv.iptc.org/newscodes/infosourcerole/origcont
Information originator | A party which provided some information used to create or enhance the content. For example, a spokesperson, an interviewee, an "insider", etc. may appear with this role. | http://cv.iptc.org/newscodes/infosourcerole/originfo

As specified by NewsML-G2, an infoSource with no role attribute should be considered as having the role of "Information originator".
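
The following Python sketch illustrates how information sources might be read, including the default role described above. It is only an illustration: namespaces are ignored, the function names are hypothetical, and the alias isrol is assumed to map to http://cv.iptc.org/newscodes/infosourcerole/, consistent with the concept URIs in the table.

```python
import xml.etree.ElementTree as ET

INFORMATION_ORIGINATOR = "http://cv.iptc.org/newscodes/infosourcerole/originfo"

def resolve_qcode(qcode, aliases):
    """Resolve 'alias:code' to a concept URI; None if the alias is unknown."""
    alias, sep, code = qcode.partition(":")
    if not sep or alias not in aliases:
        return None
    return aliases[alias] + code

def info_sources(news_item, aliases):
    """Return one dict per infoSource with its name, URI and role URI."""
    sources = []
    for src in news_item.findall("contentMeta/infoSource"):
        qcode, role = src.get("qcode"), src.get("role")
        sources.append({
            "name": src.findtext("name"),
            "uri": resolve_qcode(qcode, aliases) if qcode else None,
            # No role attribute: the source is an "Information originator".
            "role": resolve_qcode(role, aliases) if role else INFORMATION_ORIGINATOR,
        })
    return sources

# Aliases assumed for this example.
ALIASES = {
    "isrol": "http://cv.iptc.org/newscodes/infosourcerole/",
    "nprov": "http://cv.iptc.org/newscodes/newsprovider/",
}
SAMPLE = """
<newsItem>
  <contentMeta>
    <infoSource>
      <name>Governmental source</name>
    </infoSource>
    <infoSource qcode="nprov:BW" role="isrol:origcont">
      <name>Business Wire</name>
    </infoSource>
  </contentMeta>
</newsItem>
"""
sources = info_sources(ET.fromstring(SAMPLE), ALIASES)
for source in sources:
    print(source)
```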

Keywords

Non-multimedia documents: keywords may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <keyword>culture</keyword>
                <keyword>arts</keyword>
                <keyword>fashion</keyword>
                <keyword>auction</keyword>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: keywords of the document as a whole may be provided in the content metadata section of the main news item. Keywords specific to an individual item may be provided by the content metadata section of that item.

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="QCode resolving to http://cv.iptc.org/newscodes/conceptrelation/isA"
                      href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
            <contentMeta>
                <!-- These keywords are in the main news item: 
                     they are associated with the document as a whole -->
                <keyword>culture</keyword>
                <keyword>arts</keyword>
                <keyword>fashion</keyword>
                <keyword>auction</keyword>
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- These keywords are specifically associated with this news item -->
                <keyword>people</keyword>
                <keyword>money</keyword>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Keywords are defined by NewsML-G2 as "free-text terms to be used for indexing or finding the content by text-based search engines".

If present, keywords are provided by keyword elements.

Some keywords may have a refined role, expressed by a role attribute. The value of this attribute is a QCode. Currently we may issue a QCode resolving to http://cv.afp.com/keywordroles/tagWeb. For example:

<keyword role="QCode resolving to http://cv.afp.com/keywordroles/tagWeb">culture</keyword>

Keywords with a http://cv.afp.com/keywordroles/tagWeb role are meant to be used to compute tag clouds [TagClouds].
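
As a rough illustration of how such keywords could feed a tag cloud, the Python sketch below counts the tagWeb keywords found across several news items. The alias afpkwrole is hypothetical, namespaces are ignored, and the counting strategy (one count per occurrence, case preserved) is an assumption rather than an AFP recommendation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TAG_WEB = "http://cv.afp.com/keywordroles/tagWeb"

def resolve_qcode(qcode, aliases):
    """Resolve 'alias:code' to a concept URI; None if the alias is unknown."""
    alias, sep, code = qcode.partition(":")
    if not sep or alias not in aliases:
        return None
    return aliases[alias] + code

def tag_web_keywords(news_item, aliases):
    """Return the keywords of an item whose role resolves to tagWeb."""
    return [
        kw.text.strip()
        for kw in news_item.iter("keyword")
        if kw.text and resolve_qcode(kw.get("role", ""), aliases) == TAG_WEB
    ]

def tag_counts(news_items, aliases):
    """Count tagWeb keywords over a collection of news items."""
    counts = Counter()
    for item in news_items:
        counts.update(tag_web_keywords(item, aliases))
    return counts

ALIASES = {"afpkwrole": "http://cv.afp.com/keywordroles/"}  # hypothetical alias
ITEMS = [
    ET.fromstring(
        '<newsItem><contentMeta>'
        '<keyword role="afpkwrole:tagWeb">culture</keyword>'
        '<keyword>arts</keyword>'
        '<keyword role="afpkwrole:tagWeb">auction</keyword>'
        '</contentMeta></newsItem>'
    ),
    ET.fromstring(
        '<newsItem><contentMeta>'
        '<keyword role="afpkwrole:tagWeb">culture</keyword>'
        '</contentMeta></newsItem>'
    ),
]
counts = tag_counts(ITEMS, ALIASES)
print(counts.most_common())
```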

Language of the content

Non-multimedia documents: the language of the content may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <language tag="en"/>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: the language of the content may be provided in the content metadata section of each news item.

<newsMessage>
    <itemSet>
        <!-- An item whose content is in English -->
        <newsItem>
            <contentMeta>
                <language tag="en"/>
            </contentMeta>
        </newsItem>
        
        <!-- An item whose content is in French -->
        <newsItem>
            <contentMeta>
                <language tag="fr"/>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

The tag attribute of the language element carries a BCP 47 language tag [RFC5646] that specifies the main language of the content. The content is what is provided inline or linked to by the item set (itemSet element). For example, in a text document this is the main language the textual content is written in, and in a video document this is typically the main language used in the soundtrack.

Language of metadata

Non-multimedia documents: the language of metadata is specified by the news item.

<newsMessage>
    <itemSet>
        <newsItem xml:lang="en">
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: the language of metadata is specified by each news item.

<newsMessage>
    <itemSet>
        <newsItem xml:lang="en">
        </newsItem>
        <newsItem xml:lang="en">
        </newsItem>
    </itemSet>
</newsMessage>

The xml:lang attribute of the newsItem element carries a BCP 47 language tag [RFC5646] that specifies the language of the metadata (e.g., titles, subject's names, caption, etc.) provided by the item.

In a multimedia document, this attribute has the same value in every news item of the document (i.e., in a given document, all items make use of the same language for metadata).
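
The two language indications (content and metadata) can be read as in the Python sketch below. It is a minimal illustration that ignores the NewsML-G2 namespace and uses hypothetical function names. Language tags are compared in lower case because BCP 47 tags are case-insensitive.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined; ElementTree exposes xml:lang under this name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def metadata_language(news_message):
    """Return the metadata language shared by all news items of a document."""
    langs = {(item.get(XML_LANG) or "").lower() for item in news_message.iter("newsItem")}
    if len(langs) != 1:
        raise ValueError("expected one metadata language, found %r" % sorted(langs))
    return langs.pop()

def content_languages(news_message):
    """Return the content language tag of each news item (None if absent)."""
    result = []
    for item in news_message.iter("newsItem"):
        language = item.find("contentMeta/language")
        result.append(language.get("tag") if language is not None else None)
    return result

SAMPLE = """
<newsMessage>
  <itemSet>
    <newsItem xml:lang="en">
      <contentMeta><language tag="en"/></contentMeta>
    </newsItem>
    <newsItem xml:lang="en">
      <contentMeta><language tag="fr"/></contentMeta>
    </newsItem>
  </itemSet>
</newsMessage>
"""
message = ET.fromstring(SAMPLE)
meta_lang = metadata_language(message)
content_langs = content_languages(message)
print(meta_lang, content_langs)
```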

Locations

AFP's NewsML-G2 documents can convey information about locations. We establish a distinction between locations from which the content originates (e.g., the place where a news story was written) and locations that are subject matter of the content. These two types of location are conveyed using different means, as described in the following sections.

Locations from which the content originates

Non-multimedia documents: the locations from which the content originates are provided in the content metadata section of the news item (in the following example only one location is provided).

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <located qcode="afplocation:6666" type="loctyp:City">
                    <name>Washington</name>
                    <related qcode="afplocation:1149" rel="skos:broader" type="loctyp:CountryArea">
                        <name>District of Columbia</name>
                        <related qcode="afplocation:206" rel="skos:broader" type="loctyp:Country"/>
                    </related>
                    <related qcode="afplocation:206" type="loctyp:Country">
                        <name>Etats-Unis</name>
                        <related qcode="iso3166-1a3:USA" rel="skos:exactMatch"/>
                    </related>
                    <geoAreaDetails>
                        <position latitude="38.89511" longitude="-77.03637"/>
                    </geoAreaDetails>
                </located>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: for news items in the document, the locations from which the content of the item originates may be provided in the content metadata section of the item (in the following example only one location per item is provided).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="QCode resolving to http://cv.iptc.org/newscodes/conceptrelation/isA"
                      href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
            <contentMeta>
                <!-- Location from which the content of the main news item originates -->
                <located qcode="afplocation:2500" type="loctyp:City">
                    <name>Paris</name>
                    <related qcode="afplocation:67" rel="skos:broader" type="loctyp:Country">
                        <name>France</name>
                        <related qcode="iso3166-1a3:FRA" rel="skos:exactMatch" /> 
                    </related>
                    <geoAreaDetails>
                        <position latitude="48.85341" longitude="2.34121" /> 
                    </geoAreaDetails>
                </located>
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- Location from which the content of this news item originates -->
                <located qcode="afplocation:2613" type="loctyp:City">
                    <name>Marseille</name>
                    <related afp:href="http://ref.afp.com/locations/719"
                        qcode="afplocation:719" rel="skos:broader" type="loctyp:CountryArea">
                        <name>Bouches-du-Rhône</name>
                        <related qcode="afplocation:67" rel="skos:broader" type="loctyp:Country"/>
                    </related>
                    <related qcode="afplocation:67" type="loctyp:Country">
                        <name>France</name>
                        <related qcode="iso3166-1a3:FRA" rel="skos:exactMatch"/>
                    </related>
                    <geoAreaDetails>
                        <position latitude="43.29695" longitude="5.38107"/>
                    </geoAreaDetails>
                </located>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

In our NewsML-G2 documents, located elements specify the geographical origin of the editorial content conveyed by the <contentSet> element of a news item: the text of a news story, the jpeg renditions of a picture document, etc. There is always at least one location provided per news item.

Note that locations from which the content originates are not necessarily the locations the content is about. For example, a news story about an event taking place in Madrid may be written in Paris; in such a case the city of Paris may be specified as the location from which the content originates. The locations the content is about are conveyed in another part of the document, as described in section "Locations that are subject matter of the document".
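
The Python sketch below illustrates one way to read the located elements shown in the examples above: it collects, for each location, its QCode, type, name, coordinates and directly related locations. It assumes the simplified, namespace-free XML used in this guide, and the function name and output structure are hypothetical.

```python
import xml.etree.ElementTree as ET

def origin_locations(news_item):
    """Return one dict per located element of a news item."""
    locations = []
    for located in news_item.findall("contentMeta/located"):
        position = located.find("geoAreaDetails/position")
        coords = None
        if position is not None:
            coords = (float(position.get("latitude")), float(position.get("longitude")))
        locations.append({
            "qcode": located.get("qcode"),
            "type": located.get("type"),
            "name": located.findtext("name"),
            "coords": coords,
            # Only direct related children are listed, not nested ones.
            "related": [(r.get("qcode"), r.findtext("name")) for r in located.findall("related")],
        })
    return locations

SAMPLE = """
<newsItem>
  <contentMeta>
    <located qcode="afplocation:6666" type="loctyp:City">
      <name>Washington</name>
      <related qcode="afplocation:1149" rel="skos:broader" type="loctyp:CountryArea">
        <name>District of Columbia</name>
        <related qcode="afplocation:206" rel="skos:broader" type="loctyp:Country"/>
      </related>
      <related qcode="afplocation:206" type="loctyp:Country">
        <name>Etats-Unis</name>
      </related>
      <geoAreaDetails>
        <position latitude="38.89511" longitude="-77.03637"/>
      </geoAreaDetails>
    </located>
  </contentMeta>
</newsItem>
"""
locations = origin_locations(ET.fromstring(SAMPLE))
print(locations)
```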

There are some subtleties about what "locations from which the content originates" means depending on the nature of the content; we discuss them in the table below. Note that the policy described here is specific to AFP. Other conventions might be in place at other news providers.

Policy used to specify the locations from which the content originates
Nature of content | Policy
Text | A location from which the content originates is usually a location (e.g., a city) where the text was written or from which it was dictated. Alternatively it might be a location the content relates to if an AFP reporter is present nearby. Multiple locations may be provided in the form of multiple located elements when the content originates (as defined here) from multiple locations; in this case the usual practice is to provide no more than two locations.
Picture | The location from which the content originates is the location of the camera when the picture was shot. Therefore it may differ from the location of what is shown in the picture. Knowing the location of the camera is useful as it lets one know "what the subject of the picture looks like when viewed from that location". Only one location is provided.
Video | The location from which the content originates is the location of the camera when the video was recorded. Therefore it may differ from the location of what is shown in the video. Knowing the location of the camera is useful as it lets one know "what the subject of the video looks like when viewed from that location". Only one location is provided. If the video is shot in different places, only one of these places is provided, usually the most significant.
Still or animated graphic | The location from which the content originates is the location where the graphic was created. Only one location is provided.
Multimedia | Each news item in a multimedia document specifies the location(s) from which its content originates. The exact meaning for each news item is determined by the nature of its content as described in this table.

The locations from which the content originates are provided by located elements in the content metadata section of news items. A given located element may convey several pieces of information about a location:

In text documents or text components of multimedia documents we may provide multiple locations from which the content originates. In this case the current practice is to provide at most two. Below is an example:

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <!-- A location from which the content originates -->
                <located qcode="afplocation:2500" type="loctyp:City">
                    <name>Paris</name>
                    <related qcode="afplocation:67" rel="skos:broader" type="loctyp:Country">
                        <name>France</name>
                        <related qcode="iso3166-1a3:FRA" rel="skos:exactMatch" /> 
                    </related>
                    <geoAreaDetails>
                        <position latitude="48.85341" longitude="2.34121" /> 
                    </geoAreaDetails>
                </located>
                
                <!-- Another location from which the content originates -->
                <located qcode="afplocation:6666" type="loctyp:City">
                    <name>Washington</name>
                    <related qcode="afplocation:1149" rel="skos:broader" type="loctyp:CountryArea">
                        <name>District of Columbia</name>
                        <related qcode="afplocation:206" rel="skos:broader" type="loctyp:Country"/>
                    </related>
                    <related qcode="afplocation:206" type="loctyp:Country">
                        <name>Etats-Unis</name>
                        <related qcode="iso3166-1a3:USA" rel="skos:exactMatch"/>
                    </related>
                    <geoAreaDetails>
                        <position latitude="38.89511" longitude="-77.03637"/>
                    </geoAreaDetails>
                </located>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

When are multiple locations provided?

Multiple locations may be provided when the content originates from multiple locations. For example, suppose that we publish a story about the Bergé/Saint-Laurent auction. To write this story we might use information provided by an AFP reporter present at the auction in Paris and by another AFP reporter present at a press conference given by Pierre Bergé at the same time in Washington. In this case we might provide Paris and Washington in located elements. Alternatively we might choose to provide the location where the story is actually written (say, Berlin) instead of Paris and Washington.

Locations that are subject matter of the document

All documents: locations the document is about may be provided in the news item (for multimedia documents: in the main news item) in the content metadata section, along with additional information in assertions.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
            
                <!-- The city of Beijing is a subject of the content -->
                <subject qcode="afplocation:2618" type="cpnat:geoArea">
                    <name>Beijing</name>
                </subject> 

                <!-- The city of Paris is a subject of the content and is a location of the event the content is about -->
                <subject qcode="afplocation:2500" type="cpnat:geoArea" afp:role="http://cv.afp.com/subjectroles/locationOfEvent">
                    <name>Paris</name>
                </subject>
                
            </contentMeta>

            <!-- This assertion provides additional information about Beijing  -->
            <assert qcode="afplocation:2618">
                <type qcode="loctyp:City"/>
                <geoAreaDetails>
                    <position latitude="39.9075" longitude="116.39723"/>
                </geoAreaDetails>
            </assert>
            
            <!-- This assertion provides additional information about Paris  -->
            <assert qcode="afplocation:2500">
                <type qcode="loctyp:City"/>
                <broader qcode="afplocation:67" type="loctyp:Country">
                    <name>France</name>
                    <related qcode="iso3166-1a3:FRA" rel="skos:exactMatch"/>
                </broader>
                <geoAreaDetails>
                    <position latitude="48.85341" longitude="2.3488"/>
                </geoAreaDetails>
            </assert>
            
        </newsItem>
    </itemSet>
</newsMessage>

Locations that are subject matter of the document may be provided by subject elements. Note that other entities such as persons, media topics, organizations and so on may also be conveyed using subject elements. To differentiate them, the subject elements that convey these locations are identified with a type attribute whose value, a QCode, resolves to either http://cv.iptc.org/newscodes/cpnature/geoArea or http://cv.iptc.org/newscodes/cpnature/poi. All these subjects share some common properties, such as optional how and why attributes (not shown in the examples above) that are described in the section on subjects.

Additional information about these locations may be provided by assertions; an assertion is represented by an assert element. You can correlate assertions with specific locations using their QCodes: the information provided by an assertion applies to the location whose concept URI is conveyed by the QCode of the assertion. In our example above, we have a subject element whose QCode resolves to http://ref.afp.com/locations/2618 (for illustration purposes we suppose that in this document afplocation is a scheme alias for http://ref.afp.com/locations/). We also have an assert element whose QCode resolves to http://ref.afp.com/locations/2618. It means that both this subject and this assertion convey information about the same location.
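
The correlation between subjects and assertions can be sketched in Python as follows. This is an illustration under stated assumptions: the aliases are supplied as a dictionary (with afplocation mapped as supposed above), namespaces are ignored, and the function names and output structure are hypothetical.

```python
import xml.etree.ElementTree as ET

LOCATION_NATURES = {
    "http://cv.iptc.org/newscodes/cpnature/geoArea",
    "http://cv.iptc.org/newscodes/cpnature/poi",
}

def resolve_qcode(qcode, aliases):
    """Resolve 'alias:code' to a concept URI; None if the alias is unknown."""
    alias, sep, code = qcode.partition(":")
    if not sep or alias not in aliases:
        return None
    return aliases[alias] + code

def subject_locations(news_item, aliases):
    """Map each location subject's concept URI to (name, coordinates or None)."""
    assertions = {}
    for assertion in news_item.findall("assert"):
        uri = resolve_qcode(assertion.get("qcode", ""), aliases)
        if uri:
            assertions[uri] = assertion
    result = {}
    for subject in news_item.findall("contentMeta/subject"):
        if resolve_qcode(subject.get("type", ""), aliases) not in LOCATION_NATURES:
            continue  # not a location: person, topic, organization, etc.
        uri = resolve_qcode(subject.get("qcode", ""), aliases)
        assertion = assertions.get(uri)
        position = assertion.find("geoAreaDetails/position") if assertion is not None else None
        coords = None
        if position is not None:
            coords = (float(position.get("latitude")), float(position.get("longitude")))
        result[uri] = (subject.findtext("name"), coords)
    return result

ALIASES = {
    "cpnat": "http://cv.iptc.org/newscodes/cpnature/",
    "afplocation": "http://ref.afp.com/locations/",
}
SAMPLE = """
<newsItem>
  <contentMeta>
    <subject qcode="afplocation:2618" type="cpnat:geoArea"><name>Beijing</name></subject>
    <subject qcode="afplocation:2500" type="cpnat:geoArea"><name>Paris</name></subject>
  </contentMeta>
  <assert qcode="afplocation:2618">
    <geoAreaDetails><position latitude="39.9075" longitude="116.39723"/></geoAreaDetails>
  </assert>
</newsItem>
"""
found = subject_locations(ET.fromstring(SAMPLE), ALIASES)
print(found)
```

In this sample Paris has no matching assertion, so its coordinates are reported as None rather than guessed.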

A given assertion may convey several pieces of information about a location:

Locations of the event(s)

Some locations that are subject matter of the document also happen to be locations of the event(s). The locations of the event(s) are the places where the event(s) reported (or foreseen) in the document happen(s). They are provided by subject elements with a role attribute in the namespace http://www.afp.com/format/internal/ equal to http://cv.afp.com/subjectroles/locationOfEvent.

For example, in our document about the auction of the Pierre Bergé and Yves Saint-Laurent collection, we could have the city of Paris as a subject because the news story mentions that the auction takes place in Paris. We could also have the city of Beijing as a subject because the news story mentions China's claims that some objects in the auction were stolen in Beijing during the opium wars and therefore should be returned. In this case, both cities would appear in dedicated subject elements. The city of Paris could be tagged as being a location of the event using the role attribute because the auction happens in Paris and in our example the auction is the event the story is about. Beijing would not be tagged as being a location of the event because while it is a subject of the story it is not a location of the event the story is about.

There is no default value for the role attribute: if a subject element conveying a location does not have a role attribute with a value of http://cv.afp.com/subjectroles/locationOfEvent, it doesn't mean that it isn't a location of the event, but merely that the information regarding this matter isn't provided by the element.
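The rule above can be sketched in Python as a minimal, non-authoritative illustration. It assumes the subject elements are in the NewsML-G2 namespace http://iptc.org/std/nar/2006-10-01/; the function name event_location_status and the sample item are hypothetical. The function returns True when the role is present and None (rather than False) when it is absent, to reflect that the absence of the role gives no information.

```python
import xml.etree.ElementTree as ET

NAR = "http://iptc.org/std/nar/2006-10-01/"
AFP = "http://www.afp.com/format/internal/"
LOCATION_OF_EVENT = "http://cv.afp.com/subjectroles/locationOfEvent"

# Hypothetical sample: Paris is tagged as a location of the event, Beijing is not.
SAMPLE = f"""
<newsItem xmlns="{NAR}" xmlns:afp="{AFP}">
  <contentMeta>
    <subject qcode="afplocation:2500" afp:role="{LOCATION_OF_EVENT}">
      <name>Paris</name>
    </subject>
    <subject>
      <name>Beijing</name>
    </subject>
  </contentMeta>
</newsItem>
"""


def event_location_status(subject):
    """Return True if the subject is tagged as a location of the event,
    or None if the element gives no information on this matter."""
    role = subject.get(f"{{{AFP}}}role")
    return True if role == LOCATION_OF_EVENT else None


item = ET.fromstring(SAMPLE)
statuses = {
    subject.findtext(f"{{{NAR}}}name"): event_location_status(subject)
    for subject in item.iter(f"{{{NAR}}}subject")
}
print(statuses)
```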

Provider

All documents: the provider of the document is given in the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <provider qcode="afpprovider:AFP-TV">
                    <name>AFP-TV</name>
                    <broader qcode="nprov:AFP"/>
                </provider>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

The provider of a document is the party responsible for the management and the release of the document (i.e., the publisher of the document). It is given by the qcode attribute of the provider element. This element is always present. The QCode resolves to an element of one of the following schemes:

The name child element, if present, provides a natural language name for the provider.

The broader child element, if present, specifies a larger entity the provider is part of. This entity is identified by a qcode attribute. In the example above, the document is provided by AFP-TV, a service inside AFP. The fact that this provider is part of AFP is expressed using the broader element. In this example, "afpprovider" is the alias of the scheme http://ref.afp.com/providers/ and "nprov" is the alias of the scheme http://cv.iptc.org/newscodes/newsprovider/.
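As a minimal sketch (not a definitive implementation), the provider and its broader entity could be read as follows in Python. The helper names resolve_qcode and read_provider are hypothetical, and the alias-to-scheme mapping is assumed to have been obtained elsewhere; here it is hard-coded with the aliases of the example above.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

# Assumption: scheme aliases are already known for this document.
ALIASES = {
    "afpprovider": "http://ref.afp.com/providers/",
    "nprov": "http://cv.iptc.org/newscodes/newsprovider/",
}

SAMPLE = """
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemMeta>
    <provider qcode="afpprovider:AFP-TV">
      <name>AFP-TV</name>
      <broader qcode="nprov:AFP"/>
    </provider>
  </itemMeta>
</newsItem>
"""


def resolve_qcode(qcode, aliases):
    """Turn 'alias:code' into a concept URI using the alias mapping."""
    alias, _, code = qcode.partition(":")
    return aliases[alias] + code


def read_provider(news_item, aliases):
    """Return the provider URI, its optional name and optional broader URI."""
    provider = news_item.find("nar:itemMeta/nar:provider", NS)
    broader = provider.find("nar:broader", NS)
    return {
        "uri": resolve_qcode(provider.get("qcode"), aliases),
        "name": provider.findtext("nar:name", default=None, namespaces=NS),
        "broader": (
            resolve_qcode(broader.get("qcode"), aliases)
            if broader is not None
            else None
        ),
    }


provider_info = read_provider(ET.fromstring(SAMPLE), ALIASES)
print(provider_info)
```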

Publishing Status

All documents: the publishing status is provided by the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <pubStatus qcode="QCode of scheme http://cv.iptc.org/newscodes/pubstatusg2/ 
                                  specifying the publishing status"/>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

A document can be usable, withheld or canceled. The table below describes how this is specified in documents and what it means.

Publishing statuses
Status | Representation | Meaning
Usable | No pubStatus element or a pubStatus element with a qcode attribute resolving to http://cv.iptc.org/newscodes/pubstatusg2/usable | The document is usable. Note that "usable" does not necessarily mean "publishable"; for example an embargo may prevent publication of an otherwise usable document.
Withheld | A pubStatus element with a qcode attribute resolving to http://cv.iptc.org/newscodes/pubstatusg2/withheld | The document must not be used until further notice. Any usage of the document must be prohibited, if needed by way of alerts. If previous versions of the document have been published they must be rendered inaccessible, whenever possible, until further notice. People who may have used previous versions should be notified, whenever possible, that there might be a problem with the document. This status is typically used when a serious problem with a document is suspected and is under investigation (e.g., important information in the document is suspected to be false).
Canceled | A pubStatus element with a qcode attribute resolving to http://cv.iptc.org/newscodes/pubstatusg2/canceled | The document must not be used, ever. Any usage of the document must be prohibited, if needed by way of alerts. If previous versions of the document have been published they must be rendered inaccessible, whenever possible. People who may have used previous versions should be notified, whenever possible, that there is a problem with the document. This status is typically used when a serious problem with a document is detected (e.g., important information in the document is found to be false) and the scope of the problem is wide enough to warrant a complete kill of the document instead of issuing a correction.

The NewsML-G2 specification provides detailed information on how you must make use of this publishing status when processing documents.

For multimedia documents, the way publishing status is conveyed differs from standard NewsML-G2. In NewsML-G2 each G2 item carries its own publishing status, and a G2 item without a pubStatus element is defined as usable. In AFP's multimedia documents the only pubStatus element to consider is that of the main item. The pubStatus elements of non-main items must be ignored. You must process multimedia documents in a way that applies the publishing status provided in the main news item to the entire content of the document (i.e., to all items in the document).
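The following Python sketch illustrates one possible way to compute the publishing status that applies to a whole document. It is only an illustration under stated assumptions: the aliases "stat" and "rel" are hypothetical, the alias mapping is assumed to be known already, and the helper names are not part of any AFP or IPTC API. The main item is recognized by its link element, and a missing pubStatus element is treated as "usable".

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}
PUBSTATUS = "http://cv.iptc.org/newscodes/pubstatusg2/"
IS_A = "http://cv.iptc.org/newscodes/conceptrelation/isA"
MAIN_COMPONENT = "http://cv.afp.com/itemnatures/mmdMainComp"

# Assumption: hypothetical aliases, normally taken from the document itself.
ALIASES = {
    "stat": PUBSTATUS,
    "rel": "http://cv.iptc.org/newscodes/conceptrelation/",
}

SAMPLE = """
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemSet>
    <newsItem>
      <itemMeta>
        <link rel="rel:isA" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
        <pubStatus qcode="stat:withheld"/>
      </itemMeta>
    </newsItem>
    <newsItem>
      <itemMeta>
        <pubStatus qcode="stat:usable"/>
      </itemMeta>
    </newsItem>
  </itemSet>
</newsMessage>
"""


def resolve_qcode(qcode, aliases):
    alias, _, code = qcode.partition(":")
    return aliases[alias] + code


def is_main_item(item, aliases):
    """True if the item carries the link that marks the main item."""
    for link in item.findall("nar:itemMeta/nar:link", NS):
        rel = link.get("rel")
        if (rel and resolve_qcode(rel, aliases) == IS_A
                and link.get("href") == MAIN_COMPONENT):
            return True
    return False


def document_pub_status(message, aliases):
    """Status of the main item (or of the only item), applied to the whole document."""
    items = message.findall("nar:itemSet/nar:newsItem", NS)
    main = next((i for i in items if is_main_item(i, aliases)), items[0])
    status = main.find("nar:itemMeta/nar:pubStatus", NS)
    if status is None:
        return PUBSTATUS + "usable"  # no pubStatus element means usable
    return resolve_qcode(status.get("qcode"), aliases)


status_uri = document_pub_status(ET.fromstring(SAMPLE), ALIASES)
print(status_uri)
```

In this sample the second item declares itself usable, but the whole document is still treated as withheld because only the main item's status is considered.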

Subjects

Non-multimedia documents: subjects of the document may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <!-- A subject represented by a natural language name  -->
                <subject>
                    <name>auction</name>
                </subject> 

                <!-- A subject represented by a QCode  -->
                <subject qcode="medtop:20000273"/>
                                
                <!-- A subject represented by a QCode and a natural language name  -->
                <subject qcode="medtop:01000000">
                    <name>arts, culture and entertainment</name>
                </subject> 

                <!-- A subject represented by a QCode, a natural language name, a type, a why, a how and a role -->
                <subject qcode="afplocation:2500" type="cpnat:geoArea" why="why:direct"
                         how="howextr:person" afp:role="http://cv.afp.com/subjectroles/locationOfEvent">
                    <name>Paris</name>
                </subject> 
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: subjects of the document as a whole may be provided in the content metadata section of the main news item. Subjects specific to an item may be provided in the content metadata section of this item.

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="QCode resolving to http://cv.iptc.org/newscodes/conceptrelation/isA"
                      href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
            <contentMeta>
                <!-- This subject is in the main news item: 
                     it applies to the document as a whole -->
                <subject qcode="medtop:20000031" type="cpnat:abstract">
                    <name>visual art</name>
                </subject> 
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- This subject only applies to this news item -->
                <subject qcode="medtop:20000011" type="cpnat:abstract">
                    <name>fashion</name>
                </subject> 
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Subjects are important topics of the content; what the content is about. Some subjects of a document (and of individual items in the case of multimedia documents) may be provided by subject elements. Each subject element contains an indication of what the document's content (or item's content) is about.

Some subjects of the document may be described by keyword elements instead of subject elements. However, keywords may also be used for other purposes: while a keyword may describe a subject of the document, not all keywords do. See the Keywords section.

In AFP documents, a subject represented by a subject element is specified by either:

The URI space used to specify subjects through QCodes is open and can evolve over time. Often used in AFP documents are URIs identifying IPTC media topics [IPTCMediaTopics], a standard taxonomy for categorizing news content. They are provided through QCodes, as shown above where "medtop" is, in these specific examples, an alias for the scheme URI http://cv.iptc.org/newscodes/mediatopic/. Also often used are URIs identifying events, in order to associate a document with the event it covers; for more on this topic see the section on the event identifier. The table below presents common schemes used in AFP documents to identify subjects; note that it isn't exhaustive.

Common types of subjects used in AFP documents
Type | Scheme URI(s) | Comment
Media topics | http://cv.iptc.org/newscodes/mediatopic/ | Media topics is a standard IPTC taxonomy for categorizing news content. For example the concept URI http://cv.iptc.org/newscodes/mediatopic/01000000 identifies the category "arts, culture and entertainment", which is defined as "Matters pertaining to the advancement and refinement of the human mind, of interests, skills, tastes and emotions".
Events | http://eventmanager.afp.com/events/ | An AFP specific scheme for identifying events. It is used to associate a document with the event it covers. For more on this topic see the section on the event identifier.
Persons | http://ref.afp.com/persons/ | AFP specific scheme for identifying persons. For example the concept URI http://ref.afp.com/persons/193573 identifies Pierre Bergé.
Organizations | http://ref.afp.com/organizations/ | AFP specific scheme for identifying organizations. For example the concept URI http://ref.afp.com/organizations/5308 identifies Christie's, the auction company.
Locations | http://ref.afp.com/locations/ or geo: | http://ref.afp.com/locations/ is an AFP specific scheme for identifying locations. For example the concept URI http://ref.afp.com/locations/2500 identifies the city of Paris. geo: is the scheme for 'geo' URIs. Geo URIs are defined by [RFC5870]. They allow identifying locations and conveying information such as latitude, longitude and so on. For example the URI geo:13.4125,103.8667 identifies the location at latitude 13.4125 and longitude 103.8667 in WGS-84. At the time of this writing AFP documents make use of simple geo URIs with only latitude and longitude, but in the future we may use additional features (e.g., altitude, uncertainty, etc.).

A subject element can have a name child element. If present it provides a natural language name for the subject.

In a given item, the order of appearance of subject elements provides a hint about their relative importance (i.e., editorial significance) in the context of this item: a subject should be considered as having either the same or a lesser importance than subjects appearing before it in the item. Note that while AFP's documents currently don't rank subjects with rank attributes, that may change in the future. In order to be forward compatible, if your NewsML-G2 processor interprets such ranks, the relative importance they convey should take precedence over the relative importance conveyed by the order of appearance of subject elements in the item. The rank attribute is described in the NewsML-G2 specification.
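As an illustration, here is a minimal Python sketch that lists the subjects of an item in document order, which preserves the importance hint described above. It assumes the alias mapping is already known (the "medtop" and "afplocation" aliases are those used in the examples of this section); the helper names are hypothetical and rank attributes are not interpreted.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

# Assumption: scheme aliases are already known for this document.
ALIASES = {
    "medtop": "http://cv.iptc.org/newscodes/mediatopic/",
    "afplocation": "http://ref.afp.com/locations/",
}

SAMPLE = """
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <contentMeta>
    <subject><name>auction</name></subject>
    <subject qcode="medtop:20000273"/>
    <subject qcode="medtop:01000000">
      <name>arts, culture and entertainment</name>
    </subject>
    <subject qcode="afplocation:2500"><name>Paris</name></subject>
  </contentMeta>
</newsItem>
"""


def resolve_qcode(qcode, aliases):
    alias, _, code = qcode.partition(":")
    return aliases[alias] + code


def read_subjects(news_item, aliases):
    """Return (concept URI or None, name or None) pairs in document order."""
    subjects = []
    for subject in news_item.findall("nar:contentMeta/nar:subject", NS):
        qcode = subject.get("qcode")
        subjects.append((
            resolve_qcode(qcode, aliases) if qcode else None,
            subject.findtext("nar:name", default=None, namespaces=NS),
        ))
    return subjects


subjects = read_subjects(ET.fromstring(SAMPLE), ALIASES)
for uri, name in subjects:
    print(uri, name)
```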

Optional attributes (these attributes may or may not be present in a given subject element):

type: this attribute carries a QCode that specifies the type of the subject (i.e., person, organization, event, abstract concept, etc.). The value space for this attribute is open, but in AFP documents you'll typically find types defined in the standard IPTC "Nature of a concept" controlled vocabulary [IPTCCPNatures].

why: this attribute carries a QCode that indicates why the subject element is present. The following table shows possible values and their meaning. The "Concept URI" column indicates the URI to which the why attribute resolves.

Value space for the "why" attribute
Concept URI | Meaning
http://cv.iptc.org/newscodes/whypresent/direct | The subject metadata has been directly extracted from the content (by a tool and/or a person).
http://cv.iptc.org/newscodes/whypresent/ancestor | The subject metadata has been inherited from another concept associated with the content (typically, another subject: e.g., the item might have inherited the subject "Arts and entertainment" because it has been associated with the subject "Sculpture", and "Sculpture" has "Arts and entertainment" as an ancestor).
http://cv.iptc.org/newscodes/whypresent/inferred | The subject metadata has been derived by look-up in a thesaurus. For example, the entity "Christie's" may be associated with the subject "Auction".

how: this attribute carries a QCode that explains the means by which the subject metadata was extracted from the content. The following table shows possible values and their meaning. The "Concept URI" column indicates the URI to which the how attribute resolves.

Value space for the "how" attribute
Concept URI | Meaning
http://cv.iptc.org/newscodes/howextracted/person | The subject metadata has been extracted by a person.
http://cv.iptc.org/newscodes/howextracted/assisted | The subject metadata has been extracted by a person assisted by a tool.
http://cv.iptc.org/newscodes/howextracted/tool | The subject metadata has been extracted by a tool.

When the why attribute is present and is not http://cv.iptc.org/newscodes/whypresent/direct, the how attribute isn't present and has an implicit value of http://cv.iptc.org/newscodes/howextracted/tool.
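The implicit value rule can be sketched as follows in Python. This is a hedged illustration: the function name effective_how is hypothetical, and the mapping of the "why" and "howextr" aliases to the IPTC scheme URIs is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}
WHY_DIRECT = "http://cv.iptc.org/newscodes/whypresent/direct"
HOW_TOOL = "http://cv.iptc.org/newscodes/howextracted/tool"

# Assumption: alias mapping chosen for this example.
ALIASES = {
    "why": "http://cv.iptc.org/newscodes/whypresent/",
    "howextr": "http://cv.iptc.org/newscodes/howextracted/",
}

SAMPLE = """
<contentMeta xmlns="http://iptc.org/std/nar/2006-10-01/">
  <subject qcode="afplocation:2500" why="why:direct" how="howextr:person"/>
  <subject qcode="medtop:01000000" why="why:ancestor"/>
  <subject qcode="medtop:20000273"/>
</contentMeta>
"""


def resolve_qcode(qcode, aliases):
    alias, _, code = qcode.partition(":")
    return aliases[alias] + code


def effective_how(subject, aliases):
    """Explicit how value, the implicit 'tool' value, or None if unknown."""
    how = subject.get("how")
    if how is not None:
        return resolve_qcode(how, aliases)
    why = subject.get("why")
    if why is not None and resolve_qcode(why, aliases) != WHY_DIRECT:
        return HOW_TOOL  # implicit value when why is present and not "direct"
    return None


content_meta = ET.fromstring(SAMPLE)
how_values = [
    effective_how(subject, ALIASES)
    for subject in content_meta.findall("nar:subject", NS)
]
print(how_values)
```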

role (in namespace http://www.afp.com/format/internal/): some subjects have a specific role, which is conveyed by this attribute in the form of a URI. This attribute is not defined by the NewsML-G2 standard: it is an AFP-specific extension and is therefore defined in a specific namespace.

Currently the only possible value for this attribute when it is present is http://cv.afp.com/subjectroles/locationOfEvent. If a subject is tagged with this role then this subject is a location of the event(s) the editorial content is about. This usage is described in detail in the section "Locations that are subject matter of the document".

Titles & subtitles

Documents may contain various types of titles and multiple levels of subtitles.

Note that while NewsML-G2 allows for rich text by using some markup in the content of titles and subtitles, AFP's systems only output simple textual content not interspersed with markup.

Titles

All documents: titles may be provided in the content metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <!-- The main title of the document -->
                <headline>
                    YSL-Bergé collection sets new world record at auction 
                    for a private collection
                </headline>
                
                <!-- The short title of the document -->
                <headline role="QCode resolving to http://cv.afp.com/headlineroles/shorttitle">
                    YSL-Bergé collection: a new record at auction
                </headline>
                
                <!-- The long title of the document -->
                <headline role="QCode resolving to http://cv.afp.com/headlineroles/longtitle">
                    Yves Saint Laurent/Pierre Bergé collection sets new world record at 
                    auction for a private collection with more than 206 million euros
                </headline>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Documents may contain a title, a short title and a long title. These titles, if present, are provided by headline elements located in the content metadata section of the news item (for multimedia documents: the main news item). There is at most one title, one short title and one long title.

You can determine the type of a given title by looking for the presence and value of a role attribute, as described in the following table.

Title types
Type | Function | Identification
Title | The main title of the document: a short summary of the journalistic content. | No role attribute.
Short title | A shorter version of the title, suitable for displaying on space constrained surfaces (e.g., mobile handsets). | A role attribute with a QCode value resolving to http://cv.afp.com/headlineroles/shorttitle
Long title | A longer version of the title. This is a short catch line, useful, for example, to display on a banner. | A role attribute with a QCode value resolving to http://cv.afp.com/headlineroles/longtitle
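The identification rules of the table can be sketched in Python as shown below. This is a minimal illustration, not a reference implementation: the alias "afphlrole" is hypothetical, the alias mapping is assumed to be known, and read_titles is an invented helper name. Headlines with other roles (subtitles, catchlines) are skipped.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}
HEADLINE_ROLES = "http://cv.afp.com/headlineroles/"
TITLE_KINDS = {
    HEADLINE_ROLES + "shorttitle": "short title",
    HEADLINE_ROLES + "longtitle": "long title",
}

# Assumption: "afphlrole" is a hypothetical alias for the headline roles scheme.
ALIASES = {"afphlrole": HEADLINE_ROLES}

SAMPLE = """
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <contentMeta>
    <headline>
      YSL-Bergé collection sets new world record at auction
      for a private collection
    </headline>
    <headline role="afphlrole:shorttitle">
      YSL-Bergé collection: a new record at auction
    </headline>
    <headline role="afphlrole:subtitle" rank="0">
      Auction to continue Tuesday and Wednesday
    </headline>
  </contentMeta>
</newsItem>
"""


def resolve_qcode(qcode, aliases):
    alias, _, code = qcode.partition(":")
    return aliases[alias] + code


def read_titles(news_item, aliases):
    """Map 'title', 'short title' and 'long title' to their text, when present."""
    titles = {}
    for headline in news_item.findall("nar:contentMeta/nar:headline", NS):
        role = headline.get("role")
        if role is None:
            kind = "title"  # the main title has no role attribute
        else:
            kind = TITLE_KINDS.get(resolve_qcode(role, aliases))
            if kind is None:
                continue  # not a title (e.g., a subtitle or a catchline)
        # Collapse the line breaks and indentation used for layout in the XML.
        titles[kind] = " ".join((headline.text or "").split())
    return titles


titles = read_titles(ET.fromstring(SAMPLE), ALIASES)
print(titles)
```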

Subtitles

All documents: subtitles may be provided in the content metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <headline role="QCode resolving to http://cv.afp.com/headlineroles/subtitle"
                          rank="0">
                    Auction to continue Tuesday and Wednesday
                </headline>
                <headline role="QCode resolving to http://cv.afp.com/headlineroles/subtitle"
                          rank="1">
                    Prestigious attendance noted on first day
                </headline>
                <headline role="QCode resolving to http://cv.afp.com/headlineroles/subtitle"
                          rank="2">
                    Pierre Bergé speaks about the origin of the collection
                </headline>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

In addition to titles, documents may contain subtitles. Subtitles complement titles with additional information about the news content of the document. Like titles, they are provided by headline elements in the content metadata section of the main news item. Their subtitle nature is denoted by a role attribute whose value is a QCode resolving to http://cv.afp.com/headlineroles/subtitle. A rank attribute may be present to specify the relative importance of subtitles. Ranks are nonnegative integers. Subtitles with a lower value for this attribute have a higher importance than subtitles with a higher value for this attribute, and subtitles without a rank attribute have a lower importance than subtitles with a rank attribute. See the NewsML-G2 specification for additional information on ranks and their processing model.
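The ordering rule for subtitles can be sketched as follows in Python. This is an illustration under assumptions: the alias "afphlrole" is hypothetical, the alias mapping is assumed to be known, and read_subtitles is an invented helper name. Ranked subtitles come first in ascending rank order, followed by unranked ones in document order.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}
SUBTITLE = "http://cv.afp.com/headlineroles/subtitle"

# Assumption: "afphlrole" is a hypothetical alias for the headline roles scheme.
ALIASES = {"afphlrole": "http://cv.afp.com/headlineroles/"}

SAMPLE = """
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <contentMeta>
    <headline>Main title</headline>
    <headline role="afphlrole:subtitle" rank="2">Third</headline>
    <headline role="afphlrole:subtitle">Unranked</headline>
    <headline role="afphlrole:subtitle" rank="0">First</headline>
  </contentMeta>
</newsItem>
"""


def resolve_qcode(qcode, aliases):
    alias, _, code = qcode.partition(":")
    return aliases[alias] + code


def read_subtitles(news_item, aliases):
    """Return subtitle texts, most important first."""
    entries = []
    for headline in news_item.findall("nar:contentMeta/nar:headline", NS):
        role = headline.get("role")
        if role is None or resolve_qcode(role, aliases) != SUBTITLE:
            continue
        rank = headline.get("rank")
        text = " ".join((headline.text or "").split())
        # Sort key: ranked before unranked, then ascending rank.
        entries.append((rank is None, int(rank) if rank is not None else 0, text))
    # list.sort is stable, so equal keys keep their document order.
    entries.sort(key=lambda entry: entry[:2])
    return [text for _, _, text in entries]


subtitles = read_subtitles(ET.fromstring(SAMPLE), ALIASES)
print(subtitles)
```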

Type of document

An AFP document can be of one of the following types:

To determine the type of a document, you first need to determine if it is a multimedia or non-multimedia document. A document is multimedia if the item set of the news message contains a news item whose item metadata section contains a link element with both:

That is, a multimedia document contains the following:

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="QCode resolving to http://cv.iptc.org/newscodes/conceptrelation/isA"
                      href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

In a non-multimedia document, the type is the item class of the news item. The item class is given by the qcode attribute of the itemClass element in the item metadata section, as shown here:

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <itemClass qcode="QCode of scheme http://cv.iptc.org/newscodes/ninature/ 
                                  specifying the type"/>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

The itemClass element is always present. AFP's documents use QCodes described in the table below, where the "Concept URI" column gives the URI the QCode resolves to.

Item classes used in AFP documents
Type | Description | Concept URI
Text | The content is textual | http://cv.iptc.org/newscodes/ninature/text
Picture | The content is a (non-animated) visual representation of a physical scene. It is typically a photo, but can also be produced by other means: for example, it can be an artist's drawing representing a physical scene (e.g., a courtroom drawing). | http://cv.iptc.org/newscodes/ninature/picture
Video | The content is a moving visual representation of a physical scene. It is typically digital video material, but can also be produced by other means: for example, it can be a 3D computer generated sequence or an animated artist's drawing, provided they represent physical scenes. | http://cv.iptc.org/newscodes/ninature/video
Still graphic | The content is a (non-animated) symbolic visual representation. It often includes text labels. For example, it can be charts, diagrams, maps, company logos, etc. | http://cv.iptc.org/newscodes/ninature/graphic
Animated graphic | The content is an animated symbolic visual representation. It often includes text labels. For example, it can be animated charts, diagrams, maps, company logos, etc. | http://cv.iptc.org/newscodes/ninature/animated
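The two-step determination described in this section can be sketched in Python as below. It is a minimal sketch under assumptions: the aliases "rel" and "ninat" are hypothetical, the alias mapping is assumed to be known, and document_type is an invented helper name. It returns the string "multimedia" for multimedia documents and the item class concept URI otherwise.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}
IS_A = "http://cv.iptc.org/newscodes/conceptrelation/isA"
MAIN_COMPONENT = "http://cv.afp.com/itemnatures/mmdMainComp"

# Assumption: hypothetical aliases, normally taken from the document itself.
ALIASES = {
    "rel": "http://cv.iptc.org/newscodes/conceptrelation/",
    "ninat": "http://cv.iptc.org/newscodes/ninature/",
}

TEXT_DOC = """
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemSet>
    <newsItem>
      <itemMeta><itemClass qcode="ninat:text"/></itemMeta>
    </newsItem>
  </itemSet>
</newsMessage>
"""

MULTIMEDIA_DOC = """
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemSet>
    <newsItem>
      <itemMeta>
        <itemClass qcode="ninat:text"/>
        <link rel="rel:isA" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
      </itemMeta>
    </newsItem>
    <newsItem>
      <itemMeta><itemClass qcode="ninat:picture"/></itemMeta>
    </newsItem>
  </itemSet>
</newsMessage>
"""


def resolve_qcode(qcode, aliases):
    alias, _, code = qcode.partition(":")
    return aliases[alias] + code


def document_type(message, aliases):
    """Return 'multimedia', or the item class URI of a non-multimedia document."""
    items = message.findall("nar:itemSet/nar:newsItem", NS)
    for item in items:
        for link in item.findall("nar:itemMeta/nar:link", NS):
            rel = link.get("rel")
            if (rel and resolve_qcode(rel, aliases) == IS_A
                    and link.get("href") == MAIN_COMPONENT):
                return "multimedia"
    # Non-multimedia documents contain a single news item.
    item_class = items[0].find("nar:itemMeta/nar:itemClass", NS)
    return resolve_qcode(item_class.get("qcode"), aliases)


text_type = document_type(ET.fromstring(TEXT_DOC), ALIASES)
multimedia_type = document_type(ET.fromstring(MULTIMEDIA_DOC), ALIASES)
print(text_type, multimedia_type)
```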

Urgency

All documents: the urgency of the document may be provided in the content metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <urgency>1</urgency>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

A document may include an indication of the editorial urgency of its content in an urgency element. The content of this element is an integer from 1 (highest urgency) to 9 (lowest urgency). Usually, AFP documents are tagged with urgencies from 1 to 4.

There is often a correlation between the value of this property and the document's role in workflow. In documents produced by AFP, flashes are typically issued with the highest urgency (i.e., a value of 1), alerts with an urgency of 2 and urgents with an urgency of 3.

Data specific to text and multimedia documents

Some data appear only in text and multimedia documents. This section details these data elements.

Catchline

Text documents: a catchline may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <headline role="QCode resolving to http://cv.afp.com/headlineroles/introduction">
                    The Yves Saint Laurent and Pierre Bergé collection sets new world record at
                    auction for a private collection on Monday, the first day of a three-day
                    auction, with more than 206 million euros. Participants describe first day
                    as "surprising, moving, electric!".
                </headline>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: a catchline may be provided in the content metadata section of the main news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="QCode resolving to http://cv.iptc.org/newscodes/conceptrelation/isA"
                      href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
            <contentMeta>
                <headline role="QCode resolving to http://cv.afp.com/headlineroles/catchline">
                    The Yves Saint Laurent and Pierre Bergé collection sets new world record at
                    auction for a private collection on Monday, the first day of a three-day
                    auction, with more than 206 million euros. Participants describe first day
                    as "surprising, moving, electric!".
                </headline>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

A catchline, if present, provides a clear and concise summary of the story that tells the reader what has happened in simple language. It is designed to attract the reader's attention. It gives an overview of all the main elements of the news. A catchline may be found at most once per document.

In multimedia documents the catchline is provided by a headline element whose role attribute, a QCode, resolves to http://cv.afp.com/headlineroles/catchline.

In text documents the catchline is provided by a headline element whose role attribute, a QCode, resolves to http://cv.afp.com/headlineroles/introduction (note that this differs from multimedia documents). At the time of this writing a catchline may be provided only for text documents produced by SID (Sport-Informations-Dienst), an AFP subsidiary. To determine if the kind of text documents you are interested in might contain a catchline you are advised to discuss the matter with your AFP representative.

While NewsML-G2 allows for rich text by using some markup in the content of a catchline, AFP's systems only output simple textual content not interspersed with markup.

In some documents you might observe that the content of the catchline is the same as the first paragraph of the main textual content of the document. Note however that this is not always the case and that sometimes an original catchline is provided.
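Because the headline role differs between text and multimedia documents, a processor needs to pick the role according to the document type. The following Python sketch shows one way to do it; the alias "afphlrole", the doc_type argument values and the helper name read_catchline are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}
HEADLINE_ROLES = "http://cv.afp.com/headlineroles/"

# Role to look for, depending on the type of document.
CATCHLINE_ROLE = {
    "text": HEADLINE_ROLES + "introduction",
    "multimedia": HEADLINE_ROLES + "catchline",
}

# Assumption: "afphlrole" is a hypothetical alias for the headline roles scheme.
ALIASES = {"afphlrole": HEADLINE_ROLES}

SAMPLE = """
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <contentMeta>
    <headline>Main title</headline>
    <headline role="afphlrole:introduction">
      A concise summary
      of the story.
    </headline>
  </contentMeta>
</newsItem>
"""


def resolve_qcode(qcode, aliases):
    alias, _, code = qcode.partition(":")
    return aliases[alias] + code


def read_catchline(news_item, doc_type, aliases):
    """Return the catchline text for a 'text' or 'multimedia' document, or None."""
    wanted = CATCHLINE_ROLE[doc_type]
    for headline in news_item.findall("nar:contentMeta/nar:headline", NS):
        role = headline.get("role")
        if role is not None and resolve_qcode(role, aliases) == wanted:
            return " ".join((headline.text or "").split())
    return None


item = ET.fromstring(SAMPLE)
print(read_catchline(item, "text", ALIASES))
```

For multimedia documents, the item passed to such a function would be the main news item.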

Number of hypertext links to external resources present in textual content

Text documents and multimedia documents: the number of hypertext links to external resources present in textual content may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" xmlns:afp="http://www.afp.com/format/internal/">
    <itemSet>
        <newsItem>
            <itemMeta>
                <afp:extension>
                    <afp:stats>
                        <afp:totalLinks>
                            3
                        </afp:totalLinks>
                    </afp:stats>
                </afp:extension>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

The XHTML rendition of the textual content of the news item (the main news item for multimedia documents) can contain hypertext links to external resources, typically conveyed by <a> elements. External resources are resources that are not intrinsically part of the document; for example, in a multimedia document a link to one of the items of the document isn't a link to an external resource whereas a link to a Wikipedia page is.

As shown in the example above, the number of hypertext links to external resources present in textual content may be provided as a non-negative integer by a totalLinks element inside a stats element inside an extension element in the item metadata section of the (main) news item.

Note that the totalLinks, stats and extension elements are not standard NewsML-G2 vocabulary but part of an AFP-specific extension. They are defined in an XML namespace whose name is "http://www.afp.com/format/internal/".
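A minimal Python sketch for reading this value is given below; read_total_links is a hypothetical helper name. Since the element content may be surrounded by whitespace, as in the example above, the text is stripped before conversion.

```python
import xml.etree.ElementTree as ET

NS = {
    "nar": "http://iptc.org/std/nar/2006-10-01/",
    "afp": "http://www.afp.com/format/internal/",
}

SAMPLE = """
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"
          xmlns:afp="http://www.afp.com/format/internal/">
  <itemMeta>
    <afp:extension>
      <afp:stats>
        <afp:totalLinks>
          3
        </afp:totalLinks>
      </afp:stats>
    </afp:extension>
  </itemMeta>
</newsItem>
"""


def read_total_links(news_item):
    """Return the number of external links, or None if it is not provided."""
    text = news_item.findtext(
        "nar:itemMeta/afp:extension/afp:stats/afp:totalLinks",
        default=None,
        namespaces=NS,
    )
    if text is None or not text.strip():
        return None
    return int(text.strip())


total_links = read_total_links(ET.fromstring(SAMPLE))
print(total_links)
```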

Word count

Text and multimedia documents: the word count is provided in the inline XML rendition of the content of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <contentSet>
                <inlineXML wordcount="450">
                </inlineXML>
            </contentSet>
        </newsItem>
    </itemSet>
</newsMessage>

The word count gives an approximation of the size of the textual content of the document (not including textual content provided in metadata). That size is provided as an approximate count of words: when it is computed, an individual word does not necessarily count for one, as short words count for less than one and long words count for more than one.

The word count is provided by the wordcount attribute of the inlineXML element of the news item. It is a non-negative integer. It is present in all text and multimedia documents.
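A minimal Python sketch for reading the word count follows; read_word_count is a hypothetical helper name, and the function returns None when no inlineXML element or wordcount attribute is found.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

SAMPLE = """
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <contentSet>
    <inlineXML wordcount="450">
    </inlineXML>
  </contentSet>
</newsItem>
"""


def read_word_count(news_item):
    """Return the approximate word count of the item, or None if absent."""
    inline = news_item.find("nar:contentSet/nar:inlineXML", NS)
    if inline is None or inline.get("wordcount") is None:
        return None
    return int(inline.get("wordcount"))


word_count = read_word_count(ET.fromstring(SAMPLE))
print(word_count)
```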

Data specific to text documents

Some data is specific to text documents. This section details these data elements.

Role in workflow

Text documents: a role in workflow may be provided in the item metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <role qcode="QCode specifying the role in workflow"/>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Some text documents carry an indication of their role in workflow (aka editorial role). This allows you to handle them in specific ways. This role, if present, is specified by the qcode attribute of the role element. The possible values for the role are taken from a controlled vocabulary provided by the IPTC (we do not use its whole value space, though). They are described in the table below, where the Concept URI column gives the URI the QCode resolves to.

Roles in workflow
Role | Description | Concept URI
Flash | A very short text on an event of exceptional importance. Flashes are rare. For example, only four events were reported by AFP by a flash in 2008: Kosovo’s declaration of independence; the opening of the Beijing Games; Russia’s recognition of South Ossetia and Abkhazia as independent states; and Barack Obama’s victory in the US presidential elections. | http://cv.iptc.org/newscodes/edrole/flash
Alert | A very short text with high priority | http://cv.iptc.org/newscodes/edrole/alert
Urgent | A short text on a major development of a top story | http://cv.iptc.org/newscodes/edrole/urgent
Lead | A sum-up or a complete version of a developing story | http://cv.iptc.org/newscodes/edrole/lead

When a document is updated, its role in workflow may be updated too. For example, it is typical for a breaking news story that deserves immediate distribution to start its life as an alert, then become an urgent, then a lead, as it gets refreshed/enriched with more content. All versions of the document share the same guid (see the section on identifiers).

Evolution over time of a developing story

Once a document is a lead, subsequent versions may be qualified as "second lead", "third lead" and so on up to a "ninth lead". However, this qualification is not done through the role in workflow property: this property uses the same concept URI of http://cv.iptc.org/newscodes/edrole/lead from the first lead through the ninth one. To convey what kind of lead the document is, we use a <genre> element (see the section on genres). For example, we typically convey that a document is a first lead by specifying a role in workflow with the concept URI http://cv.iptc.org/newscodes/edrole/lead and a genre with the concept URI http://ref.afp.com/editorialtypes/Lead, as in the following example:

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <role qcode="QCode resolving to http://cv.iptc.org/newscodes/edrole/lead"/>
            </itemMeta>
            <contentMeta>
                <genre qcode="QCode resolving to http://ref.afp.com/editorialtypes/Lead" />
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

For a second lead, the role in workflow is still http://cv.iptc.org/newscodes/edrole/lead and a genre with a concept URI of http://ref.afp.com/editorialtypes/2ndlead is provided:

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <role qcode="QCode resolving to http://cv.iptc.org/newscodes/edrole/lead"/>
            </itemMeta>
            <contentMeta>
                <genre qcode="QCode resolving to http://ref.afp.com/editorialtypes/2ndlead" />
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

A document with a role in workflow of "lead" can also be qualified by the genre "general lead", whose meaning is described at the end of the table below. Typically a general lead has a different guid than the various documents it may consolidate. A document cannot be both a general lead and "first lead", "second lead" etc.

The following table describes the various genres which can be used to qualify a lead.

Genres used to qualify a lead
        Genre         Description Concept URI
Lead (typically used to mean "first lead") A sum-up or a complete version of a developing story http://ref.afp.com/editorialtypes/Lead
Second lead A sum-up or a complete version of a developing story. For a given story, common usage is that a second lead is published only if a lead is already out. It provides a refreshed and/or enriched version of that story. http://ref.afp.com/editorialtypes/2ndlead
Third lead A sum-up or a complete version of a developing story. For a given story, common usage is that a third lead is published only if a second lead is already out. It provides a refreshed and/or enriched version of that story. http://ref.afp.com/editorialtypes/3rdlead
Fourth lead A sum-up or a complete version of a story. For a given story, common usage is that a fourth lead is published only if a third lead is already out. It provides a refreshed and/or enriched version of that story. http://ref.afp.com/editorialtypes/4thlead
Fifth lead A sum-up or a complete version of a story. For a given story, common usage is that a fifth lead is published only if a fourth lead is already out. It provides a refreshed and/or enriched version of that story. http://ref.afp.com/editorialtypes/5thlead
Sixth lead A sum-up or a complete version of a developing story. For a given story, common usage is that a sixth lead is published only if a fifth lead is already out. It provides a refreshed and/or enriched version of that story. http://ref.afp.com/editorialtypes/6thlead
Seventh lead A sum-up or a complete version of a developing story. For a given story, common usage is that a seventh lead is published only if a sixth lead is already out. It provides a refreshed and/or enriched version of that story. http://ref.afp.com/editorialtypes/7thlead
Eighth lead A sum-up or a complete version of a developing story. For a given story, common usage is that an eighth lead is published only if a seventh lead is already out. It provides a refreshed and/or enriched version of that story. http://ref.afp.com/editorialtypes/8thlead
Ninth lead A sum-up or a complete version of a developing story. For a given story, common usage is that a ninth lead is published only if an eighth lead is already out. It provides a refreshed and/or enriched version of that story. http://ref.afp.com/editorialtypes/9thlead
General lead A large sum-up or a complete version of a story. A general lead regroups, hierarchizes and develops all available elements of a developing story, including elements that were previously published under a number of different documents, each one focusing on specific facets of the more general story. http://ref.afp.com/editorialtypes/LeadGeneral
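
As a rough illustration of the table above, the following Python sketch maps a genre concept URI to a lead rank. It assumes that the genre QCode has already been resolved to its concept URI. The function name lead_rank and the convention of returning 0 for a general lead are choices made for this example only; the concept URIs are those listed in the table.

```python
from typing import Optional

EDITORIAL_TYPES = "http://ref.afp.com/editorialtypes/"

# Concept URI suffixes from the table, ordered from first lead to ninth lead.
_ORDINAL_SUFFIXES = ["Lead", "2ndlead", "3rdlead", "4thlead", "5thlead",
                     "6thlead", "7thlead", "8thlead", "9thlead"]

LEAD_RANKS = {EDITORIAL_TYPES + suffix: rank
              for rank, suffix in enumerate(_ORDINAL_SUFFIXES, start=1)}
GENERAL_LEAD = EDITORIAL_TYPES + "LeadGeneral"


def lead_rank(genre_uri: str) -> Optional[int]:
    """Return 1..9 for first..ninth lead, 0 for a general lead, None otherwise.

    The 0 value for a general lead is an illustrative convention, not part of
    the format.
    """
    if genre_uri == GENERAL_LEAD:
        return 0
    return LEAD_RANKS.get(genre_uri)
```

Since a document cannot be both a general lead and a numbered lead, a single return value is enough to describe the qualification of a lead.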

Textual content

Text documents: the textual content is provided in the content set of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentSet>
                <inlineXML contenttype="application/xhtml+xml">
                    <html xmlns="http://www.w3.org/1999/xhtml">
                        <head>
                            <title>
                                YSL-Bergé collection sets new world record at auction 
                                for a private collection
                            </title>
                        </head>
                        <body>
                            <p>The Yves Saint Laurent and Pierre Bergé collection sets 
                            new world record at auction for a private collection. 
                            Hundreds of art treasures amassed by late fashion designer
                            Yves Saint Laurent and his companion Pierre Berge over half
                            a century are being auctioned.</p>
                            <p>Bids hit 206 million euros (261 million dollars) on February
                            23, 2009 making it the biggest private collection ever 
                            auctioned with two days of sales still left to run.</p>
                            ...
                            ...
                        </body>
                    </html>
                </inlineXML>
            </contentSet>
        </newsItem>
    </itemSet>
</newsMessage>

The textual content of the document is the main journalistic text of the document. It is provided by an inlineXML element. It is always in XHTML format. This is explicitly denoted by a contenttype attribute with a value of application/xhtml+xml.
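
The following Python sketch shows one possible way to read this textual content with the standard xml.etree.ElementTree module. It assumes that the document is in the NewsML-G2 namespace http://iptc.org/std/nar/2006-10-01/ (the examples in this guide omit namespace declarations for brevity). The function name extract_text and the SAMPLE document are illustrative only.

```python
import xml.etree.ElementTree as ET

# Namespace URIs: NewsML-G2 (NAR) and XHTML.
NS = {
    "nar": "http://iptc.org/std/nar/2006-10-01/",
    "xhtml": "http://www.w3.org/1999/xhtml",
}
XHTML_TYPE = "application/xhtml+xml"


def extract_text(news_message_xml):
    """Return (title, paragraphs) for the first XHTML inlineXML found, or None."""
    root = ET.fromstring(news_message_xml)
    for inline in root.iterfind(".//nar:contentSet/nar:inlineXML", NS):
        if inline.get("contenttype") != XHTML_TYPE:
            continue
        title = inline.findtext("xhtml:html/xhtml:head/xhtml:title",
                                default="", namespaces=NS)
        # Flatten inline markup and normalize whitespace in each paragraph.
        paragraphs = [
            " ".join("".join(p.itertext()).split())
            for p in inline.iterfind("xhtml:html/xhtml:body//xhtml:p", NS)
        ]
        return " ".join(title.split()), paragraphs
    return None


# Hypothetical, shortened document used only to exercise the function.
SAMPLE = """<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemSet><newsItem><contentSet>
    <inlineXML contenttype="application/xhtml+xml">
      <html xmlns="http://www.w3.org/1999/xhtml">
        <head><title>Collection sets new
          world record</title></head>
        <body><p>First <b>paragraph</b>.</p><p>Second paragraph.</p></body>
      </html>
    </inlineXML>
  </contentSet></newsItem></itemSet>
</newsMessage>"""

title, paragraphs = extract_text(SAMPLE)
```

This sketch discards the XHTML structure and keeps only plain text; a real consumer would more likely keep the XHTML tree in order to preserve the structural information it conveys.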

Note that text items of multimedia documents can also contain similar data, but with additional information such as links to visual content. This is described in section "Data specific to multimedia documents".

Data specific to visual content

Some data is associated with visual content. It may be present in picture, video, still graphic and animated graphic documents. It may also be present in picture, video, still graphic and animated graphic items of multimedia documents. This section details these data elements.

Caption

Picture, video, still graphic, animated graphic documents: a caption may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <description role="QCode resolving to 
                                   http://cv.iptc.org/newscodes/descriptionrole/caption">
                    French businessman and head of Sidaction organisation Pierre Berge
                    attends at Marigny theater in Paris, the first of the four auction
                    days led by Christie's of Yves Saint-Laurent and Pierre Berge 
                    collection, which profit will fund campaigns against HIV-AIDS.
                </description>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: a caption may be provided in the content metadata section of each news item conveying picture, video, still graphic or animated graphic content.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <!-- A caption for the content of this item -->
                <description role="QCode resolving to 
                                   http://cv.iptc.org/newscodes/descriptionrole/caption">
                    French businessman and head of Sidaction organisation Pierre Berge
                    attends at Marigny theater in Paris, the first of the four auction
                    days led by Christie's of Yves Saint-Laurent and Pierre Berge 
                    collection, which profit will fund campaigns against HIV-AIDS.
                </description>
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- A caption for the content of this item -->
                <description role="QCode resolving to 
                                   http://cv.iptc.org/newscodes/descriptionrole/caption">
                    Christie's auctioneer François de Ricqles proceeds with the auction 
                    of a rabbit head, a Chinese imperial bronze part of a prized art 
                    collection assembled by Yves Saint Laurent and his partner Pierre 
                    Berge over half a century on February 25, 2009 at the Grand Palais 
                    in Paris. One of the world's great private collections, it takes
                    in masterpieces by Picasso, Mondrian and Matisse, old masters, Art
                    Deco gems, bronzes, enamels and antiques. Two looted Chinese bronzes
                    sold for 15.7 million euros (20.3 million dollars) each to anonymous
                    telephone bidders at the Yves Saint Laurent art sale on Wednesday, 
                    despite protests from Beijing.  
                </description>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Captions, if present, provide concise textual descriptions of news information shown in visual content. Captions can also describe the meaning and context of what is shown.

The caption of a picture, a video, a still graphic or an animated graphic may be provided in the news item for this content by a description element whose role attribute is a QCode resolving to http://cv.iptc.org/newscodes/descriptionrole/caption. There is no caption for text content. In picture, video, still graphic and animated graphic documents, there is a single news item, which, consequently, is the one that may provide a caption. For multimedia documents, the caption of each picture, video, still graphic and animated graphic may appear in each corresponding news item. There is at most one caption per picture, video or graphic content.
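
As an illustration, the Python sketch below collects the caption of each news item. Resolving a QCode requires the scheme aliases declared for the document; here they are supplied as a plain dictionary, and the alias drol is a hypothetical one chosen for the example. The function names resolve_qcode and captions are also illustrative.

```python
import xml.etree.ElementTree as ET

NAR = {"nar": "http://iptc.org/std/nar/2006-10-01/"}
CAPTION_ROLE = "http://cv.iptc.org/newscodes/descriptionrole/caption"

# Hypothetical alias table; in a real document the aliases are declared
# by the document itself.
SCHEMES = {"drol": "http://cv.iptc.org/newscodes/descriptionrole/"}


def resolve_qcode(qcode, schemes):
    """Expand 'alias:code' into a concept URI, or None if the alias is unknown."""
    alias, _, code = qcode.partition(":")
    base = schemes.get(alias)
    return base + code if base is not None else None


def captions(news_message_xml, schemes):
    """Return one caption (or None) per news item, in document order."""
    root = ET.fromstring(news_message_xml)
    result = []
    for item in root.iterfind(".//nar:newsItem", NAR):
        caption = None
        for desc in item.iterfind("nar:contentMeta/nar:description", NAR):
            if resolve_qcode(desc.get("role", ""), schemes) == CAPTION_ROLE:
                # Captions are simple text; normalize the whitespace.
                caption = " ".join("".join(desc.itertext()).split())
                break
        result.append(caption)
    return result


# Hypothetical two-item document: the second item has no caption.
SAMPLE = """<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemSet>
    <newsItem><contentMeta>
      <description role="drol:caption">Pierre Berge attends
        the first auction day.</description>
    </contentMeta></newsItem>
    <newsItem><contentMeta/></newsItem>
  </itemSet>
</newsMessage>"""

found = captions(SAMPLE, SCHEMES)
```

Comparing resolved concept URIs rather than raw QCode strings keeps the code independent of the alias a given document happens to use.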

Note that while NewsML-G2 allows for rich text by using some markup in the content of a caption, AFP's systems only output simple textual content not interspersed with markup.

Visual content

Basic format

Picture, video, still graphic, animated graphic documents: one or multiple links to visual content may be provided in the content set of the news item.

<newsMessage>
    <itemSet>
        <!-- A visual item with three different renditions of the same visual content -->
        <newsItem>
            <contentSet>
                <remoteContent href="pictureItem/image1.jpg"/>
                <remoteContent href="pictureItem/image2.jpg"/>
                <remoteContent href="ftp://example.com/image3.gif"/>
            </contentSet>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: one or multiple links to visual content may be provided in the content set of each news item conveying picture, video, still graphic or animated graphic content.

<newsMessage>
    <itemSet>
        <!-- A visual item with three different renditions of the same visual content -->
        <newsItem>
            <contentSet>
                <remoteContent href="pictureItem/image1.jpg"/>
                <remoteContent href="pictureItem/image2.jpg"/>
                <remoteContent href="ftp://example.com/image3.gif"/>
            </contentSet>
        </newsItem>
        
        <!-- Another visual item with two renditions of some other visual content -->
        <newsItem>
            <contentSet>
                <remoteContent href="videoItem/video1.mp4"/>
                <remoteContent href="http://example.com/video2.mp4"/>
            </contentSet>
        </newsItem>        
    </itemSet>
</newsMessage>

Links to the actual visual content (e.g., bitmaps, vector graphics, video frames, etc.) are provided by href attributes of remoteContent elements. The value of each href attribute is a URI reference (while NewsML-G2 allows for IRI references, AFP NewsML-G2 documents use only URI references). See section "Accessing visual content through URI references" for additional directions on how to use these links.
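
The href values in the examples above mix relative references (pictureItem/image1.jpg) and absolute URIs (ftp://example.com/image3.gif). As a small sketch, the standard urllib.parse module can tell them apart. The function name is_relative_reference is illustrative, and this sketch deliberately says nothing about how a relative reference should be resolved, which is the topic of the section mentioned above.

```python
from urllib.parse import urlsplit


def is_relative_reference(href: str) -> bool:
    """Return True if href has no scheme, i.e. it is a relative URI reference."""
    return urlsplit(href).scheme == ""


# href values taken from the examples above.
hrefs = ["pictureItem/image1.jpg",
         "ftp://example.com/image3.gif",
         "http://example.com/video2.mp4"]
relative = [h for h in hrefs if is_relative_reference(h)]
```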

Each picture, video, still graphic and animated graphic news item carries information about one visual content (i.e., one picture, video or graphic). However, this content may be available in multiple renditions (e.g., low resolution, high resolution, JPEG format, TIFF format, etc.). Each rendition is described by a remoteContent element in the content set of the item.

In AFP picture documents, the definition of what a rendition of a given content is differs slightly from standard NewsML-G2. In standard NewsML-G2 "Each rendition [in the content set of a given news item] MUST represent the same visual content, differentiated only by physical properties such as content type and format. [Renditions in the content set of a given news item are] different technical representations of the same logical content". In AFP picture documents, in addition to providing different technical representations of the same logical content (as in standard NewsML-G2), renditions may also consist of crops of the content provided in other renditions of the same news item.

Additional properties of renditions

For each rendition, some information might be provided by attributes on remoteContent elements. These attributes are described below.

Rendition type

To aid selecting renditions, the type of a rendition may be provided by a rendition attribute in the remoteContent element describing the rendition, as in this example:

<!-- Three descriptions of renditions of different types -->
<remoteContent rendition="rnd:lowRes"    href="pictureItem/image1.jpg"/>
<remoteContent rendition="rnd:highRes"   href="pictureItem/image2.jpg"/>
<remoteContent rendition="rnd:thumbnail" href="pictureItem/image3.gif"/>

The rendition attribute provides a QCode whose possible values are taken from a controlled vocabulary provided by the IPTC and from a controlled vocabulary provided by AFP. They are described in the following table, where the "Concept URI" column gives the URI the QCode resolves to.

Rendition types
Concept URI Description
http://cv.iptc.org/newscodes/rendition/thumbnail A very small rendition of an image, giving only a general idea of its content
http://cv.iptc.org/newscodes/rendition/preview Preview resolution image or video
http://cv.iptc.org/newscodes/rendition/lowRes Low resolution image or video
http://cv.iptc.org/newscodes/rendition/highRes High resolution image or video
http://cv.iptc.org/newscodes/rendition/print Content intended to appear in print
http://cv.iptc.org/newscodes/rendition/web Content intended to appear on a web page
http://cv.iptc.org/newscodes/rendition/mobile Content intended to appear on a mobile or handheld device
http://cv.afp.com/renditions/ipad Content intended to appear on iPad
http://cv.afp.com/renditions/squaredThumbnail A small squared rendition of an image
http://cv.afp.com/renditions/fullSize Documentation forthcoming
http://cv.afp.com/renditions/highDef Documentation forthcoming
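
To illustrate how the rendition type can aid selection, the Python sketch below picks the first available rendition from an ordered list of preferred concept URIs. The scheme aliases (rnd, afprnd) are hypothetical and supplied as a dictionary; in practice they come from the aliases declared for the document. The function names and the NewsML-G2 namespace declaration on the sample are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

NAR = {"nar": "http://iptc.org/std/nar/2006-10-01/"}
IPTC_RND = "http://cv.iptc.org/newscodes/rendition/"

# Hypothetical alias table for the two rendition vocabularies.
SCHEMES = {"rnd": IPTC_RND, "afprnd": "http://cv.afp.com/renditions/"}


def resolve_qcode(qcode, schemes):
    """Expand 'alias:code' into a concept URI, or None if the alias is unknown."""
    alias, _, code = qcode.partition(":")
    base = schemes.get(alias)
    return base + code if base is not None else None


def pick_rendition(content_set, preferred_uris, schemes):
    """Return the href of the first rendition matching the preference order."""
    by_type = {}
    for remote in content_set.iterfind("nar:remoteContent", NAR):
        uri = resolve_qcode(remote.get("rendition", ""), schemes)
        if uri is not None:
            # Keep the first rendition seen for each type.
            by_type.setdefault(uri, remote.get("href"))
    for uri in preferred_uris:
        if uri in by_type:
            return by_type[uri]
    return None


CONTENT_SET = ET.fromstring(
    """<contentSet xmlns="http://iptc.org/std/nar/2006-10-01/">
  <remoteContent rendition="rnd:lowRes"    href="pictureItem/image1.jpg"/>
  <remoteContent rendition="rnd:highRes"   href="pictureItem/image2.jpg"/>
  <remoteContent rendition="rnd:thumbnail" href="pictureItem/image3.gif"/>
</contentSet>""")

best = pick_rendition(CONTENT_SET, [IPTC_RND + "highRes", IPTC_RND + "lowRes"], SCHEMES)
```

Because the rendition attribute is optional, a caller should be prepared for pick_rendition to return None and fall back to another criterion, such as the media type or dimensions described below.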

Media type and format

The media type of a rendition may be provided by a contenttype attribute on the remoteContent element describing the rendition, as in this example:

<!-- Three descriptions of renditions, each one with a media type -->
<remoteContent contenttype="image/jpeg" href="pictureItem/image1.jpg"/>
<remoteContent contenttype="image/jpeg" href="pictureItem/image2.jpg"/>
<remoteContent contenttype="image/gif"  href="pictureItem/image3.gif"/>

The value of the contenttype attribute is an IANA MIME media type name [MediaTypes].

The contenttype attribute may be complemented by a format attribute to refine information about the data format of the rendition. For example:

<!-- Three descriptions of renditions, each one with a media type complemented by a format -->
<remoteContent contenttype="image/jpeg" format="frmt:JPEG_Baseline"    
               href="pictureItem/image1.jpg"/>
<remoteContent contenttype="image/jpeg" format="frmt:JPEG_Progressive" 
               href="pictureItem/image2.jpg"/>
<remoteContent contenttype="image/gif"  format="frmt:GIF87a"
               href="pictureItem/image3.gif"/>

The format attribute provides a QCode whose possible values are in the controlled vocabulary for formats defined by IPTC [IPTCFormats].

Visual dimensions

The width and height of a rendition may be provided by width and height attributes (whose values are non-negative integers) on the remoteContent element describing the rendition. The units in which these dimensions are expressed may be provided by widthunit and heightunit attributes. These attributes provide QCodes whose possible values are in the controlled vocabulary for dimension units defined by IPTC [IPTCDimUnits]. For example:

<remoteContent width ="640" widthunit ="dimensionunit:pixels" 
               height="400" heightunit="dimensionunit:pixels" href="pictureItem/image1.jpg"/>

This fragment states that the visual content at pictureItem/image1.jpg is 640 pixels wide and 400 pixels high (in this example, we suppose that dimensionunit is a scheme alias for the controlled vocabulary defined by IPTC for dimension units).

The possible dimension units are a subset of the IPTC dimension units controlled vocabulary. They are provided in the table below, where the "Concept URI" column gives the URI to which the heightunit and/or widthunit attributes resolve.

Dimension units
Unit Concept URI
Pixels http://cv.iptc.org/newscodes/dimensionunit/pixels
Points http://cv.iptc.org/newscodes/dimensionunit/points

If a width and/or a height attribute is present but the corresponding dimension unit attribute is missing, then you must assume that the width and/or height is expressed in the default unit for that dimension. The default dimension units, which are specified by NewsML-G2, are given in the table below.

Default dimension units
Type of visual content Default height unit Default width unit
Picture pixels pixels
Graphic (still or animated) points points
Digital video pixels pixels
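
The default-unit rule above can be sketched in Python as follows. The sketch assumes that any widthunit or heightunit QCode has already been resolved to its concept URI, and that the caller knows the type of visual content. The keys "picture", "graphic" and "video" and the function name dimension are labels invented for this example.

```python
PIXELS = "http://cv.iptc.org/newscodes/dimensionunit/pixels"
POINTS = "http://cv.iptc.org/newscodes/dimensionunit/points"

# Default units from the table above, keyed by illustrative content-kind labels.
DEFAULT_UNIT = {"picture": PIXELS, "graphic": POINTS, "video": PIXELS}


def dimension(value, unit_uri, content_kind):
    """Return (value, unit URI) for a width or height attribute.

    value is the raw attribute value or None if the attribute is absent.
    unit_uri is the resolved unit concept URI or None if no unit was given,
    in which case the default unit for content_kind applies.
    """
    if value is None:
        return None
    return int(value), unit_uri or DEFAULT_UNIT[content_kind]


width = dimension("640", None, "picture")
```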

Size

The size in bytes of a rendition may be provided by a size attribute on the remoteContent element describing the rendition, as in this example:

<remoteContent size="253476" href="pictureItem/image1.jpg"/>

In this example, the size attribute asserts that the representation of the resource identified by pictureItem/image1.jpg weighs 253476 bytes.

The value of the size attribute is a non-negative integer.

Data specific to picture and still graphic content

Some data is only present in picture and still graphic documents, and in picture and still graphic items of multimedia documents. This section describes these data elements.

Note that picture and still graphic documents/items also contain data common to visual content (see section "Data specific to visual content") and, of course, data common to all kinds of content (see section "Common data").

Additional data about visual content

As described in the section "Visual content", a given visual may have multiple renditions, each one described by a remoteContent element. This section describes additional data that may be used to describe a picture or still graphic rendition.

Orientation

The "orientation" of a rendition is an indication of orientation change from the original digital image. It may be provided by an orientation attribute on the remoteContent element describing the rendition. The value of this attribute is an integer in the range of 1 to 8 (inclusive). For example:

<remoteContent orientation="5" href="pictureItem/image1.jpg"/>

This fragment states that the image at pictureItem/image1.jpg has been flipped about the vertical axis and rotated 90 degrees counterclockwise with regard to the original image. See the NewsML-G2 specification for a comprehensive description of the meaning of each value.

If no orientation attribute is present, you should assume a value of 1, which means "upright, no flip, no rotation" (i.e., the visual top of the original image is at the top, the visual left side of the original image is on the left, etc.).
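
A minimal Python sketch of this rule, assuming the attributes of a remoteContent element are available as a mapping of names to string values (the function name orientation_of is illustrative):

```python
def orientation_of(attributes):
    """Return the orientation (1 to 8) of a rendition, defaulting to 1."""
    raw = attributes.get("orientation")
    if raw is None:
        return 1  # upright, no flip, no rotation
    value = int(raw)
    if not 1 <= value <= 8:
        raise ValueError("orientation out of range: %r" % raw)
    return value


flipped = orientation_of({"orientation": "5", "href": "pictureItem/image1.jpg"})
upright = orientation_of({"href": "pictureItem/image1.jpg"})
```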

Data specific to video and animated graphic content

Some data is only present in video and animated graphic documents, and in video and animated graphic items of multimedia documents. This section describes these data elements.

Note that video and animated graphic documents/items also contain data common to visual content (see section "Data specific to visual content") and, of course, data common to all kinds of content (see section "Common data").

Additional data about visual content

As described in the section "Visual content", a given visual may have multiple renditions, each one described by a remoteContent element. This section describes additional data that may be used to describe a video or animated graphic rendition.

Duration

The duration of a rendition may be provided by a duration attribute (a non-negative integer) on the remoteContent element describing the rendition. The unit in which the duration is expressed may be provided by a durationunit attribute. This attribute provides a QCode whose possible values are in a subset of the controlled vocabulary for time units defined by IPTC [IPTCTimeUnits]. For example:

<remoteContent duration="120" durationunit="timeunit:seconds" 
               href="http://example.com/video2.mp4"/>

This fragment states that the content at http://example.com/video2.mp4 lasts 120 seconds (in this example, we suppose that timeunit is a scheme alias for the controlled vocabulary defined by IPTC for time units).

Possible time units are given in the table below, where the "Concept URI" column gives the concept URI to which the QCode provided by durationunit resolves.

Time units for video or animated graphic duration
Unit Concept URI
Second http://cv.iptc.org/newscodes/timeunit/seconds
Millisecond http://cv.iptc.org/newscodes/timeunit/milliseconds

If a duration attribute is present without a durationunit attribute, then you must assume that the duration is expressed in seconds.
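
As a sketch of how the duration and its unit can be combined, the Python function below converts them to a datetime.timedelta. It assumes the durationunit QCode, if any, has already been resolved to its concept URI; the function name duration_of is illustrative.

```python
from datetime import timedelta

SECONDS = "http://cv.iptc.org/newscodes/timeunit/seconds"
MILLISECONDS = "http://cv.iptc.org/newscodes/timeunit/milliseconds"


def duration_of(value, unit_uri=None):
    """Convert a duration attribute value to a timedelta.

    unit_uri is the resolved durationunit concept URI, or None when the
    attribute is absent, in which case the duration is in seconds.
    """
    amount = int(value)
    if unit_uri is None or unit_uri == SECONDS:
        return timedelta(seconds=amount)
    if unit_uri == MILLISECONDS:
        return timedelta(milliseconds=amount)
    raise ValueError("unsupported time unit: %s" % unit_uri)


two_minutes = duration_of("120")
```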

Icon (aka illustration or preview image)

Basic format

Video and animated graphic documents: icon renditions may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <!-- A visual item with two icons -->
        <newsItem>
            <contentMeta>
                <icon href="http://example.com/img1.jpg"/>
                <icon href="icons/img2.tiff"/>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: icon renditions may be provided in the content meta of each news item conveying video or animated graphic content.

<newsMessage>
    <itemSet>
        <!-- A video or animated graphic item with two icon renditions -->
        <newsItem>
            <contentMeta>
                <icon href="http://example.com/img1.jpg"/>
                <icon href="icons/img2.tiff"/>
            </contentMeta>
        </newsItem>
        <!-- A video or animated graphic item with one icon rendition -->
        <newsItem>
            <contentMeta>
                <icon href="ftp://example.com/img3.jpg"/>
            </contentMeta>
        </newsItem>        
    </itemSet>
</newsMessage>

An icon is an image illustrating a video or an animated graphic (in NewsML-G2, an icon can also be associated with pictures or still graphics, but AFP documents do not use this feature). An icon is typically a keyframe of the visual content, but it can also be a logo or any other illustration.

Each video or animated graphic document, and each video or animated graphic item of a multimedia document, may have at most one logical visual content as its icon. However, this content may be available in multiple renditions (e.g., low resolution, high resolution, JPEG format, TIFF format, etc.). Each rendition is described by an icon element in the content metadata section of the news item.

As stated by the NewsML-G2 documentation:
Each [icon] rendition [in the content metadata section of a given news item] MUST represent the same visual content, differentiated only by physical properties such as content type and format.

Links to the actual icon renditions are provided by href attributes of icon elements. The value of each href attribute is a URI reference (while NewsML-G2 allows for IRI references, AFP systems only output URI references). See section "Accessing visual content through URI references" for additional directions on how to use these links.
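
For illustration, the Python sketch below lists the icon rendition links of each news item in a document. It assumes the document is in the NewsML-G2 namespace http://iptc.org/std/nar/2006-10-01/; the function name icon_hrefs and the SAMPLE document are illustrative.

```python
import xml.etree.ElementTree as ET

NAR = {"nar": "http://iptc.org/std/nar/2006-10-01/"}


def icon_hrefs(news_message_xml):
    """Return, for each news item in document order, the hrefs of its icons."""
    root = ET.fromstring(news_message_xml)
    return [
        [icon.get("href") for icon in item.iterfind("nar:contentMeta/nar:icon", NAR)]
        for item in root.iterfind(".//nar:newsItem", NAR)
    ]


# Same structure as the multimedia example above, with a namespace declaration.
SAMPLE = """<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemSet>
    <newsItem><contentMeta>
      <icon href="http://example.com/img1.jpg"/>
      <icon href="icons/img2.tiff"/>
    </contentMeta></newsItem>
    <newsItem><contentMeta>
      <icon href="ftp://example.com/img3.jpg"/>
    </contentMeta></newsItem>
  </itemSet>
</newsMessage>"""

icons = icon_hrefs(SAMPLE)
```

All the links returned for a given item are renditions of the same icon, so a consumer normally needs only one of them.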

For each icon rendition, some information might be provided by attributes on icon elements. These attributes are described below.

Icon rendition type

To aid selecting icon renditions, the type of a rendition may be provided by a rendition attribute in the icon element describing the rendition, as in this example:

<!-- Two descriptions of icon renditions of different types -->
<icon rendition="rnd:lowRes"  href="icons/img1.jpg"/>
<icon rendition="rnd:highRes" href="icons/img2.tiff"/>

The rendition attribute provides a QCode whose possible values are taken from a controlled vocabulary provided by the IPTC and from a controlled vocabulary provided by AFP. They are described in the table below, where the "Concept URI" column gives the URI the QCode resolves to.

Rendition types
Concept URI Description
http://cv.iptc.org/newscodes/rendition/thumbnail A very small rendition of an image, giving only a general idea of its content
http://cv.iptc.org/newscodes/rendition/preview Preview resolution image
http://cv.iptc.org/newscodes/rendition/lowRes Low resolution image
http://cv.iptc.org/newscodes/rendition/highRes High resolution image
http://cv.iptc.org/newscodes/rendition/print Content intended to appear in print
http://cv.iptc.org/newscodes/rendition/web Content intended to appear on a web page
http://cv.iptc.org/newscodes/rendition/mobile Content intended to appear on a mobile or handheld device
http://cv.afp.com/renditions/ipad Content intended to appear on iPad
http://cv.afp.com/renditions/squaredThumbnail A small squared rendition of an image
http://cv.afp.com/renditions/fullSize Documentation forthcoming
http://cv.afp.com/renditions/highDef Documentation forthcoming

Media type and format

The media type of an icon rendition may be provided by a contenttype attribute on the icon element describing the rendition, as in this example:

<!-- Two descriptions of icon renditions, each one with a media type -->
<icon contenttype="image/jpeg" href="icons/img1.jpg"/>
<icon contenttype="image/tiff" href="icons/img2.tiff"/>

The value of the contenttype attribute is an IANA MIME media type name [MediaTypes].

The contenttype attribute may be complemented by a format attribute to refine information about the data format of the icon rendition. For example:

<!-- Two descriptions of icon renditions,
     each one with a media type complemented by a format -->
<icon contenttype="image/jpeg" format="frmt:JPEG_Baseline" href="icons/img1.jpg"/>
<icon contenttype="image/tiff" format="frmt:NSK-TIFF"      href="icons/img2.tiff"/>

The format attribute provides a QCode whose possible values are in the controlled vocabulary for formats defined by IPTC [IPTCFormats].

Visual dimensions

The width and height of an icon rendition may be provided by width and height attributes (whose values are non-negative integers) on the icon element describing the rendition. The units for these dimensions are then provided by widthunit and heightunit attributes. These attributes provide QCodes whose possible values are in a subset of the controlled vocabulary for dimension units defined by IPTC [IPTCDimUnits]. For example:

<icon width ="640" widthunit ="dimensionunit:pixels" 
      height="400" heightunit="dimensionunit:pixels" href="icons/img1.jpeg"/>

This fragment states that the visual content at icons/img1.jpeg is 640 pixels wide and 400 pixels high (in this example, we suppose that dimensionunit is a scheme alias for the controlled vocabulary defined by IPTC for dimension units).

The possible dimension units are a subset of the IPTC dimension units controlled vocabulary. They are provided in the table below, where the "Concept URI" column gives the URI to which the heightunit and/or widthunit attributes resolve.

Dimension units
Unit Concept URI
Pixels http://cv.iptc.org/newscodes/dimensionunit/pixels
Points http://cv.iptc.org/newscodes/dimensionunit/points

If a width and/or a height attribute is present but the corresponding dimension unit attribute is missing, then you must assume that the width and/or height is expressed in the default unit for that dimension. The default dimension units, which are specified by NewsML-G2, are given in the table below.

Default dimension units
Type of visual content Default height unit Default width unit
Picture pixels pixels
Graphic (still or animated) points points
Digital video pixels pixels

Size

The size in bytes of an icon rendition may be provided by a size attribute on the icon element describing the rendition, as in this example:

<icon size="253476" href="icons/img1.jpeg"/>

In this example, the size attribute asserts that the representation of the resource identified by icons/img1.jpeg weighs 253476 bytes.

The value of the size attribute is a non-negative integer.

Script (aka verbatim or transcript)

Video and animated graphic documents: a script may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <description role="QCode resolving to http://cv.afp.com/descriptionRoles/script">
                    A rare glimpse of the art behind the label. 
                    What Yves Saint Laurent earned in the fashion industry he spent on 
                    masterpieces.At Christie’s auction house in London, a treasure trove of
                    paintings, sculpture, furniture and jewellery amassed by the fashion 
                    icon and his lover and business partner Pierre Bergé -- over a 50 year 
                    partnership.

                    SOUNDBITE 1: Thomas Seydoux, International Co-Head of Department, 
                    Christie’s Europe [English, 13 sec]:
                    "It's unprecedented - I mean we've never sold a collection in recent 
                    memory of that sort of outstanding quality throughout and I think it's
                    going to be most welcome by collectors who don't have that often a 
                    chance to acquire pieces of such quality"

                    Following the death of Yves Saint Laurent last year, Bergé chose to sell
                    the couple’s entire collection, which adorned their apartments in Paris.

                    For him, the sale is about finding some degree of closure: 

                    SOUNDBITE 2: Pierre Bergé, co-founder Yves Saint Laurent Couture house 
                    [French, 16 sec]: "C’est le jour ou le dernier objet sera passé sous le 
                    marteau d'un commissaire priseur que à mon sens – a mon sens - cette 
                    collection pourra écrire le mot fin."

                    "Only on the day that the last piece goes under the hammer of an 
                    auctioneer – in my view – will the last word of this collection be 
                    written"

                    In spite of the global economic slowdown, Christie’s hopes the 
                    collection will fetch around 400 million dollars when it goes up for 
                    sale in Paris at the end of February.

                    A cubist-era Picasso – valued at 40 million dollars – and a rare 
                    selection of Mondrians are among the highlights. But for Yves Saint
                    Laurent and Pierre Bergé, it was not about the price tags – more the
                    enjoyment of living amongst beautiful art.

                    SOUNDBITE 3: Jonathan Rendell, Deputy Chairman, Christie’s Americas 
                    [English, 19 sec]: "There was a great sense of everything being in the
                    right place - nothing dominating -and no trophies. I think it is a 
                    collection that's formed by two incredibly intelligent people working 
                    completely in concert with eachother - that's very unusual."

                    But it’s an unusual bond that is soon to be broken up amongst 
                    collectors, dealers and museums – the end of a long reign for 
                    the king of fashion.
                </description>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: a script may be provided in the content metadata section of each news item conveying video or animated graphic content.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <!-- A script for the content of this item -->
                <description role="QCode resolving to http://cv.afp.com/descriptionRoles/script">
                    A rare glimpse of the art behind the label. 
                    What Yves Saint Laurent earned in the fashion industry he spent on 
                    masterpieces.At Christie’s auction house in London, a treasure trove of
                    paintings, sculpture, furniture and jewellery amassed by the fashion 
                    icon and his lover and business partner Pierre Bergé -- over a 50 year 
                    partnership.

                    SOUNDBITE 1: Thomas Seydoux, International Co-Head of Department, 
                    Christie’s Europe [English, 13 sec]:
                    "It's unprecedented - I mean we've never sold a collection in recent 
                    memory of that sort of outstanding quality throughout and I think it's
                    going to be most welcome by collectors who don't have that often a 
                    chance to acquire pieces of such quality"
                    ...
                    ...
                </description>
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- A script for the content of this item -->
                <description role="QCode resolving to http://cv.afp.com/descriptionRoles/script">
                    Hundreds of art buyers and lovers from around the world came for the
                    biggest private collection ever up for auction.
                    
                    SOUNDBITE 1: Vox pop (woman) (english, 3 sec)
                    "I arrived two days ago to attend the sale." 

                    SOUNDBITE 2: Vox pop (man) (English, 4 sec)
                    "I came especially for the exhibition. Going back to New York very
                    shortly."
                    ...
                    ...
                </description>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

A script, if present, provides the transcript of voices that can be heard in the video. It may also contain indications of significant sounds (e.g., "the sound of an explosion"). These elements are provided in their order of occurrence in the video or animated graphic.

A script is provided by a description element whose role attribute, a QCode, resolves to http://cv.afp.com/descriptionRoles/script. It may appear at most once per item.
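As a minimal sketch of the lookup described above, the following Python code finds the script of a news item by resolving the role QCode of each description element. It assumes that the mapping from scheme aliases to scheme URIs has already been obtained from the document's catalog; the "afpdescrole" alias, the function names and the sample item are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

# NewsML-G2 namespace, as used in the examples of this guide.
NAR = "{http://iptc.org/std/nar/2006-10-01/}"
SCRIPT_ROLE = "http://cv.afp.com/descriptionRoles/script"


def resolve_qcode(qcode, scheme_aliases):
    """Turn 'alias:code' into a concept URI, or None if the alias is unknown."""
    alias, separator, code = qcode.partition(":")
    scheme_uri = scheme_aliases.get(alias)
    if not separator or scheme_uri is None:
        return None
    return scheme_uri + code


def find_script(news_item, scheme_aliases):
    """Return the script text of a news item, or None if there is none."""
    content_meta = news_item.find(NAR + "contentMeta")
    if content_meta is None:
        return None
    for description in content_meta.findall(NAR + "description"):
        role = description.get("role")
        if role and resolve_qcode(role, scheme_aliases) == SCRIPT_ROLE:
            # AFP outputs plain text here, so the element text is enough.
            return (description.text or "").strip()
    return None


# Hypothetical sample: the "afpdescrole" alias is made up for this sketch.
SAMPLE = """
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <contentMeta>
    <description role="afpdescrole:script">
      SOUNDBITE 1: Vox pop (woman) (english, 3 sec)
      "I arrived two days ago to attend the sale."
    </description>
  </contentMeta>
</newsItem>
"""
aliases = {"afpdescrole": "http://cv.afp.com/descriptionRoles/"}
script = find_script(ET.fromstring(SAMPLE), aliases)
print(script)
```

The same approach should apply to the other description roles presented in this guide by comparing against a different concept URI.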

Note that in some documents, the content of a description element whose role attribute resolves to http://cv.afp.com/descriptionRoles/script isn't a voice/sound transcript or isn't only a voice/sound transcript:

Shot lists have their dedicated slots in this XML format (see section "Shot list"), but in some documents they appear in the slots for scripts. For example, here is a description element that contains both a script and a shot list (we show only partial content):

<description role="QCode resolving to http://cv.afp.com/descriptionRoles/script">
    Script:
    Hundreds of art buyers and lovers from around the world came for the biggest 
    private collection ever up for auction.
    
    SOUNDBITE 1: Vox pop (woman) (english, 3 sec)
    "I arrived two days ago to attend the sale."
    ...
    ...
    
    Shotlist: (shot Feb 23, 2009)
    -wide of auctioneer
    -painting on screen
    -Berge arriving at auction
    -SOUNDBITE 1: Vox pop (woman) (english, 3 sec)
    -SOUNDBITE 2: Vox pop (man) (English, 4 sec)
    -close up of Matisse
    ...
    ...
</description>
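If you need to separate the two parts, a heuristic along the following lines may help. This is only a sketch: it assumes that the "Script:" and "Shotlist:" labels appear as in the example above, which is not guaranteed for every document, and the function name is hypothetical.

```python
import re

# Labels as seen in the example above; other documents may use other conventions.
SHOTLIST_LABEL = re.compile(r"^[ \t]*Shotlist:", re.IGNORECASE | re.MULTILINE)
SCRIPT_LABEL = re.compile(r"^Script:\s*", re.IGNORECASE)


def split_script_and_shot_list(text):
    """Return (script, shot_list); shot_list is None when no label is found."""
    match = SHOTLIST_LABEL.search(text)
    if match is None:
        return text.strip(), None
    script = SCRIPT_LABEL.sub("", text[:match.start()].strip(), count=1)
    # Keep whatever follows the label (e.g., the shooting date) with the shot list.
    shot_list = text[match.end():].strip()
    return script.strip(), shot_list


combined = """
    Script:
    Hundreds of art buyers and lovers from around the world came.

    Shotlist: (shot Feb 23, 2009)
    -wide of auctioneer
    -painting on screen
"""
script_part, shot_list_part = split_script_and_shot_list(combined)
print(script_part)
print(shot_list_part)
```

When no "Shotlist:" label is found, the whole content is returned as the script, which matches the regular case described earlier.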

Note that while NewsML-G2 allows for rich text by using some markup in the content of a script, AFP's systems only output simple textual content not interspersed with markup.

Shot list

Video and animated graphic documents: a shot list may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <description role="QCode resolving to 
                                   http://cv.afp.com/descriptionRoles/shotList">
                    -Member of Christie's staff walking in front of paintings
                    -Photographers
                    -Tilt of YSL poster
                    -VAR Christie's member of staff with metal art works
                    -VAR Theodore Gericault painting
                    -Thomas Seydoux, International Co-Head of Department, Christie’s Europe 
                    -PAN of photo of YSL's flat in Paris
                    -SOUNDBITE 2: Pierre Bergé, co-founder Yves Saint Laurent Couture house
                    -Paintings on wall
                    -VAR Ferdinand Leger painting
                    -Picasso painting
                    -Woman looking at painting
                    -VAR Frans Hals portrait
                    -SOUNDBITE 3: Jonathan Rendell, Deputy Chairman, Christie’s Americas 
                    -People walking through gallery
                    -Tilt to poster of YSL
                </description>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: a shot list may be provided in the content metadata section of each news item conveying video or animated graphic content.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <!-- A shot list for the content of this item -->
                <description role="QCode resolving to 
                                   http://cv.afp.com/descriptionRoles/shotList">
                    -Member of Christie's staff walking in front of paintings
                    -Photographers
                    -Tilt of YSL poster
                    ...
                    ...
                </description>
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- A shot list for the content of this item -->
                <description role="QCode resolving to 
                                   http://cv.afp.com/descriptionRoles/shotList">
                    -wide of auctioneer
                    -painting on screen
                    -Berge arriving at auction
                    -SOUNDBITE 1: Vox pop (woman) (english, 3 sec)
                    -SOUNDBITE 2: Vox pop (man) (English, 4 sec)
                    -close up of Matisse
                    ...
                    ...
                </description>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

A shot list, if present, provides a concise description of each sequence. These elements are provided in their order of occurrence in the video or animated graphic.

A shot list is provided by a description element whose role attribute, a QCode, resolves to http://cv.afp.com/descriptionRoles/shotList. It may appear at most once per item.

In some documents, the shot list isn't provided in this way but appears concatenated to the script (see section "Script" for an example).

The exact format of a shot list may not be the same for all kinds of documents and may also vary according to local journalistic practices.

Note that while NewsML-G2 allows for rich text by using some markup in the content of a shot list, AFP's systems only output simple textual content not interspersed with markup.

Visible speakers

Video and animated graphic documents: visible speakers may be described in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <description role="QCode resolving to http://cv.afp.com/descriptionRoles/synthe">
                    -Thomas Seydoux (man), International Co-Head of Department,
                     Christie’s  Europe 
                    -Pierre Bergé (man), co-founder Yves Saint Laurent Couture house
                    -Jonathan Rendell (man), Deputy Chairman, Christie’s Americas                
                </description>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: visible speakers may be described in the content metadata section of each news item conveying video or animated graphic content.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <!-- Visible speakers for the content of this item -->
                <description role="QCode resolving to http://cv.afp.com/descriptionRoles/synthe">
                    -Thomas Seydoux (man), International Co-Head of Department,
                     Christie’s  Europe 
                    -Pierre Bergé (man), co-founder Yves Saint Laurent Couture house
                    -Jonathan Rendell (man), Deputy Chairman, Christie’s Americas                
                </description>
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- Visible speakers for the content of this item -->
                <description role="QCode resolving to http://cv.afp.com/descriptionRoles/synthe">
                    -Vox pop woman
                    -Vox pop man
                    -Pierre Berge (man), Yves Saint Laurent's partner
                </description>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Information relevant to the visible people speaking in the video may be provided by a description element whose role attribute, a QCode, resolves to http://cv.afp.com/descriptionRoles/synthe. It may appear at most once per item. This information is provided in the order of occurrence of visible speakers in the video or animated graphic.

This information typically includes speakers' names and functions. It can be used, for example, to add captions accompanying speakers' appearances in the video.
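To build such captions, the description content first has to be split into one entry per speaker. The sketch below assumes the layout seen in the examples of this section (one entry per line starting with "-", possibly continued on the following line); actual documents may deviate from it, and the function name is hypothetical.

```python
def parse_dash_list(text):
    """Split '-' prefixed entries, joining continuation lines to their entry."""
    entries = []
    for raw_line in text.splitlines():
        line = " ".join(raw_line.split())  # trim and collapse inner whitespace
        if not line:
            continue
        if line.startswith("-") or not entries:
            entries.append(line.lstrip("-").strip())
        else:
            # A line without a leading '-' continues the previous entry.
            entries[-1] += " " + line
    return entries


speakers_text = """
    -Thomas Seydoux (man), International Co-Head of Department,
     Christie’s  Europe
    -Pierre Bergé (man), co-founder Yves Saint Laurent Couture house
"""
speakers = parse_dash_list(speakers_text)
for speaker in speakers:
    print(speaker)
```

Since entries are given in order of appearance, the resulting list can be walked in parallel with the speakers' appearances in the video.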

Note that while NewsML-G2 allows for rich text by using some markup in the content of visible speakers descriptions, AFP's systems only output simple textual content not interspersed with markup.

Data specific to multimedia documents

Some data is specific to multimedia documents. This section details these data elements.

Number of non-main items by nature

Multimedia documents: the number of non-main items, broken down by item nature, may be provided in the item metadata section of the main news item.

<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" xmlns:afp="http://www.afp.com/format/internal/">
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="QCode resolving to http://cv.iptc.org/newscodes/conceptrelation/isA"
                      href="http://cv.afp.com/itemnatures/mmdMainComp"/>
                <afp:extension>
                    <afp:stats>
                        <afp:totalComponentsOfType qcode="ninat:graphic" total="1" />
                        <afp:totalComponentsOfType qcode="ninat:picture" total="3" />
                    </afp:stats>
                </afp:extension>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

As shown above, each totalComponentsOfType element provides the number of non-main items of a given nature present in the document. The qcode attribute specifies the nature through a QCode whose concept URI is one of those listed in the table at the end of section "Document types", except the concept URI for text content, since multimedia documents don't contain non-main text items. The total attribute provides the number of items of the given nature, as a strictly positive integer. If the stats element is present, the absence of a totalComponentsOfType element for a given nature means that no non-main item of that nature is present in the document.

The totalComponentsOfType elements appear inside a stats element, itself inside an extension element in the item metadata section of the main news item. Note that the totalComponentsOfType, stats and extension elements are not standard NewsML-G2 vocabulary but part of an AFP-specific extension. They are defined in an XML namespace whose name is "http://www.afp.com/format/internal/".

In the example above, if "ninat" is an alias for the scheme http://cv.iptc.org/newscodes/ninature/, the document contains one non-main graphic item and three non-main picture items.

The extension and stats elements are optional (i.e., they may or may not be present). When they are present, they appear at most once per document.
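The following Python sketch reads these counts into a dictionary keyed by nature concept URI. It assumes the scheme alias mapping is already known (here, "ninat" as in the example above); the function name is hypothetical and the sample is a trimmed copy of the example.

```python
import xml.etree.ElementTree as ET

NS = {
    "nar": "http://iptc.org/std/nar/2006-10-01/",
    "afp": "http://www.afp.com/format/internal/",
}


def non_main_item_counts(news_message, scheme_aliases):
    """Map nature concept URI -> number of non-main items; None if no stats."""
    stats = news_message.find(
        "nar:itemSet/nar:newsItem/nar:itemMeta/afp:extension/afp:stats", NS)
    if stats is None:
        return None  # the stats element is optional
    counts = {}
    for element in stats.findall("afp:totalComponentsOfType", NS):
        alias, _, code = element.get("qcode").partition(":")
        counts[scheme_aliases[alias] + code] = int(element.get("total"))
    return counts


SAMPLE = """
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/"
             xmlns:afp="http://www.afp.com/format/internal/">
  <itemSet>
    <newsItem>
      <itemMeta>
        <afp:extension>
          <afp:stats>
            <afp:totalComponentsOfType qcode="ninat:graphic" total="1" />
            <afp:totalComponentsOfType qcode="ninat:picture" total="3" />
          </afp:stats>
        </afp:extension>
      </itemMeta>
    </newsItem>
  </itemSet>
</newsMessage>
"""
aliases = {"ninat": "http://cv.iptc.org/newscodes/ninature/"}
counts = non_main_item_counts(ET.fromstring(SAMPLE), aliases)
print(counts)
```

A nature missing from the resulting dictionary means that the document contains no non-main item of that nature, whereas a None result means that the document provides no statistics at all.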

Textual content

Multimedia documents: the textual content is provided in the content set of the main news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="QCode resolving to http://cv.iptc.org/newscodes/conceptrelation/isA"
                      href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
            <contentSet>
                <inlineXML contenttype="application/xhtml+xml">
                    <html xmlns="http://www.w3.org/1999/xhtml">
                        <head>
                            <title>
                                YSL-Bergé collection sets new world record at auction 
                                for a private collection
                            </title>
                        </head>
                        <body>
                            <p>
                                The Yves Saint Laurent and Pierre Bergé collection sets 
                                new world record at auction for a private collection. 
                                Hundreds of art treasures amassed by late fashion designer
                                Yves Saint Laurent and his companion Pierre Berge over half
                                a century are being auctioned.
                            </p>
                            <p>
                                <!-- A picture or still graphic item -->
                                <span class="g2item">
                                    <a href="urn:newsml:afp.com:20100101:7a0846c9-e341-45dc-a3a2"/>
                                    <img src="pictureItem/image1.jpeg" style="float: left;" 
                                         generator-unable-to-provide-required-alt="" height="163" width="245"   />
                                </span>
                            </p>    
                            <p>
                                Bids hit 206 million euros (261 million dollars) on February
                                23, 2009 making it the biggest private collection ever 
                                auctioned with two days of sales still left to run.
                            </p>
                            <p>
                                <!--  A video item -->
                                <span class="g2item">
                                    <a href="urn:newsml:afp.com:20100101:7633a15b-a990-4db6-9052"/>
                                    <video style="float: right;" controls="controls" height="138" width="245"
                                           poster="videoItem/image1.jpeg">
                                        <source src="videoItem/video1.mp4" type="video/mp4" />
                                        <source src="videoItem/video2.mp4" type="video/mp4" />
                                    </video>
                                </span>
                            </p>
                            <p>
                                <!-- A hypertext link to an external resource -->
                                The <a href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">wikipedia page about Yves Saint-Laurent</a> claims that ...
                            </p>
                        </body>
                    </html>
                </inlineXML>
            </contentSet>
        </newsItem>
        <newsItem guid="urn:newsml:afp.com:20100101:7a0846c9-e341-45dc-a3a2">
        </newsItem>
        <newsItem guid="urn:newsml:afp.com:20100101:7633a15b-a990-4db6-9052">
        </newsItem>
    </itemSet>
</newsMessage>

The textual content of the document is the main journalistic text of the document. It is provided by an inlineXML element and is expressed in XHTML. A contenttype attribute with a value of application/xhtml+xml explicitly denotes the presence of XHTML content.

The XHTML representation of the textual content may contain links to other items of the document. The position of such a link indicates where the content of the linked item (e.g., a picture or a video) may be placed inside the textual content when displayed. As shown in the example above, these links are provided in span elements with a class attribute equal to "g2item". The first element of such a span is always an a element whose href attribute provides the guid of the linked item.

Additional XHTML markup providing data about a given item can also appear inside these spans. For example, a document may contain an img element displaying a rendition of the item:

<span class="g2item">
    <a href="urn:newsml:afp.com:20100101:7a0846c9-e341-45dc-a3a2"/>
    <img src="pictureItem/image1.jpeg" style="float: left;" 
        generator-unable-to-provide-required-alt="" height="163" width="245"/>
</span>

And here is another example, using a video element with an illustration image linked to with the poster attribute (additional attributes such as autoplay, loop, etc. may be used as well).

<span class="g2item">
    <a href="urn:newsml:afp.com:20100101:7633a15b-a990-4db6-9052"/>
    <video style="float: right;" controls="controls" height="138" width="245"
           poster="videoItem/image1.jpeg">
        <source src="videoItem/video1.mp4" type="video/mp4" />
        <source src="videoItem/video2.mp4" type="video/mp4" />
    </video>
</span>

The XHTML can also contain hypertext links to external resources (i.e., links to entities that aren't logically part of the document) such as other NewsML-G2 documents, Web pages, etc. They may be provided by a elements. For example, here is a link to a Wikipedia page:

<a href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">wikipedia page about Yves Saint-Laurent</a>
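The distinction between links to items of the document and links to external resources can be sketched as follows in Python. The code assumes the XHTML has already been extracted from the inlineXML element and parsed; the function name and the trimmed sample are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def classify_links(xhtml_root):
    """Return (item_guids, external_hrefs), each in document order."""
    item_guids, item_anchors = [], set()
    for span in xhtml_root.iter(XHTML + "span"):
        if span.get("class") != "g2item":
            continue
        first = span[0] if len(span) else None
        # The first child of a g2item span is the link to the item.
        if first is not None and first.tag == XHTML + "a":
            item_guids.append(first.get("href"))
            item_anchors.add(first)
    external_hrefs = [
        anchor.get("href")
        for anchor in xhtml_root.iter(XHTML + "a")
        if anchor not in item_anchors and anchor.get("href")
    ]
    return item_guids, external_hrefs


SAMPLE = """
<html xmlns="http://www.w3.org/1999/xhtml"><body>
  <p><span class="g2item">
    <a href="urn:newsml:afp.com:20100101:7a0846c9-e341-45dc-a3a2"/>
    <img src="pictureItem/image1.jpeg" height="163" width="245"/>
  </span></p>
  <p>The <a href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">wikipedia
  page about Yves Saint-Laurent</a> claims that ...</p>
</body></html>
"""
item_guids, external_hrefs = classify_links(ET.fromstring(SAMPLE))
print(item_guids)
print(external_hrefs)
```

Each returned guid can then be matched against the guid attribute of the news items of the document to locate the linked item.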

Accessing visual content through URI references

In a document, a number of elements provide links to actual visual content in formats such as JPEG, MPEG-4, etc. Some of these elements are defined by the NewsML-G2 format while others are defined by the XHTML format, as AFP text and multimedia documents can contain XHTML embedded right into NewsML-G2. For example, such links can be provided by:

A link of this type is a URI reference as defined by [RFC3986]. This means it is either a URI or a relative-ref (colloquially referred to as a "relative URI").

At some point when dealing with a NewsML-G2 document, you'll typically want to retrieve the actual visual content, in order to process or display it.

If the link is a (non-relative) URI per [RFC3986], you can dereference it directly, using standard software components, to retrieve the actual visual content. The schemes used for such URIs typically depend on the specific delivery architecture established between you and AFP. Commonly used schemes include http, ftp and cid.

If the link is a relative-ref, then you need to resolve it to its target URI. You can then dereference the target URI to retrieve the actual visual content.

Note that with most standard libraries providing URI reference resolution, resolving a (non-relative) URI is the identity operation. That way, you don't have to determine whether you have been handed a (non-relative) URI or a relative-ref: you can just resolve the URI reference and then dereference the result to retrieve the actual visual content.

Section 5 of [RFC3986] defines the process of resolving a URI reference. To carry out this process, you need the URI reference itself (as stated earlier, it is provided in the document, for example in an href attribute, src attribute, etc.) and a base URI. Typically, the base URI is the URI that allows retrieving the NewsML-G2 document.

For example, if AFP delivers you a package that contains both an AFP NewsML-G2 document and data files for the associated visual content, the base URI is the URI that allows accessing the NewsML-G2 document after delivery. Suppose AFP provides you with a file archive (say, "document12345WithRenditions.tgz") that you unarchive in your file system at "/deliverySpace/internet-journal/topnews/", producing the following file structure (directory names are in blue and file names for renditions are in gray):

Sample delivery structure

In this context, the base URI is the URI that allows accessing the NewsML-G2 document after delivery. If your NewsML-G2 processor accesses the NewsML-G2 document at file:///deliverySpace/internet-journal/topnews/document12345WithRenditions/document12345.newsmlg2, then this is the base URI. The URI references linking to the visual content can be resolved relative to this base URI. For example, the URI reference photoItem/image1.jpg would resolve to file:///deliverySpace/internet-journal/topnews/document12345WithRenditions/photoItem/image1.jpg, which can then be dereferenced to access that particular visual content.

Several libraries provide URI reference resolution. For instance, in Java, one could use the resolve() method of the java.net.URI class.
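As an illustration in Python, the urljoin() function of the standard urllib.parse module can play the same role. The sketch below resolves the relative reference of the example above, and shows that a (non-relative) URI is returned unchanged. The helper name and the http://example.com/ URI are hypothetical.

```python
from urllib.parse import urljoin

# Base URI: where the NewsML-G2 document is accessed after delivery.
BASE_URI = ("file:///deliverySpace/internet-journal/topnews/"
            "document12345WithRenditions/document12345.newsmlg2")


def resolve_reference(base_uri, uri_reference):
    """Resolve a URI reference found in the document against the base URI."""
    return urljoin(base_uri, uri_reference)


# A relative-ref is resolved relative to the location of the document.
resolved = resolve_reference(BASE_URI, "photoItem/image1.jpg")

# A (non-relative) URI comes back unchanged (hypothetical example URI).
absolute = resolve_reference(BASE_URI, "http://example.com/renditions/image1.jpg")

print(resolved)
print(absolute)
```

Because both cases go through the same call, the calling code does not need to test whether a given link is relative before resolving it.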

References

[G2Impg] "NewsML-G2 Guide for implementers". IPTC. Available from http://www.iptc.org/site/News_Exchange_Formats/NewsML-G2/Specification/
[G2Spec] NewsML-G2 specification at power conformance level. IPTC. Available from http://www.iptc.org/site/News_Exchange_Formats/NewsML-G2/Specification/
[MediaTypes] MIME Media Types. Available at http://www.iana.org/assignments/media-types/index.html
[IPTCCPNatures] The IPTC controlled vocabulary for basic natures of concepts. Available at http://cv.iptc.org/newscodes/cpnature/
[IPTCDimUnits] The IPTC controlled vocabulary for dimension units. Available at http://cv.iptc.org/newscodes/dimensionunit/
[IPTCFormats] The IPTC controlled vocabulary for rendition formats. Available at http://cv.iptc.org/newscodes/format/
[IPTCGenres] The IPTC controlled vocabulary for genres. Available at http://cv.iptc.org/newscodes/genre/
[IPTCLocTypes] The IPTC controlled vocabulary for location types. Available at http://cv.iptc.org/newscodes/location/
[IPTCMediaTopics] The IPTC controlled vocabulary for media topics. Available at http://cv.iptc.org/newscodes/mediatopic/
[IPTCNProviders] The IPTC controlled vocabulary for news providers. Available at http://cv.iptc.org/newscodes/newsprovider/
[IPTCTimeUnits] The IPTC controlled vocabulary for time units. Available at http://cv.iptc.org/newscodes/timeunit/
[IPTCCWarn] The IPTC controlled vocabulary for content warnings. Available at http://cv.iptc.org/newscodes/contentwarning/
[ISO3166] ISO 3166 Maintenance Agency. Available at http://www.iso.org/iso/country_codes.htm
[HTTPURI] "RFC 2616, section 3.2: Uniform Resource Identifiers". R. Fielding et al. June 1999. Available at http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.2
[RFC3085bis] "URN Namespace for news-related resources". M. Steidl and J. Lorenzen. July 2009. Draft available at http://tools.ietf.org/html/draft-steidl-newsml-urn-rfc3085bis-00
[RFC3986] "Uniform Resource Identifier (URI): Generic Syntax". T. Berners-Lee, R. Fielding and L. Masinter. January 2005. Available at http://tools.ietf.org/html/rfc3986
[RFC3987] "Internationalized Resource Identifiers (IRIs)". M. Duerst and M. Suignard. January 2005. Available at http://www.ietf.org/rfc/rfc3987
[RFC5646] "Tags for Identifying Languages". A. Phillips and M. Davis. September 2009. Available at http://tools.ietf.org/html/rfc5646
[RFC5870] "A Uniform Resource Identifier for Geographic Locations ('geo' URI)". A. Mayrhofer and C. Spanring. June 2010. Available at http://tools.ietf.org/html/rfc5870
[TagCloud] Wikipedia article on tag cloud. Available at http://en.wikipedia.org/wiki/Tag_Cloud
[XMLSchemaDataTypes] XML Schema Part 2: Datatypes. Available at http://www.w3.org/TR/xmlschema-2/
[XMLSpec] "Extensible Markup Language (XML) 1.0". Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, François Yergeau. Available at http://www.w3.org/TR/xml/

Document history

Version Date By Notes
1.0 January 2012 Philippe Mougin Initial version
1.1 February 2014 Philippe Mougin Documentation updated thoroughly in preparation of public deliveries of NewsML-G2 documents

Copyright © 2012-2014 AFP