Technical guide to AFP NewsML-G2

Updated August 2021


Introduction

AFP delivers information in a number of ways, tailored to its clients' needs. One delivery vector is NewsML-G2, an industry-driven format and processing model allowing rich machine-readable representation of news content.

This document is your technical guide to AFP NewsML-G2 documents. You'll make use of it when implementing systems that receive and process AFP NewsML-G2 documents. It describes how building blocks defined by NewsML-G2 are combined in AFP documents to convey news content and associated metadata (titles, genres, subjects, embargo, etc.). It should be used alongside the NewsML-G2 documentation provided by IPTC [G2Doc], knowledge of which is assumed.

AFP NewsML-G2 documents build upon the NewsML-G2 format and processing model defined by IPTC (International Press Telecommunications Council) in the context of the NAR (News Architecture). NewsML-G2 is itself an application of XML and makes use of XML Schema. AFP NewsML-G2 documents also make use of the XML syntax of HTML [HTMLSpec] (formerly referred to as "XHTML 5") to represent textual content along with rich structural information: chunks of HTML in XML syntax can be embedded right into NewsML-G2 content. To deal with AFP NewsML-G2 documents, you will make use of all these technologies.

[Figure: technology stack. Layers: XML and XML Schema; NewsML-G2; HTML; AFP NewsML-G2 document format.]

Technology stack

Further sections provide an overview of the structure of AFP documents. They describe the information a document conveys and how to make use of it.

Mandatory processing

NewsML-G2 documents convey a number of metadata. For the most part you can pick some and ignore others as you see fit. For example, you can make use of IPTC media topics, or opt not to rely on them. Some metadata, however, cannot be ignored and must be processed, such as embargo instructions.
It is possible that, in addition to your NewsML-G2 integration, your workflows process AFP's metadata delivered by other means. For example, for video production we also deliver metadata in the form of a human-readable dopesheet sent by email. In the end, these metadata must be correctly processed, whether they are obtained from NewsML-G2 or from another delivery medium.

Correctly processing the following metadata is mandatory:

Should you encounter questions or difficulties when implementing these mandatory processes, please contact your AFP representative to get assistance.

Undocumented features

In actual NewsML-G2 documents delivered by AFP you will find several things neither documented here nor in the NewsML-G2 specification, such as undocumented XML elements and attributes. You must not rely on these undocumented features, unless specifically advised to do so by your AFP representative. These undocumented features are prone to change without notice and contain information that you cannot interpret reliably.

Overview

An AFP NewsML-G2 document provides metadata about content published by AFP. For example, when AFP publishes a picture it also publishes an associated NewsML-G2 document that provides metadata about this picture such as a caption, the name of the photographer, the location of the event, etc. Depending on the nature of the content, the NewsML-G2 document can be separate from the main content itself (e.g., a picture as a JPEG file along with a NewsML-G2 document) or can contain the main content (e.g., a textual story embedded inside the NewsML-G2 document).

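There are eight main types of AFP news content for which NewsML-G2 can play a role: text, picture, video, still graphic, animated graphic, interactive graphic, multimedia and live report.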

The type of a NewsML-G2 document defines important characteristics of the document such as the nature of its content, its XML structure, the metadata it provides as well as some elements of its processing model. As you can see, the type of a NewsML-G2 document is named after the type of news content the NewsML-G2 document is associated with.

AFP NewsML-G2 documents of type text, picture, video, still graphic, animated graphic and interactive graphic have the same top-level structure: a NewsML-G2 element called "news message". This news message is an envelope that contains one "news item". This news item represents some news content which can be either a news story in textual form, a photo, a video, a still graphic, an animated graphic or an interactive graphic.

AFP NewsML-G2 documents of type multimedia also have a news message as the top-level structure. This news message is an envelope that contains one or more news item(s): a main item with the multimedia content in the XML syntax of HTML and additional items for photos, videos, etc.

AFP NewsML-G2 documents of type live report index also have a news message as the top-level structure. This news message is an envelope that contains a "package item" providing metadata about the live report as a whole and links to NewsML-G2 documents representing the individual posts of the live report.

Section "Type of document" describes how to determine the type of a document. The following sections provide an overview of the structure of documents.

Text documents

Text documents have only one news item. This item contains metadata and textual news content. The content is represented by some HTML (in its XML syntax) embedded right into the news item.

[Figure: a text document is a news message made of a header (date of transmission and identifiers of the AFP products this document belongs to, plus optional information about the message or the transmission process) and an item set containing one news item (metadata such as titles, genres, subjects, content warning and embargo information, plus textual news content in HTML).]

Top-level structure of text documents

Picture and still graphic documents

Picture and still graphic documents have only one news item that conveys only one logical visual content (e.g., one photo). However, this content may be available in different renditions (e.g., different formats, resolutions, etc.). In addition to metadata about the picture or still graphic, the news item contains links to the actual visual content (e.g., JPEG resources) for each rendition. The visual content for each rendition isn't provided in the NewsML-G2 document itself, but by external resources (e.g., accompanying files, Web resources, etc.).

[Figure: a picture or still graphic document is a news message made of a header (date of transmission and identifiers of the AFP products this document belongs to, plus optional information about the message or the transmission process) and an item set containing one news item (metadata such as titles, genres, subjects, content warning, embargo information and caption, plus links to renditions of visual content). The renditions are external resources: thumbnail.jpg, low-res.jpg, high-res.jpg.]

Top-level structure of picture and still graphic documents (example)

Video and animated graphic documents

Video and animated graphic documents have only one news item that conveys only one logical visual content (e.g., one video, one animated graphic). However, this content may be available in different renditions (e.g., different formats, resolutions, etc.). In addition to metadata about the video or animated graphic, the news item contains links to the actual visual content (e.g., MPEG resources) for each rendition. The visual content for each rendition isn't provided in the NewsML-G2 document itself, but by external resources (e.g., accompanying files, Web resources, etc.).

The news item may also contain links to renditions of an icon (aka "illustration" or "preview image"). The renditions of the icon aren't provided in the NewsML-G2 document itself, but in external resources (e.g., accompanying files, Web resources, etc.).

[Figure: a video or animated graphic document is a news message made of a header (date of transmission and identifiers of the AFP products this document belongs to, plus optional information about the message or the transmission process) and an item set containing one news item (metadata such as titles, genres, subjects, content warning, embargo information, script and transcription, plus links to renditions of an icon, i.e., an illustration image, and links to renditions of video or animated graphic content). The renditions are external resources: illustration-low-res.jpg, illustration-high-res.jpg, video-low-res.mp4, video-high-res.mp4, video-high-res.wmv.]

Top-level structure of video and animated graphic documents (example)

Multimedia documents

Multimedia documents have one or multiple news items. One of these items is the "main news item". It is always present and provides the multimedia content using the XML syntax of HTML. It also provides metadata about the document, much like the news item of a text document, and contains links to the other items of the document. These additional items convey information about visual content: pictures, videos or graphics. They are much like the items found in picture, video or graphic documents.

The figure below provides an example of a multimedia document with one main item, a picture item and a video item.

[Figure: a multimedia document is a news message made of a header (date of transmission and identifiers of the AFP products this document belongs to, plus optional information about the message or the transmission process) and an item set containing one or multiple news items. The main news item holds metadata (titles, genres, subjects, content warning, embargo information, etc.) and multimedia news content in HTML: textual content intermingled with embedded audiovisual content, along with links to associated news items. A picture news item holds metadata and links to renditions of visual content (external resources thumbnail.jpg, low-res.jpg, high-res.jpg). A video news item holds metadata and links to renditions of an icon and of video or animated graphic content (external resources illustration-low-res.jpg, illustration-high-res.jpg, video-low-res.mp4, video-high-res.mp4, video-high-res.wmv).]

Top-level structure of multimedia documents (example)

The main news item is identified by the presence of a specific element in its item metadata section: a link element whose rel attribute conveys the concept URI http://cv.iptc.org/newscodes/conceptrelation/isA (using the QCode crel:isA) and whose href attribute, a URI, is equal to http://cv.afp.com/itemnatures/mmdMainComp.

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <!-- This link element tells that this news item is the main item of the multimedia document  -->
                <link rel="crel:isA" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
        </newsItem>
        
        <!-- Additional, non-main items  -->
        <newsItem></newsItem>
        <newsItem></newsItem>
     </itemSet>
</newsMessage>

You'll find more information about QCodes in section Controlled vocabularies and qualified codes.
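The identification rule above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a reference implementation: the function name find_main_item and the sample guid values are hypothetical, the namespace is the one shown in the document walk-through, and the QCode crel:isA is compared literally rather than resolved to its concept URI.

```python
import xml.etree.ElementTree as ET

# Namespace of NewsML-G2 elements, as declared in the walk-through example.
NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}
MAIN_ITEM_HREF = "http://cv.afp.com/itemnatures/mmdMainComp"


def find_main_item(news_message):
    """Return the main news item of a multimedia document, or None."""
    for item in news_message.findall("nar:itemSet/nar:newsItem", NS):
        for link in item.findall("nar:itemMeta/nar:link", NS):
            # Assumption: the QCode is compared as-is, without QCode resolution.
            if link.get("rel") == "crel:isA" and link.get("href") == MAIN_ITEM_HREF:
                return item
    return None


# Hypothetical, heavily trimmed multimedia document.
SAMPLE = """
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemSet>
    <newsItem guid="urn:example:picture"/>
    <newsItem guid="urn:example:main">
      <itemMeta>
        <link rel="crel:isA" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
      </itemMeta>
    </newsItem>
  </itemSet>
</newsMessage>
"""

main_item = find_main_item(ET.fromstring(SAMPLE))
print(main_item.get("guid"))  # prints: urn:example:main
```

Note that the main item is found by its link element, not by its position: the sketch does not assume that the main item comes first in the item set.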

Live reports

A live report is represented by multiple NewsML-G2 documents: one live report index document, and one multimedia document for each post of the live report.

The figure below shows the top-level structure of a live report. You can see the index on the left and the various posts on the right.

[Figure: a live report index is a news message made of a header (date of transmission and identifiers of the AFP products this document belongs to, plus optional information about the message or the transmission process) and an item set containing one package item (metadata such as title, subjects, keywords, content warning and urgency, plus links to live report posts). Each post is itself a NewsML-G2 multimedia document, e.g., d-oc1ku.xml, d-oc02w.xml, d-ob2p7.xml, d-oa76n.xml.]

Top-level structure of live reports (example)

Document walk-through

Below is an example of a simple text document, with just a few metadata and some textual content. Using this example, we will walk through some structural elements that are common to every type of AFP NewsML-G2 document.

Note that while the XML in this example is formatted to ease reading, the actual documents you receive will usually be in a compact form (e.g., all XML on one line).

<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" xmlns:afp="http://www.afp.com/format/internal/">
   <header>
      <sent>2009-02-23T20:44:07+02:00</sent>
   </header>
   <itemSet>
      <newsItem standard="NewsML-G2" standardversion="2.28" conformance="power" 
                guid="http://doc.afp.com/863OC" version="3" xml:lang="en">
         <catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_26.xml"/>
         <catalogRef href="http://cv.afp.com/std/catalog/catalog.AFP-IPTC-G2-V2_4.xml"/>
         <itemMeta>
            <itemClass qcode="ninat:text"/>
            <provider qcode="nprov:AFP">
               <name>AFP </name>
            </provider>
            <versionCreated>2009-02-23T20:43:00+02:00</versionCreated>
            <pubStatus qcode="stat:usable"/>
         </itemMeta>
         <contentMeta>
            <headline>
               YSL-Bergé collection sets new world record at auction 
               for a private collection
            </headline>
            <subject qcode="medtop:20000031" type="cpnat:abstract">
               <name>visual art</name>
            </subject>
            <subject qcode="medtop:20000011" type="cpnat:abstract">
               <name>fashion</name>
            </subject>
         </contentMeta>    
         <contentSet>
            <inlineXML contenttype="application/xhtml+xml" wordcount="70">
               <html xmlns="http://www.w3.org/1999/xhtml">
                  <head>
                     <title>
                        YSL-Bergé collection sets new world record at auction 
                        for a private collection
                     </title>
                  </head>
                  <body>
                     <p>The Yves Saint Laurent and Pierre Bergé collection sets 
                     new world record at auction for a private collection. 
                     Hundreds of art treasures amassed by late fashion designer
                     Yves Saint Laurent and his companion Pierre Berge over half
                     a century are being auctioned.</p>
                     <p>Bids hit 206 million euros (261 million dollars) on February
                     23, 2009 making it the biggest private collection ever 
                     auctioned with two days of sales still left to run.</p>
                  </body>
               </html>
            </inlineXML>
         </contentSet>
      </newsItem>
   </itemSet>
</newsMessage>

Some notes about this structure:

Controlled vocabularies and qualified codes

Concepts are identified with concept URIs

Documents make use of a number of controlled vocabularies (aka taxonomies) to convey information. In this section, we focus on a specific set of controlled vocabularies called "NewsML-G2 schemes".

A NewsML-G2 scheme associates unambiguous identifiers with "concepts". These identifiers take the form of URIs (Uniform Resource Identifiers [RFC3986]).

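For example, in NewsML-G2 a document is usable, withheld or canceled; this is known as the "publishing status". Each status has its own identifier:

usable: http://cv.iptc.org/newscodes/pubstatusg2/usable
withheld: http://cv.iptc.org/newscodes/pubstatusg2/withheld
canceled: http://cv.iptc.org/newscodes/pubstatusg2/canceled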

These identifiers are called "concept URIs". Together, they form a controlled vocabulary. While they may look like dereferenceable HTTP URLs, they do not need to be. Their main purpose is to unambiguously identify various concepts.

A document can contain a pubStatus element that conveys the concept URI identifying its publishing status. Therefore, when you receive a document, you can process this concept URI (e.g., compare it to the three possible values given above) to determine the publishing status of the document.

Concept URIs might be represented by QCodes in NewsML-G2 documents

In NewsML-G2 documents, some concept URIs are not directly expressed using the URI syntax. Instead, they are conveyed as QCodes (short for "Qualified Codes"). A QCode is made of two parts separated by a colon. The leftmost part (before the leftmost colon) is called the scheme alias. The part on the right of the leftmost colon is called the code.

[Figure: the QCode stat:usable, where "stat" is the scheme alias and "usable" is the code.]

QCode structure
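As a small illustration of this structure, the following Python sketch splits a QCode at its leftmost colon. The function name split_qcode is hypothetical and the error handling is an assumption; it only shows the two parts described above.

```python
def split_qcode(qcode):
    """Split a QCode at its leftmost colon into (scheme alias, code)."""
    alias, colon, code = qcode.partition(":")
    if not colon:
        # Assumption: a value without any colon is rejected as not being a QCode.
        raise ValueError(f"not a QCode (no colon): {qcode!r}")
    return alias, code


print(split_qcode("stat:usable"))      # prints: ('stat', 'usable')
print(split_qcode("medtop:20000031"))  # prints: ('medtop', '20000031')
```

Because only the leftmost colon is a separator, any further colon stays in the code part.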

In some ways, a QCode can be seen as a compressed form of concept URI (actually it is a bit more than that, as it also identifies the controlled vocabulary the concept URI is part of, but this is an advanced topic that we won't develop further in this documentation). Determining the concept URI a QCode stands for is called resolving the QCode. We'll describe how this operation is to be performed at the end of this section.

Why it is useful to resolve QCodes to concept URIs

When processing NewsML-G2 documents it is useful to resolve QCodes to concept URIs and then to work in terms of concept URIs because QCodes are not universally unambiguous identifiers whereas concept URIs are.

For example, in a given document the publishing status "usable" may be expressed by the following QCode: stat:usable (see it in situ in section Document walk-through). However, in another document the same status might be expressed by the QCode pst:usable. These two QCodes are different but resolve to the same concept URI: http://cv.iptc.org/newscodes/pubstatusg2/usable.

Furthermore, while it does not happen within AFP production, if you consider NewsML-G2 documents in general it is even possible for the QCode stat:usable to express the publishing status "usable" in a given document while expressing something completely different in another document. In that case the resolution process will correctly yield http://cv.iptc.org/newscodes/pubstatusg2/usable in the context of the first document and a different concept URI in the context of the second document.

Important design principle: QCode resolution shields you from QCode-level variations or accidental homonymies and gives you unambiguous identifiers to work with.

What to do if you can't implement QCode resolution with your tool chain

Depending on your tool chain, QCode resolution might be difficult to implement. For example, standard XML tools such as XPath processors can't easily integrate QCode resolution. If you are in such a situation, you can bypass the QCode resolution step and work directly in terms of QCodes when dealing with AFP's production, because we ensure that QCodes are unambiguous in our NewsML-G2 documents (e.g., in all AFP documents the QCode stat:usable will represent the publishing status "usable").
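For instance, a path expression can test the literal QCode directly. The sketch below uses the limited XPath support of Python's standard library; the function name is_usable and the sample item are hypothetical, and the approach relies on the guarantee stated above that stat:usable is unambiguous in AFP documents.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}


def is_usable(item):
    """Tell whether an item carries the publishing status QCode stat:usable."""
    # No QCode resolution: the literal QCode is matched by an attribute predicate.
    path = "nar:itemMeta/nar:pubStatus[@qcode='stat:usable']"
    return item.find(path, NS) is not None


# Hypothetical, heavily trimmed news item.
ITEM = ET.fromstring(
    '<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">'
    '<itemMeta><pubStatus qcode="stat:usable"/></itemMeta>'
    "</newsItem>"
)

print(is_usable(ITEM))  # prints: True
```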

In this documentation we specify both concept URIs and QCodes wherever needed. Unless specified otherwise, for IPTC standardized NewsML-G2 schemes we use the IPTC recommended QCodes, which you can look up in the corresponding IPTC documentation. For example, if you navigate with your Web browser to the resource identified by the concept URI for the publishing status "usable" (http://cv.iptc.org/newscodes/pubstatusg2/usable), you'll see that the IPTC recommended QCode for this publishing status is stat:usable.

When possible, however, it is advisable to resolve QCodes. Doing so has the following benefits:

How to perform QCode resolution

The resolution process is described precisely in the NewsML-G2 documentation ([G2Doc]). In short, it consists of resolving the scheme alias part of the QCode to a scheme URI using the catalog information provided in the document at the item level, and then appending the code part of the QCode to that scheme URI. In our example, the QCode stat:usable has a scheme alias stat and a code usable. It is resolved to http://cv.iptc.org/newscodes/pubstatusg2/usable, because the catalog information of the enclosing news item contains the following element:

<scheme alias="stat" uri="http://cv.iptc.org/newscodes/pubstatusg2/"/>

This catalog information can appear inline in the item inside catalog elements, or in an external resource referenced by the item through a catalogRef element, as in the following example borrowed from the section Document walk-through:

<catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_26.xml"/>

Resolving a QCode yields a concept URI that unambiguously identifies a given concept on a global scale. In our example, the concept identified by http://cv.iptc.org/newscodes/pubstatusg2/usable is: the publishing status "usable". In the context of NewsML-G2 schemes, two logically different concepts are never given the same concept URI, even in different systems managed by different organizations.
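The resolution steps described in this section can be sketched as follows in Python. This is a simplified illustration under stated assumptions, not the complete algorithm of [G2Doc]: the helper names inline_catalog and resolve_qcode are hypothetical, only inline catalog elements are read (catalogs referenced through catalogRef are not fetched), and the code part is appended to the scheme URI as-is.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}


def inline_catalog(item):
    """Map scheme aliases to scheme URIs using the item's inline catalog elements."""
    return {
        scheme.get("alias"): scheme.get("uri")
        for scheme in item.findall("nar:catalog/nar:scheme", NS)
    }


def resolve_qcode(qcode, catalog):
    """Resolve a QCode to a concept URI: scheme URI followed by the code."""
    alias, colon, code = qcode.partition(":")
    if not colon or alias not in catalog:
        raise KeyError(f"cannot resolve QCode {qcode!r}")
    return catalog[alias] + code


def item_with_alias(alias):
    """Build a hypothetical item whose catalog declares one alias for the status scheme."""
    return ET.fromstring(
        '<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"><catalog>'
        f'<scheme alias="{alias}" uri="http://cv.iptc.org/newscodes/pubstatusg2/"/>'
        "</catalog></newsItem>"
    )


# Two documents using different aliases for the same scheme.
uri_a = resolve_qcode("stat:usable", inline_catalog(item_with_alias("stat")))
uri_b = resolve_qcode("pst:usable", inline_catalog(item_with_alias("pst")))
print(uri_a)           # prints: http://cv.iptc.org/newscodes/pubstatusg2/usable
print(uri_a == uri_b)  # prints: True
```

The last two lines illustrate the point made earlier: stat:usable and pst:usable are different QCodes, yet both resolve to the same concept URI once each is interpreted with the catalog of its own document.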

How to read the examples

The following sections of this document are dedicated to answering questions of the form "Where is data X in an AFP NewsML-G2 document (and how can I make use of it)?". For example: "Where is the title of the document?", "Where is the textual content?", "Where is the caption?", "Where is the visual content?" etc.

For each piece of data, XML examples are provided. These examples aren't complete documents, though: they are high-level representations of the format, omitting many aspects and focusing on the data in question.

For instance, here is the example we provide for the "word count" metadata in text documents (the word count gives an estimate of the size of the textual content):

<newsMessage>
    <itemSet>
        <newsItem>
            <contentSet>
                <inlineXML wordcount="450">
                </inlineXML>
            </contentSet>
        </newsItem>
    </itemSet>
</newsMessage>

As you can see, this example omits many elements: contrast it with the example of a complete document provided in section Document walk-through. What you get from it, however, is a sense of where the word count information can be found and what it looks like.
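Each XML example of this kind translates directly into a path from the root of the document. As a hedged illustration, the Python sketch below follows the path shown in the word count example; the function name word_count is hypothetical, and the namespace is the one declared in the complete document of section Document walk-through (the trimmed examples omit it).

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}


def word_count(news_message):
    """Return the word count of a text document, or None when it is absent."""
    path = "nar:itemSet/nar:newsItem/nar:contentSet/nar:inlineXML"
    inline = news_message.find(path, NS)
    if inline is None or inline.get("wordcount") is None:
        return None
    return int(inline.get("wordcount"))


# Same structure as the trimmed example, with the namespace declaration added.
SAMPLE = """
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemSet>
    <newsItem>
      <contentSet>
        <inlineXML wordcount="450"/>
      </contentSet>
    </newsItem>
  </itemSet>
</newsMessage>
"""

print(word_count(ET.fromstring(SAMPLE)))  # prints: 450
```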

Some examples contain XML comments. For example:

<!-- A subject represented by a QCode  -->
<subject qcode="medtop:20000273"/>

These comments won't appear in real documents; they are annotations specific to this documentation.

Common data

Some data may be present in most types of documents. For example, a creation date or content warning can appear in any document (text, picture, still graphic, animated graphic, video, multimedia, live report, ...). This section details these common data elements. Further sections detail data associated with specific types of documents.

Creators & Contributors

Text, picture, still graphic and video documents: creators and contributors may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <creator role="afpcrrol:writer afpctrol:forbyline">
                    <name>
                        John Doe
                    </name>
                </creator>
                <contributor role="afpctrol:editor afpctrol:validator">
                    <name>
                        Jeanne Dupont
                    </name>
                </contributor>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: creators and contributors to the multimedia document as a whole may be provided in the content metadata section of the main news item. Creators and contributors specific to an individual item may be provided in the content metadata section of that item.

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="crel:isA" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
            <contentMeta>
                <!-- The creators and contributors to the multimedia document as a whole -->
                <creator role="afpcrrol:writer afpctrol:forbyline">
                    <name>
                        John Doe
                    </name>
                </creator>
                <contributor role="afpctrol:forbyline">
                    <name>
                        Jeanne Dupont
                    </name>
                </contributor>
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- The creators and contributors specific to this item -->
                <creator role="afpcrrol:photographer afpctrol:forbyline">
                    <name>
                        Al Dente
                    </name>
                </creator>
                <contributor>
                    <name>
                        Annie Mall
                    </name>
                </contributor>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Live report indexes: creators may be provided in the content metadata section of the package item. Note that no contributors are provided in live report indexes.

<newsMessage>
    <itemSet>
        <packageItem>
            <contentMeta>
                <creator role="afpcrrol:writer afpctrol:forbyline">
                    <name>
                        John Doe
                    </name>
                </creator>
                <creator role="afpcrrol:writer">
                    <name>
                        Walter Melon
                    </name>
                </creator>
            </contentMeta>
        </packageItem>
    </itemSet>
</newsMessage>

Creators and contributors may be provided by creator and contributor elements. Creators are persons who created the document or parts of the document. Contributors are persons who modified or enhanced the document or parts of the document. There might be any number of creators and contributors per news item.

For each creator and contributor we provide a name in the name element and optionally a list of roles, in the form of a QCode list, in the role attribute. The table below presents some roles often used in AFP documents.

Creator and contributor roles

Role              QCode                     Concept URI
Writer            afpcrrol:writer           http://cv.afp.com/creatorroles/writer
Photographer      afpcrrol:photographer     http://cv.afp.com/creatorroles/photographer
Graphic designer  afpcrrol:graphicDesigner  http://cv.afp.com/creatorroles/graphicDesigner
For byline        afpctrol:forbyline        http://cv.afp.com/contributorroles/forbyline

Important: The "for byline" role has a special meaning: the names of creators and contributors without this role must not be published. You may use them for internal purposes such as contacting the journalist for questions, but you must not display them publicly in association with the content of the document.
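The byline rule above could be applied along the following lines. This Python sketch is only an illustration: the function name byline_names is hypothetical, the QCode afpctrol:forbyline is matched literally (without QCode resolution), and the function works on any item element that has a content metadata section.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}
FOR_BYLINE = "afpctrol:forbyline"


def byline_names(item):
    """Return the names that may be published: those carrying the 'for byline' role."""
    names = []
    for tag in ("creator", "contributor"):
        for person in item.findall(f"nar:contentMeta/nar:{tag}", NS):
            # The role attribute is a whitespace-separated list of QCodes.
            roles = person.get("role", "").split()
            if FOR_BYLINE in roles:
                name = person.findtext("nar:name", default="", namespaces=NS)
                # Collapse the whitespace that surrounds names in formatted XML.
                names.append(" ".join(name.split()))
    return names


# Hypothetical, heavily trimmed news item.
ITEM = ET.fromstring("""
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <contentMeta>
    <creator role="afpcrrol:writer afpctrol:forbyline">
      <name>
        John Doe
      </name>
    </creator>
    <contributor role="afpctrol:editor afpctrol:validator">
      <name>Jeanne Dupont</name>
    </contributor>
  </contentMeta>
</newsItem>
""")

print(byline_names(ITEM))  # prints: ['John Doe']
```

Names without the role are simply left out of the result, so a caller cannot publish them by accident; they remain available in the document for internal use.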

Content warning

Text, picture, still graphic, video and multimedia documents: a content warning may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <signal qcode="sig:cwarn"/>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Live report indexes: a content warning may be provided in the item metadata section of the package item.

<newsMessage>
    <itemSet>
        <packageItem>
            <itemMeta>
                <signal qcode="sig:cwarn"/>
            </itemMeta>
        </packageItem>
    </itemSet>
</newsMessage>

A document may include a warning about its content when it might be perceived as offensive. In such a case, you'll typically want to review the content of the document in order to decide how to use it. This warning takes the form of a signal element with a QCode sig:cwarn resolving to http://cv.iptc.org/newscodes/signal/cwarn.

When a content warning is present, we often provide a set of exclAudience elements that convey the reason(s) for the content warning. For example, in a document whose content contains potentially offensive violence and language:

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <signal qcode="sig:cwarn"/>
            </itemMeta>
            <contentMeta>
                <exclAudience qcode="cwarn:violence"/>
                <exclAudience qcode="cwarn:language"/>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

In a live report index, the exclAudience elements are provided in the package item instead of in a news item:

<newsMessage>
    <itemSet>
        <packageItem>
            <itemMeta>
                <signal qcode="sig:cwarn"/>
            </itemMeta>
            <contentMeta>
                <exclAudience qcode="cwarn:violence"/>
                <exclAudience qcode="cwarn:language"/>
            </contentMeta>
        </packageItem>
    </itemSet>
</newsMessage>

Used in this way, each exclAudience element identifies an audience that may be offended or distressed by a given characteristic of the content (e.g. "violence"). In order to specify these, the IPTC's content warnings vocabulary [IPTCCWarn] must be used.

At the time of this writing we make use of the following content warnings, using the standard IPTC scheme: death, language, nudity, sexuality, violence and suffering.
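A minimal Python sketch of content warning detection is given below. It is an illustration under stated assumptions: the function name content_warning_reasons is hypothetical, QCodes are matched literally instead of being resolved, and the item passed in may be a news item or a package item since both use the same structure.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}


def content_warning_reasons(item):
    """Return None when there is no content warning, else the list of reason QCodes."""
    signal = item.find("nar:itemMeta/nar:signal[@qcode='sig:cwarn']", NS)
    if signal is None:
        return None
    # The list may be empty: reasons are often, but not always, provided.
    return [
        audience.get("qcode")
        for audience in item.findall("nar:contentMeta/nar:exclAudience", NS)
    ]


# Hypothetical, heavily trimmed news item.
ITEM = ET.fromstring("""
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemMeta>
    <signal qcode="sig:cwarn"/>
  </itemMeta>
  <contentMeta>
    <exclAudience qcode="cwarn:violence"/>
    <exclAudience qcode="cwarn:language"/>
  </contentMeta>
</newsItem>
""")

print(content_warning_reasons(ITEM))  # prints: ['cwarn:violence', 'cwarn:language']
```

Returning None for "no warning" and a possibly empty list for "warning without stated reasons" keeps the two cases distinct, which matters when routing a document to a human reviewer.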

Correction signal

Text, picture, still graphic, video and multimedia documents: a correction signal may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
   <itemSet>
      <newsItem>
         <itemMeta>
            <signal qcode="sig:correction"/>
         </itemMeta>
      </newsItem>
   </itemSet>
</newsMessage>

One particular type of update that can occur on a document is a correction. A correction occurs when an error has been found in a document and a corrected version is published. In such a case, you receive a new version of the document (i.e., a document with the same guid and a new version number) that contains a correction signal. This signal takes the form of a signal element with a qcode attribute sig:correction resolving to http://cv.iptc.org/newscodes/signal/correction.

Common practice at AFP is to use this mechanism only for corrections of great significance. For example, the correction of a typo that doesn't change the meaning of the news story shall not be marked as a correction but might be issued as a mere update.

When a serious error is found in key information in a document, rendering it unusable as such, the document will usually be canceled instead of corrected. A document is canceled by issuing a version with the "canceled" publishing status, as discussed in section Publishing Status.

The correction signal doesn't provide details about the correction (e.g., what or where was the error, how it has been corrected). Such details will usually be provided in the general editorial note, which is given by an edNote element with a role attribute afpnoteRole:client resolving to http://cv.afp.com/ednoteroles/client (see the section on the general editorial note). For example:

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <edNote role="afpnoteRole:client">
                   CORRECTS the first sentence of the answer of the auctioneer, which was incorrectly translated.
                </edNote>
                <signal qcode="sig:correction"/>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Handling a correction correctly is of paramount importance and can be a complex process (you probably have it in place already). For example, you may want to have someone review the item, along with its previous versions and the editorial note, to understand the error. You may then ensure that this correction is applied to any published material that carries the original error. This may include making sure that recipients of such material are notified and provided with the corrected information.
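The detection part of such a process might look like the Python sketch below, which flags a correction and extracts the general editorial note for the reviewer. It is a sketch only: the function name correction_note is hypothetical, QCodes are matched literally instead of being resolved, and the role attribute is assumed to hold a whitespace-separated list of QCodes.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}


def correction_note(item):
    """Return None when the item is not a correction, else its client editorial note.

    The note is an empty string when the correction comes without one.
    """
    meta = item.find("nar:itemMeta", NS)
    if meta is None or meta.find("nar:signal[@qcode='sig:correction']", NS) is None:
        return None
    for note in meta.findall("nar:edNote", NS):
        if "afpnoteRole:client" in note.get("role", "").split():
            # Gather the text of the note and collapse formatting whitespace.
            return " ".join("".join(note.itertext()).split())
    return ""


# Hypothetical, heavily trimmed news item.
ITEM = ET.fromstring("""
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemMeta>
    <edNote role="afpnoteRole:client">
      CORRECTS the first sentence of the answer of the auctioneer,
      which was incorrectly translated.
    </edNote>
    <signal qcode="sig:correction"/>
  </itemMeta>
</newsItem>
""")

print(correction_note(ITEM))
```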

Dates

Two date formats are used in this specification:

In addition to the description provided below, you should refer to the NewsML-G2 specification for information on the processing model for these dates.

Document transmission date

All documents: the transmission date of the document is provided in the header of the news message.

<newsMessage>
    <header>
        <sent>2009-02-23T20:44:07+02:00</sent>
    </header>
</newsMessage>

The transmission date is provided by the sent element. It is always present and uses the full date and time format. The transmission date indicates when the document was transmitted from AFP to your system.

Document creation date

Text, picture, still graphic, video and multimedia documents: the creation date of the NewsML-G2 document may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <firstCreated>2009-02-23T18:22:08+02:00</firstCreated>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Live report indexes: the creation date of the NewsML-G2 document may be provided in the item metadata section of the package item.

<newsMessage>
    <itemSet>
        <packageItem>
            <itemMeta>
                <firstCreated>2009-02-23T18:22:08+02:00</firstCreated>
            </itemMeta>
        </packageItem>
    </itemSet>
</newsMessage>

If present, the creation date of the NewsML-G2 document is provided by a firstCreated element in the full date and time format. This creation date specifies when the NewsML-G2 document was created (contrast this with the content creation date, which specifies when some content was created; e.g., when a given photo was shot). When a new version of the document is emitted, the creation date of the document isn't modified, but the version creation date is.

Document version creation date

Text, picture, still graphic, video and multimedia documents: the creation date of this version of the NewsML-G2 document is provided in the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <versionCreated>2009-02-23T20:43:00+02:00</versionCreated>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Live report indexes: the creation date of this version of the NewsML-G2 document is provided in the item metadata section of the package item.

<newsMessage>
    <itemSet>
        <packageItem>
            <itemMeta>
                <versionCreated>2009-02-23T20:43:00+02:00</versionCreated>
            </itemMeta>
        </packageItem>
    </itemSet>
</newsMessage>

The creation date of this version of the NewsML-G2 document is provided by a versionCreated element in the full date and time format. This date information is always present in documents.
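
The sketch below, given for illustration, reads the transmission date, the optional document creation date and the version creation date from a single-item document. The names parse_full_datetime and document_dates are hypothetical. It assumes values in the full date and time format shown in the examples above, and that the document contains a single item; in a multimedia document you would read these dates from the main news item instead.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

SAMPLE = """<newsMessage>
    <header>
        <sent>2009-02-23T20:44:07+02:00</sent>
    </header>
    <itemSet>
        <newsItem>
            <itemMeta>
                <versionCreated>2009-02-23T20:43:00+02:00</versionCreated>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>"""

# Hypothetical lookup table: result key -> ElementTree path.
PATHS = {
    "sent": "{*}header/{*}sent",
    "firstCreated": ".//{*}itemMeta/{*}firstCreated",
    "versionCreated": ".//{*}itemMeta/{*}versionCreated",
}


def parse_full_datetime(text):
    """Parse a full date and time value into a timezone-aware datetime."""
    value = text.strip()
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def document_dates(xml_text):
    """Return a dict of the three document dates; absent ones are None."""
    root = ET.fromstring(xml_text)
    dates = {}
    for key, path in PATHS.items():
        element = root.find(path)
        has_value = element is not None and element.text
        dates[key] = parse_full_datetime(element.text) if has_value else None
    return dates


if __name__ == "__main__":
    for name, value in document_dates(SAMPLE).items():
        print(name, value)
```

Because firstCreated is optional, callers should be prepared for a None value, whereas sent and versionCreated are always present in AFP documents.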

Content creation date

The content creation date is the date of creation of the main journalistic content associated with the NewsML-G2 document. For a photo, this is the date of shooting, except for a photo combo where we provide the date at which the combo was produced. Likewise, for live video footage, this is a date at which the covered event was occurring. For other types of content (e.g., video report, graphic) this is typically the date at which the content was produced.

Picture, still graphic, animated graphic and video documents

The content creation date may be provided by a contentCreated element in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <contentCreated>2009-02-23T17:31:00+02:00</contentCreated>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents

The creation date of a specific picture, still graphic or video component may be provided in the content metadata section of the corresponding item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <!-- This is the content creation date for this item -->
                <contentCreated>2009-02-23T17:31:00+02:00</contentCreated>
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- This is the content creation date for this other item -->
                <contentCreated>2009-02-22</contentCreated>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

While content creation dates may be provided for components, none is provided for the multimedia document itself. The version creation date of the document often provides a good approximation. However, this might not be the case for all documents, so you should adopt this heuristic approach only if your usage of this date can tolerate a "right most of the time" situation.

Text documents

As with multimedia documents, no content creation date is provided. The version creation date of the document often provides a good approximation. However, this might not be the case for all documents, so you should adopt this heuristic approach only if your usage of this date can tolerate a "right most of the time" situation.
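
The heuristic described above can be sketched as follows. This is an illustrative Python example, not a prescribed implementation: the names parse_content_created and best_effort_content_date are hypothetical. It handles the two value shapes that appear in the examples of this section (a full date and time, and a date only); any other shape raises ValueError. The function returns a flag telling the caller whether the value is only an approximation derived from the version creation date.

```python
from datetime import date, datetime
import xml.etree.ElementTree as ET

# A picture item with an explicit (date-only) content creation date.
PHOTO_ITEM = """<newsItem>
    <contentMeta>
        <contentCreated>2009-02-22</contentCreated>
    </contentMeta>
</newsItem>"""

# A text item: no contentCreated, only the version creation date.
TEXT_ITEM = """<newsItem>
    <itemMeta>
        <versionCreated>2009-02-23T20:43:00+02:00</versionCreated>
    </itemMeta>
    <contentMeta/>
</newsItem>"""


def parse_content_created(text):
    """Parse a date-only or full date-and-time value.

    Assumption: only these two shapes are handled by this sketch.
    """
    value = text.strip()
    if "T" in value:
        if value.endswith("Z"):
            value = value[:-1] + "+00:00"
        return datetime.fromisoformat(value)
    return date.fromisoformat(value)


def best_effort_content_date(item):
    """Return (value, is_approximation) for one item element."""
    created = item.find("{*}contentMeta/{*}contentCreated")
    if created is not None and created.text and created.text.strip():
        return parse_content_created(created.text), False
    # Fallback heuristic: "right most of the time", not guaranteed.
    version = item.find("{*}itemMeta/{*}versionCreated")
    if version is not None and version.text:
        return parse_content_created(version.text), True
    return None, True


if __name__ == "__main__":
    for xml_text in (PHOTO_ITEM, TEXT_ITEM):
        print(best_effort_content_date(ET.fromstring(xml_text)))
```

Keeping the approximation flag alongside the value lets downstream code decide whether the heuristic is acceptable for a given use.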

Live report indexes

No content creation date is provided for live report indexes.

Embargo

Text, picture, still graphic, video and multimedia documents: embargo information is provided in the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <embargoed/>
                <edNote role="afpnoteRole:embargo">
                    Embargoed until end of first auction day
                </edNote>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Embargo information is specified through the embargoed element, which can be supplemented by an edNote element with a role attribute afpnoteRole:embargo resolving to http://cv.afp.com/ednoteroles/embargo.

Embargo-wise, an AFP document can have one of the three statuses described in the table below.

Embargo statuses

Embargoed: No
Representation: No embargoed element.
Example: N/A

Embargoed: Until given date and time
Representation: An embargoed element providing the date and time at which the embargo ends.
Example:
<embargoed>
    2009-02-23T21:00:00+02:00
</embargoed>

Embargoed: Under other provided conditions
Representation: An empty embargoed element and an embargo editorial note specifying the embargo conditions. This form is used when the precise date and time at which the embargo expires is not known. Note that if the conditions are made of a date and time and additional conditions, all these conditions are expressed in the editorial note (i.e., the date and time aren't provided inside the embargoed element, but as part of the editorial note).
Example:
<embargoed/>
<edNote role="afpnoteRole:embargo">
    Embargoed until end of first auction day
</edNote>

See the NewsML-G2 specification for more information on the representation and processing model of embargo information.

For multimedia documents, the way embargo information is conveyed differs from standard NewsML-G2. In NewsML-G2, each G2 item carries its own embargo information, and a G2 item without an embargoed element is defined as not embargoed. In AFP's multimedia documents, the only embargoed element to consider is that of the main item. The embargoed elements of non-main items must be ignored. You must process multimedia documents in a way that applies the embargo directives provided in the main news item to the entire content of the document (i.e., to all items in the document).
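
The following Python sketch illustrates the three embargo statuses and the main-item rule for multimedia documents. It is an example under stated assumptions, not a reference implementation: the function names and the status labels ("not embargoed", "until date", "conditional") are hypothetical, QCodes are compared literally, and the main item is recognized by the crel:isa link to http://cv.afp.com/itemnatures/mmdMainComp used in the multimedia examples of this guide. When no item carries that link, the first news item is used.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

MAIN_ITEM_NATURE = "http://cv.afp.com/itemnatures/mmdMainComp"

# Multimedia sample: the main item (second one) carries the embargo.
MULTIMEDIA = """<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta/>
        </newsItem>
        <newsItem>
            <itemMeta>
                <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
                <embargoed/>
                <edNote role="afpnoteRole:embargo">
                    Embargoed until end of first auction day
                </edNote>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>"""

DATED = """<newsMessage><itemSet><newsItem><itemMeta>
    <embargoed>
        2009-02-23T21:00:00+02:00
    </embargoed>
</itemMeta></newsItem></itemSet></newsMessage>"""


def main_item(root):
    """Return the main news item, or the first item if none is marked."""
    items = root.findall(".//{*}itemSet/{*}newsItem")
    for item in items:
        for link in item.findall("{*}itemMeta/{*}link"):
            if link.get("rel") == "crel:isa" and link.get("href") == MAIN_ITEM_NATURE:
                return item
    return items[0] if items else None


def embargo_status(xml_text):
    """Return (status, end, note); applies to the whole document."""
    item = main_item(ET.fromstring(xml_text))
    embargoed = item.find("{*}itemMeta/{*}embargoed") if item is not None else None
    if embargoed is None:
        return ("not embargoed", None, None)
    note = None
    for ed_note in item.findall("{*}itemMeta/{*}edNote"):
        if ed_note.get("role") == "afpnoteRole:embargo":
            note = " ".join((ed_note.text or "").split())
    text = (embargoed.text or "").strip()
    if text:
        if text.endswith("Z"):  # accepted natively only from Python 3.11
            text = text[:-1] + "+00:00"
        return ("until date", datetime.fromisoformat(text), note)
    # Empty element: conditions are in the note and need human review.
    return ("conditional", None, note)


if __name__ == "__main__":
    print(embargo_status(MULTIMEDIA))
    print(embargo_status(DATED))
```

A "conditional" result cannot be lifted automatically: the conditions are expressed in natural language and must be assessed by a person.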

Event identifiers

Text, picture, still graphic, video and multimedia documents: multiple event identifiers may be provided by subject elements in the content metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <subject qcode="QCode identifying an event" type="cpnat:event">
                    <name>
                        Auction for the Yves Saint Laurent and Pierre Bergé collection
                    </name>
                </subject>       
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Live report indexes: only one event identifier may be provided by a subject element in the content metadata section of the package item.

<newsMessage>
    <itemSet>
        <packageItem>
            <contentMeta>
                <subject qcode="QCode identifying an event" type="cpnat:event">
                    <name>
                        Auction for the Yves Saint Laurent and Pierre Bergé collection
                    </name>
                </subject>       
            </contentMeta>
        </packageItem>
    </itemSet>
</newsMessage>

The news coverage of an event often spans multiple NewsML-G2 documents. For example the auction for the Yves Saint Laurent and Pierre Bergé collection may be covered by two news stories (one announcing the event and one reporting on the event later on), two interview transcripts (one with Pierre Bergé and one with a Christie's representative), a multimedia document, a video report and a number of pictures of the event. It might be interesting for you to know that all these documents are about the same event. For example, it might help your editorial team to access all the documents available about the event. Another example: if you operate a Web site publishing news you could use this knowledge to automatically provide links to related content.

To let you know that multiple NewsML-G2 documents relate to the same event, AFP creates unique event identifiers and inserts them into documents. For example, a unique event identifier is assigned to the auction for the Yves Saint Laurent and Pierre Bergé collection, and each related document contains this identifier.


[Figure: Event identifier. An event happening in the real world is covered by several NewsML-G2 documents. AFP assigns a unique identifier to the event, and each document contains the unique identifier of the event.]

Different NewsML-G2 documents covering the same event

An event identifier is the concept URI of a subject element whose type attribute, the QCode cpnat:event, resolves to http://cv.iptc.org/newscodes/cpnature/event. It is conveyed by the qcode attribute.

In addition to event identifiers we provide, whenever possible, the names of the events. An event name provides a short description of the event in natural language. The name is provided by a name element inside the subject element.

See the section on subjects for more information about the subject element.

When a document covers multiple events it might contain multiple event identifiers, as shown in the example below:

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <subject qcode="QCode identifying an event" type="cpnat:event">
                    <name>Name of this event</name>
                </subject>       
                <subject qcode="QCode identifying another event" type="cpnat:event">
                    <name>Name of this other event</name>
                </subject>                
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Why are event identifiers provided using <subject> elements?

This is because events covered by a document are also subject matter of the document: things the document is about. Hence it is appropriate to convey their identifiers using the NewsML-G2 <subject> elements, along with other subjects of the documents. This allows them to be generically processed like any other subjects when that makes sense, or to be processed specifically as event identifiers when needed, thanks to the type attribute which marks them as such.
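
To illustrate how event identifiers could be used to group related documents (for example, to offer links to related content), here is a minimal Python sketch. The function names, the sample guids (urn:example:...) and the sample QCodes (ex:event-1, ex:event-2) are hypothetical placeholders; real documents carry AFP-assigned values. The type attribute is compared literally with cpnat:event, which assumes the scheme alias used in the examples above.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

TEMPLATE = (
    "<newsMessage><itemSet><newsItem><contentMeta>{subjects}"
    "</contentMeta></newsItem></itemSet></newsMessage>"
)
EVENT = '<subject qcode="{qcode}" type="cpnat:event"><name>{name}</name></subject>'

# Hypothetical documents keyed by guid.
DOCS = {
    "urn:example:doc-1": TEMPLATE.format(
        subjects=EVENT.format(qcode="ex:event-1", name="Auction")
    ),
    "urn:example:doc-2": TEMPLATE.format(
        subjects=EVENT.format(qcode="ex:event-1", name="Auction")
        + EVENT.format(qcode="ex:event-2", name="Other event")
        + '<subject qcode="ex:topic" type="cpnat:abstract"/>'
    ),
}


def event_identifiers(xml_text):
    """Return a list of (qcode, name) for subjects typed as events."""
    root = ET.fromstring(xml_text)
    events = []
    for subject in root.findall(".//{*}contentMeta/{*}subject"):
        if subject.get("type") != "cpnat:event":
            continue  # an ordinary subject, not an event identifier
        name = subject.find("{*}name")
        has_name = name is not None and name.text
        label = " ".join(name.text.split()) if has_name else None
        events.append((subject.get("qcode"), label))
    return events


def index_by_event(documents):
    """Map each event QCode to the guids of the documents covering it."""
    index = defaultdict(list)
    for guid, xml_text in documents.items():
        for qcode, _name in event_identifiers(xml_text):
            index[qcode].append(guid)
    return dict(index)


if __name__ == "__main__":
    for event, guids in index_by_event(DOCS).items():
        print(event, "->", guids)
```

Subjects without the event type are left out of the index but remain available for generic subject processing.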

General editorial note

Text, picture, still graphic, video and multimedia documents: a general editorial note may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <edNote role="afpnoteRole:client">
                    Original source is unknown and unverified. This photo was posted on twitter.
                    Following an official ban in San Theodoros on foreign media outlets covering
                    demonstrations, AFP is using pictures from other sources.
                </edNote>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

The general editorial note provides some text in natural language addressed to the editorial people in your team receiving and processing the item. It can provide instructions or hints on how to handle the document, information about the nature of a correction (see example in the section on correction signal), excluded audience/usage, additional information about the content, etc. It is not intended for publication.

There is at most one general editorial note in a document. If present, it is provided by an edNote element whose role attribute, the QCode afpnoteRole:client, resolves to http://cv.afp.com/ednoteroles/client. Note that while NewsML-G2 allows for rich text by using some markup in the content of an editorial note, AFP's systems only output simple textual content not interspersed with markup.

The general editorial note is often used to express usage restrictions, as in the following example:

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <edNote role="afpnoteRole:client">
                    EDITORIAL USE ONLY
                    NO MARKETING NO ADVERTISING CAMPAIGNS
                    NO ARCHIVE
                </edNote>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

The following table provides examples of common usage restrictions you might find in picture documents.

Examples of usage restrictions conveyed by the general editorial note

Phrase inside the general editorial note: Comment
RESTRICTED TO EDITORIAL USE: The picture can be used only by media outlets for news purposes (newspapers, magazines, radios, TVs, news websites and mobile news services...)
NO MARKETING NO ADVERTISING CAMPAIGNS: The picture cannot be used for advertising or marketing.
NO INTERNET: The picture cannot be published on Internet websites.
NO MOBILE: The picture cannot be used by mobile services.
NO ARCHIVE: The picture cannot be archived.
MANDATORY USE WITH AFP STORY: The handout picture shall be published with the corresponding AFP story only (this mention is only available for handouts).
TO BE USED WITHIN XX DAYS FROM XX/XX/XXXX: The picture cannot be used outside of the specified timeframe.
NO VIDEO EMULATION: The picture cannot be used in a sequence of pictures to simulate a video.

Genres

Text, picture, still graphic and video documents: genres of the document may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <!-- A genre represented by a QCode and associated with a rank -->
                <genre rank="1" qcode="afpattribute:Interview"/>
                
                <!-- A genre represented by a QCode and a name and associated with a rank -->
                <genre rank="2" qcode="afpedtype:VideoWithTitling">
                    <name>Titling</name>
                </genre>  
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: genres of the document as a whole may be provided in the content metadata section of the main news item. Genres specific to a non-main item may be provided by the content metadata section of this item.

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
            <contentMeta>
                <!-- This genre is in the main news item:
                     it applies to the document as a whole -->
                <genre rank="1" qcode="afpattribute:Interview">
                    <name>Interview</name>
                </genre> 
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- This genre only qualifies this item -->
                <genre rank="1" qcode="afpattribute:Profile">
                    <name>Profile</name>
                </genre> 
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Genres of a document, and of individual items in the case of multimedia documents, may be provided by genre elements. Each genre element describes a nature or a style of the content (e.g., an intellectual or journalistic form). There may be multiple genre elements per item, as a given item may be at the intersection of multiple genres.

In AFP documents, a genre is specified by a QCode, optionally supplemented by a natural language name.

Often used in AFP documents are genres defined in the schemes http://ref.afp.com/attributes/ (scheme alias afpattribute) and http://ref.afp.com/editorialtypes/ (scheme alias afpedtype).

The name child element, if present, provides a natural language name for the genre.
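
As an illustration, the Python sketch below lists the genres of one item, ordered by ascending rank value. The function name genres is hypothetical, and the sketch makes no claim about how rank should be interpreted; refer to the NewsML-G2 specification for its exact semantics. Genres without a rank attribute are placed last.

```python
import xml.etree.ElementTree as ET

# Same genres as in the first example above, deliberately out of order.
SAMPLE_ITEM = """<newsItem>
    <contentMeta>
        <genre rank="2" qcode="afpedtype:VideoWithTitling">
            <name>Titling</name>
        </genre>
        <genre rank="1" qcode="afpattribute:Interview"/>
    </contentMeta>
</newsItem>"""


def genres(item):
    """Return a list of (rank, qcode, name) sorted by ascending rank.

    rank and name are None when the document does not provide them.
    """
    found = []
    for genre in item.findall("{*}contentMeta/{*}genre"):
        name = genre.find("{*}name")
        has_name = name is not None and name.text
        label = name.text.strip() if has_name else None
        rank = int(genre.get("rank")) if genre.get("rank") else None
        found.append((rank, genre.get("qcode"), label))
    # Unranked genres sort after ranked ones.
    found.sort(key=lambda entry: (entry[0] is None, entry[0] or 0))
    return found


if __name__ == "__main__":
    for rank, qcode, name in genres(ET.fromstring(SAMPLE_ITEM)):
        print(rank, qcode, name)
```

In a multimedia document, calling such a function on the main news item yields the genres of the document as a whole, while calling it on another item yields the genres specific to that item.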

Identifier and version number

Text, picture, still graphic, video and multimedia documents: the document identifier is provided in the news item (for multimedia documents: in the main news item). A version number may be present too.

<newsMessage>
    <itemSet>
        <newsItem guid="http://d.afp.com/MM48X" version="5">
        </newsItem>
    </itemSet>
</newsMessage>

Live report indexes: the document identifier is provided in the package item. A version number may be present too.

<newsMessage>
    <itemSet>
        <packageItem guid="http://d.afp.com/MM48X" version="5">
        </packageItem>
    </itemSet>
</newsMessage>

A document is a set of information carrying some journalistic content and associated metadata. As news stories develop or corrections are made, new versions of the document are published.

Each NewsML-G2 document has a globally unique identifier (guid), which is provided by the guid attribute of a newsItem or packageItem element. A guid is a character string. It is designed to be globally unique among all NewsML-G2 documents, past and future. This guid makes it possible to identify a document as it moves through the news workflow and is transferred/duplicated from place to place and from system to system. It is also used as a basis for an updating mechanism: an update is carried out by sending you a new version of a document identified by a given guid (i.e., the original and the new version share the same guid).

In AFP's NewsML-G2 documents, guids can take multiple forms. Examples include URIs in the http scheme, URNs in the namespace "newsml" [RFC3085bis] or AFP UNOs (a format more or less equivalent to IIM UNO).

Note: most AFP GUIDs look like plain URLs, for example: http://doc.afp.com/11N38S. However, they are actually non-dereferenceable URIs and their only purpose is to serve as identifiers.

From a technical point of view, given two representations of some journalistic content in NewsML-G2, the guid is what tells whether these two representations are those of the same document (possibly different versions of it): same guids means same document, different guids means different documents.

When integrating AFP's NewsML-G2 production into your information system you'll often need to compare guids. For example, when receiving a document from AFP you'll want to check if you already received some version of this document in the past, an action you'll perform by looking in your system for a document with the same guid.

A version number may be provided by a version attribute in the form of an XML Schema positive integer. It identifies the version of the document. The first time you receive a given document (i.e., a document identified by a given guid), this document isn't necessarily in its first version. That is, the version number of a document you receive for the first time may be greater than 1. The version number is incremented by 1 or more each time the document is updated. If no version attribute is present, you must assume that the document is in version 1 (i.e., first version).

How should a new version of a document be dealt with?

The answer is given by the NewsML-G2 documentation:

In the absence of any specific instructions from the provider, a "usable" item [cf. section on publishing status] should be regarded as replacing any previous version of the item with the same GUID. In practice, a provider is likely to provide some supplementary information in the form of a human-readable <edNote> [cf. section on general editorial note] which can be displayed to inform recipients of the reason for the update.

Often, new versions are issued to enrich previous ones with additional information, especially as stories develop in real time. Sometimes, however, a new version is meant to correct some error found in a previous version. In such a case, you may want to take some additional actions, as it might be the case that erroneous material has been published. Such correction-conveying versions are specifically tagged using a correction <signal>. For more information on this topic see the section on correction signal.
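
The guid comparison and version rules described in this section can be sketched in Python as follows. This is a simplified illustration: the names identify and accept, and the in-memory dictionary used as a store, are hypothetical. The sketch treats a missing version attribute as version 1, accepts a document only when its version is greater than the one already held, and deliberately ignores publishing status, correction signals and any specific provider instructions, all of which a real integration must take into account.

```python
import xml.etree.ElementTree as ET

MAIN_ITEM_NATURE = "http://cv.afp.com/itemnatures/mmdMainComp"
MESSAGE = "<newsMessage><itemSet><newsItem {attributes}/></itemSet></newsMessage>"

V5 = MESSAGE.format(attributes='guid="http://d.afp.com/MM48X" version="5"')
NO_VERSION = MESSAGE.format(attributes='guid="http://d.afp.com/MM48X"')
V7 = MESSAGE.format(attributes='guid="http://d.afp.com/MM48X" version="7"')


def identify(xml_text):
    """Return (guid, version) of the document carried by a news message."""
    root = ET.fromstring(xml_text)
    items = (root.findall(".//{*}itemSet/{*}newsItem")
             + root.findall(".//{*}itemSet/{*}packageItem"))
    if not items:
        raise ValueError("no newsItem or packageItem found")
    main = items[0]
    for item in items:
        # In multimedia documents, the identifier is on the main news item.
        links = item.findall("{*}itemMeta/{*}link")
        if any(link.get("href") == MAIN_ITEM_NATURE for link in links):
            main = item
            break
    # No version attribute means version 1.
    return main.get("guid"), int(main.get("version", "1"))


def accept(store, xml_text):
    """Store the document if it is new or newer; return True if stored."""
    guid, version = identify(xml_text)
    known_version = store.get(guid, (0, None))[0]
    if version > known_version:
        store[guid] = (version, xml_text)
        return True
    return False


if __name__ == "__main__":
    store = {}
    for message in (V5, NO_VERSION, V7):
        print(identify(message), "stored:", accept(store, message))
```

Comparing versions with "greater than" rather than "previous plus one" reflects the fact that the version number may be incremented by more than 1 and that the first version you receive is not necessarily version 1.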

Information sources

Text, picture, still graphic and video documents: information sources may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>                
                <!-- An information source represented by a name and a role -->
                <infoSource role="isrol:origcont">
                    <name>AP</name>
                </infoSource>
                            
                <!-- An information source represented by a QCode, a name and a role -->
                <infoSource qcode="afpsource:2648" role="isrol:origcont">
                    <name>CHRISTIE'S</name>
                </infoSource>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: information sources may be provided in the content metadata sections of the news items. When an information source appears in a news item which is not the main one, it describes an information source for the content of this item. When an information source appears in the main news item, it should be considered as an information source of the "document", with no indication of the specific part of the content it is associated with (if any).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
            <contentMeta>
                <!-- This information source is in the main news item: it is an information source of the document -->
                <infoSource role="isrol:origcont">
                    <name>AP</name>
                </infoSource>
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- This information source is specific to this item -->
                <infoSource qcode="afpsource:2648" role="isrol:origcont">
                    <name>Business Wire</name>
                </infoSource> 
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Information sources of a document, and of individual items in multimedia documents, may be provided by infoSource elements.

In AFP NewsML-G2 documents, an information source is a party (person or organization) which originated, distributed, aggregated or supplied the content. For example, in a document created/published by AFP but reusing content provided by Business Wire, this source (i.e., Business Wire) will appear in an infoSource element.

In AFP documents, an information source is specified by either:

The URI space used to specify information source through QCodes is open and can evolve over time.

The name child element, if present, provides a natural language name for the information source.

The role attribute carries a QCode that specifies the role of the information source. AFP documents use the role "Content originator" whose QCode is isrol:origcont and whose concept URI is http://cv.iptc.org/newscodes/infosourcerole/origcont.

Keywords

Text, picture, still graphic and video documents: keywords may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <keyword>culture</keyword>
                <keyword>arts</keyword>
                <keyword>fashion</keyword>
                <keyword>auction</keyword>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: keywords of the document as a whole may be provided in the content metadata section of the main news item. Keywords specific to an individual item may be provided by the content metadata section of that item.

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
            <contentMeta>
                <!-- These keywords are in the main news item: 
                     they are associated with the document as a whole -->
                <keyword>culture</keyword>
                <keyword>arts</keyword>
                <keyword>fashion</keyword>
                <keyword>auction</keyword>
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- These keywords are specifically associated with this news item -->
                <keyword>people</keyword>
                <keyword>money</keyword>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Live report indexes: keywords may be provided in the content metadata section of the package item.

<newsMessage>
    <itemSet>
        <packageItem>
            <contentMeta>
                <keyword>culture</keyword>
                <keyword>arts</keyword>
                <keyword>fashion</keyword>
                <keyword>auction</keyword>
            </contentMeta>
        </packageItem>
    </itemSet>
</newsMessage>

Keywords are defined by NewsML-G2 as "free-text terms to be used for indexing or finding the content by text-based search engines".

If present, keywords are provided by keyword elements.

Some keywords may have a refined role, expressed by a role attribute. The value of this attribute is a QCode. Currently we may issue the QCode afpkrole:tagWeb, which resolves to http://cv.afp.com/keywordroles/tagWeb. For example:

<keyword role="afpkrole:tagWeb">culture</keyword>

Keywords with a http://cv.afp.com/keywordroles/tagWeb role are meant to be used to compute tag clouds [TagClouds].
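
For illustration, the Python sketch below extracts keywords and computes simple tag cloud weights from the keywords carrying the tagWeb role. The names keywords and tag_cloud_weights are hypothetical, and counting the number of documents in which a keyword appears is only one possible weighting, chosen here as an assumption. The role attribute is matched literally against afpkrole:tagWeb, after splitting on whitespace in case several QCodes are listed.

```python
from collections import Counter
import xml.etree.ElementTree as ET

TAG_WEB = "afpkrole:tagWeb"

DOC_A = """<newsMessage><itemSet><newsItem><contentMeta>
    <keyword role="afpkrole:tagWeb">culture</keyword>
    <keyword>arts</keyword>
</contentMeta></newsItem></itemSet></newsMessage>"""

DOC_B = """<newsMessage><itemSet><newsItem><contentMeta>
    <keyword role="afpkrole:tagWeb">culture</keyword>
    <keyword role="afpkrole:tagWeb">fashion</keyword>
</contentMeta></newsItem></itemSet></newsMessage>"""


def keywords(xml_text, role=None):
    """Return keyword strings, optionally restricted to a given role."""
    root = ET.fromstring(xml_text)
    found = []
    for keyword in root.findall(".//{*}contentMeta/{*}keyword"):
        text = (keyword.text or "").strip()
        roles = (keyword.get("role") or "").split()
        if text and (role is None or role in roles):
            found.append(text)
    return found


def tag_cloud_weights(xml_texts):
    """Count, for each tagWeb keyword, the documents it appears in."""
    counts = Counter()
    for xml_text in xml_texts:
        # set() so that a keyword counts once per document
        counts.update(set(keywords(xml_text, role=TAG_WEB)))
    return counts


if __name__ == "__main__":
    print(keywords(DOC_A))
    print(tag_cloud_weights([DOC_A, DOC_B]).most_common())
```

In this sketch, all keywords of a message are collected together; for multimedia documents you may prefer to separate the keywords of the main news item from those of individual items.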

Language of the content

Text, picture, still graphic and video documents: the language of the content may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <language tag="en"/>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: the language of the content may be provided in the content metadata section of each news item.

<newsMessage>
    <itemSet>
        <!-- An item whose content is in English -->
        <newsItem>
            <contentMeta>
                <language tag="en"/>
            </contentMeta>
        </newsItem>
        
        <!-- An item whose content is in French -->
        <newsItem>
            <contentMeta>
                <language tag="fr"/>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

The tag attribute of the language element carries a BCP 47 language tag [RFC5646] that specifies the main language of the content. The content is what is provided inline or linked to by the content set (i.e., the contentSet element). For example, in a text document this attribute specifies the main language the textual content is written in, and in a video document it typically specifies the main language used in the soundtrack.

The main languages used by AFP, along with their BCP 47 tags, are shown in the table below.

Main languages in AFP production
Language BCP 47 tag
Arabic ar
English en
French fr
German de
Portuguese pt
Spanish es

Language of metadata

Text, picture, still graphic and video documents: the language of metadata is specified by the news item.

<newsMessage>
    <itemSet>
        <newsItem xml:lang="en">
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: the language of metadata is specified by each news item.

<newsMessage>
    <itemSet>
        <newsItem xml:lang="en">
        </newsItem>
        <newsItem xml:lang="en">
        </newsItem>
    </itemSet>
</newsMessage>

Live report indexes: the language of metadata is specified by the package item.

<newsMessage>
    <itemSet>
        <packageItem xml:lang="en">
        </packageItem>
    </itemSet>
</newsMessage>

The xml:lang attribute carries a BCP 47 language tag [RFC5646] that specifies the main language of the metadata (e.g., titles, subject's names, caption, etc.) provided by the item.

In a multimedia document, this attribute has the same value in every news item of the document (i.e., in a given document, all items make use of the same language for metadata).

Important design principle: In an AFP NewsML-G2 document, metadata is provided in a single language, with exceptions for a few elements. When some news content is of global interest we often provide metadata in multiple languages: in this case we do so by issuing multiple NewsML-G2 documents (e.g., one with metadata in French, another one with metadata in English, etc.). These are different documents: each one has its own GUID and lifecycle (see section on document identifiers).

The main languages used by AFP, along with their BCP 47 tags, are shown in the table below.

Main languages in AFP production
Language BCP 47 tag
Arabic ar
English en
French fr
German de
Portuguese pt
Spanish es

While most metadata in a NewsML-G2 document uses the language specified by the xml:lang attribute of the item element as shown in the examples above, there may be exceptions for a few elements. For example, in a video document the original transcription of some speech is typically provided in the original language that was actually used by the speaker(s), which may differ from the main language of metadata. Whenever possible, the language for such metadata is provided by an xml:lang attribute on the XML element conveying the metadata in question.

The example below shows a document whose main language of metadata is English but whose "transcription" metadata is in French.

<newsMessage>
    <itemSet>
        <newsItem xml:lang="en">
            <partMeta>
                <description role="afpdescRole:contentDescription">
                    Pierre Bergé speaks about the auction. 
                </description>
                <description xml:lang="fr" role="afpdescRole:transcription">
                    C’est le jour ou le dernier objet sera passé sous le marteau d'un commissaire priseur
                    que à mon sens – a mon sens - cette collection pourra écrire le mot fin.
                </description>
            </partMeta>
        </newsItem>
    </itemSet>
</newsMessage>
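
The way an element-level xml:lang overrides the item-level language can be sketched in Python as follows. This is an illustrative example: the function name descriptions_with_language is hypothetical and the sample text is shortened. The sketch walks the document tree and applies the usual XML rule that an xml:lang attribute applies to the element carrying it and to its descendants, until overridden.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<newsMessage>
    <itemSet>
        <newsItem xml:lang="en">
            <partMeta>
                <description role="afpdescRole:contentDescription">
                    The speaker comments on the auction.
                </description>
                <description xml:lang="fr" role="afpdescRole:transcription">
                    La collection est vendue.
                </description>
            </partMeta>
        </newsItem>
    </itemSet>
</newsMessage>"""


def descriptions_with_language(xml_text):
    """Return (role, effective_language) for every description element."""
    results = []

    def walk(element, inherited):
        # An element's own xml:lang wins over the inherited language.
        language = element.get(XML_LANG, inherited)
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "description":
            results.append((element.get("role"), language))
        for child in element:
            walk(child, language)

    walk(ET.fromstring(xml_text), None)
    return results


if __name__ == "__main__":
    for role, language in descriptions_with_language(SAMPLE):
        print(role, language)
```

The same walk can be adapted to any other metadata element for which you need the effective language.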

Locations

AFP's NewsML-G2 documents can convey information about locations. We establish a distinction between locations from which the content originates (e.g., the place where a news story was written) and locations that are subject matter of the content. These two kinds of locations are conveyed using different means, as described in the following sections.

Locations may be typed, using a type attribute. The following types are used in AFP documents:

Types of locations
Type Description QCode Concept URI
Geopolitical area In AFP documents, it is a generic type that may be used for any kind of location. It merely informs that the associated element represents a location. cpnat:geoArea http://cv.iptc.org/newscodes/cpnature/geoArea
Point of interest In AFP documents, this type is used for locations that cannot be classified as cities, country areas or countries. For instance the Eiffel Tower and the White House will be typed as points of interest, as well as the Sherwood forest or a random building. Note that this may diverge a bit from NewsML-G2 standard usage, where areas such as forests, ponds, hills, streets or random places are not usually classified as points of interest. cpnat:poi http://cv.iptc.org/newscodes/cpnature/poi
City Informs that the associated element represents a city. loctyp:City http://cv.iptc.org/newscodes/location/City
Country area In AFP documents it is typically used for areas such as provinces, states or other areas that may contain multiple cities but which pertain themselves to countries. loctyp:CountryArea http://cv.iptc.org/newscodes/location/CountryArea
Country Informs that the associated element represents a country. loctyp:Country http://cv.iptc.org/newscodes/location/Country

Locations from which the content originates (aka datelines)

Text, picture, still graphic and video documents: the locations from which the content originates are provided in the content metadata section of the news item (in the following example only one location is provided).

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <located qcode="afplocation:281108" type="cpnat:poi">
                    <name>White House</name>
                    <related qcode="afplocation:6666" rel="skos:broader" type="loctyp:City">
                        <name>Washington</name>
                        <related qcode="afplocation:1149" rel="skos:broader" type="loctyp:CountryArea"/>
                    </related>
                    <related qcode="afplocation:1149" type="loctyp:CountryArea">
                        <name>District of Columbia</name>
                        <related qcode="afplocation:206" rel="skos:broader" type="loctyp:Country"/>
                    </related>
                    <related qcode="afplocation:206" type="loctyp:Country">
                        <name>United States</name>
                        <related qcode="iso3166-1a3:USA" rel="skos:exactMatch"/>
                    </related>
                    <POIDetails>
                        <position latitude="38.89761" longitude="-77.03637"/>
                    </POIDetails>
                </located>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: for news items in the document, the locations from which the content of the item originates may be provided in the content metadata section of the item (in the following example only one location per item is provided).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
            <contentMeta>
                <!-- Location from which the content of the main news item originates -->
                <located qcode="afplocation:2500" type="loctyp:City">
                    <name>Paris</name>
                    <related qcode="afplocation:67" rel="skos:broader" type="loctyp:Country">
                        <name>France</name>
                        <related qcode="iso3166-1a3:FRA" rel="skos:exactMatch" /> 
                    </related>
                    <geoAreaDetails>
                        <position latitude="48.85341" longitude="2.34121" /> 
                    </geoAreaDetails>
                </located>
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- Location from which the content of this news item originates -->
                <located qcode="afplocation:2613" type="loctyp:City">
                    <name>Marseille</name>
                    <related qcode="afplocation:719" rel="skos:broader" type="loctyp:CountryArea">
                        <name>Bouches-du-Rhône</name>
                        <related qcode="afplocation:67" rel="skos:broader" type="loctyp:Country"/>
                    </related>
                    <related qcode="afplocation:67" type="loctyp:Country">
                        <name>France</name>
                        <related qcode="iso3166-1a3:FRA" rel="skos:exactMatch"/>
                    </related>
                    <geoAreaDetails>
                        <position latitude="43.29695" longitude="5.38107"/>
                    </geoAreaDetails>
                </located>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Live report indexes: the location from which the content originates is provided in the content metadata section of the package item.

<newsMessage>
    <itemSet>
        <packageItem>
            <contentMeta>
                <located qcode="afplocation:6666" type="loctyp:City">
                    <name>Washington</name>
                    <related qcode="afplocation:1149" rel="skos:broader" type="loctyp:CountryArea">
                        <name>District of Columbia</name>
                        <related qcode="afplocation:206" rel="skos:broader" type="loctyp:Country"/>
                    </related>
                    <related qcode="afplocation:206" type="loctyp:Country">
                        <name>United States</name>
                        <related qcode="iso3166-1a3:USA" rel="skos:exactMatch"/>
                    </related>
                    <geoAreaDetails>
                        <position latitude="38.89511" longitude="-77.03637"/>
                    </geoAreaDetails>
                </located>
            </contentMeta>
        </packageItem>
    </itemSet>
</newsMessage>

In AFP NewsML-G2 documents, located elements specify the geographical origin of the editorial content conveyed by the <contentSet> of a news item: the text of a news story, the JPEG renditions of a picture document, etc. For live reports, the located element specifies the geographical origin of the live report. There is always at least one location provided per item.

Locations from which the content originates are not necessarily the locations the content is about. For example a news story about an event taking place in Paris may be written in London; in such a case the city of London may be specified as the location from which the content originates. The locations the content is about are conveyed in another part of the document, as described in section "Locations that are subject matter of the document".
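
As an illustration, the located elements of an item can be read with Python's standard library along the following lines. This is a minimal sketch under stated assumptions: the helper names (read_located, origin_locations) are hypothetical, the sample is a shortened version of the White House example above, and namespaces are stripped so that the code also works on namespace-free excerpts.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def children(element, name):
    """Direct children of `element` whose local name is `name`."""
    return [child for child in element if local_name(child.tag) == name]


def name_of(element):
    """Text of the first name child, or None when there is none."""
    names = children(element, "name")
    return names[0].text.strip() if names and names[0].text else None


def read_located(located):
    """Collect identifier, type, name, coordinates and related locations."""
    position = None
    for details_name in ("POIDetails", "geoAreaDetails"):
        for details in children(located, details_name):
            for pos in children(details, "position"):
                position = (float(pos.get("latitude")), float(pos.get("longitude")))
    return {
        "qcode": located.get("qcode"),
        "type": located.get("type"),
        "name": name_of(located),
        "position": position,
        "related": [
            (rel.get("qcode"), rel.get("type"), name_of(rel))
            for rel in children(located, "related")
        ],
    }


def origin_locations(item):
    """Locations from which the content of one item originates."""
    return [
        read_located(located)
        for meta in children(item, "contentMeta")
        for located in children(meta, "located")
    ]


SAMPLE = """
<newsItem>
  <contentMeta>
    <located qcode="afplocation:281108" type="cpnat:poi">
      <name>White House</name>
      <related qcode="afplocation:6666" rel="skos:broader" type="loctyp:City">
        <name>Washington</name>
      </related>
      <related qcode="afplocation:206" type="loctyp:Country">
        <name>United States</name>
      </related>
      <POIDetails>
        <position latitude="38.89761" longitude="-77.03637"/>
      </POIDetails>
    </located>
  </contentMeta>
</newsItem>
"""

locations = origin_locations(ET.fromstring(SAMPLE))
print(locations[0]["name"], locations[0]["position"])
```

Because these locations describe where the content comes from, a processor would typically keep them separate from the locations the content is about.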

There are some subtleties about what "locations from which the content originates" means depending on the nature of the content; we discuss them in the table below. Note that the policy described here is specific to AFP. Other conventions might be in place at other news providers.

Policy used to specify the locations from which the content originates
Nature of content Policy
Text A location from which the content originates is usually a location (e.g., a city) where the text was written or from which it was dictated. Alternatively it might be the location of the event if an AFP reporter is present nearby. Multiple locations may be provided in the form of multiple located elements when the content originates (as defined here) from multiple locations; in this case the usual practice is to provide no more than two locations.
Picture The location from which the content originates is the location of the camera when the picture was shot. Therefore it may differ from the location of what is shown in the picture. Knowing the location of the camera is useful as it lets one know "what the subject of the picture looks like when viewed from that location". Only one location is provided.
Video The location from which the content originates is the location of the camera when the video was recorded. Therefore it may differ from the location of what is shown in the video. Knowing the location of the camera is useful as it lets one know "what the subject of the video looks like when viewed from that location". Only one location is provided. If the video is shot in different places, only one of these places is provided, usually the most significant.
Still or animated graphic When a graphic is produced, it often accompanies or illustrates a separate production (typically of a textual nature). In such a case the location from which the content originates is the same as that of this production. Otherwise, it is the location of the event the graphic is about.
Multimedia Each news item in a multimedia document specifies the location(s) from which the content originates. The exact meaning for each news item is determined by the nature of its content as described in this table.
Live report The location from which the content originates is the location of the event the live report is about. The value of this metadata can change as the live report develops. For example, the live report about the Bergé/Saint-Laurent auction may be tagged with the location where the auction takes place while we report on the auction, and later be tagged with the location where the Pierre Bergé press conference takes place while we report on this press conference.

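The locations from which the content originates are provided by located elements in the content metadata section of news items. A given located element may convey several pieces of information about a location. As the examples in this section illustrate, these include an identifier (qcode attribute), a type (type attribute), a natural language name (name element), related locations such as the city, country area or country the location belongs to (related elements), and geographic coordinates (position element inside a POIDetails or geoAreaDetails element).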

In text documents or text components of multimedia documents we may provide multiple locations from which the content originates. In this case, the current practice is to provide at most two. Below is an example:

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <!-- A location from which the content originates -->
                <located qcode="afplocation:2500" type="loctyp:City">
                    <name>Paris</name>
                    <related qcode="afplocation:67" rel="skos:broader" type="loctyp:Country">
                        <name>France</name>
                        <related qcode="iso3166-1a3:FRA" rel="skos:exactMatch" /> 
                    </related>
                    <geoAreaDetails>
                        <position latitude="48.85341" longitude="2.34121" /> 
                    </geoAreaDetails>
                </located>
                
                <!-- Another location from which the content originates -->
                <located qcode="afplocation:6666" type="loctyp:City">
                    <name>Washington</name>
                    <related qcode="afplocation:1149" rel="skos:broader" type="loctyp:CountryArea">
                        <name>District of Columbia</name>
                        <related qcode="afplocation:206" rel="skos:broader" type="loctyp:Country"/>
                    </related>
                    <related qcode="afplocation:206" type="loctyp:Country">
                        <name>United States</name>
                        <related qcode="iso3166-1a3:USA" rel="skos:exactMatch"/>
                    </related>
                    <geoAreaDetails>
                        <position latitude="38.89511" longitude="-77.03637"/>
                    </geoAreaDetails>
                </located>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>
When are multiple locations provided?

Multiple locations may be provided when the content originates from multiple locations. For example, suppose that we publish a story about the Bergé/Saint-Laurent auction. To write this story we might use information provided by an AFP reporter present at the auction in Paris and by another AFP reporter present at a press conference given by Pierre Bergé at the same time in Washington. In this case we might provide Paris and Washington in located elements. Alternatively we might choose to provide the location where the story is actually written (say, London) instead of Paris and Washington.

Locations that are subject matter of the document

Text, picture, still graphic, video and multimedia documents: locations that are subject matter of the document may be provided in the news item (for multimedia documents: in the main news item) in the content metadata section. In text and multimedia documents only, additional information may be provided in assertions. Locations that are subject matter of the document are not provided in live report indexes.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                        
                <!-- The city of Beijing is a subject of the content -->
                <subject qcode="afplocation:2618" type="cpnat:geoArea">
                    <name>Beijing</name>
                </subject> 

                <!-- The city of Paris is a subject of the content and is a location of the event the content is about -->
                <subject qcode="afplocation:2500" type="cpnat:geoArea" afp:role="http://cv.afp.com/subjectroles/locationOfEvent">
                    <name>Paris</name>
                </subject>
                
                <!-- Some locations are not identified by a qcode attribute but by an uri attribute (typically providing a geo URI [rfc5870])-->
                <subject uri="geo:43.82883,5.78688" type="cpnat:geoArea">
                    <name>Manosque</name>
                </subject>
            </contentMeta>
            
            <!-- This assertion provides additional information about Beijing  -->
            <assert qcode="afplocation:2618">
                <type qcode="loctyp:City"/>
                <geoAreaDetails>
                    <position latitude="39.9075" longitude="116.39723"/>
                </geoAreaDetails>
            </assert>
            
            <!-- This assertion provides additional information about Paris  -->
            <assert qcode="afplocation:2500">
                <type qcode="loctyp:City"/>
                <broader qcode="afplocation:67" type="loctyp:Country">
                    <name>France</name>
                    <related qcode="iso3166-1a3:FRA" rel="skos:exactMatch"/>
                </broader>
                <geoAreaDetails>
                    <position latitude="48.85341" longitude="2.3488"/>
                </geoAreaDetails>
            </assert>
            
            <!-- This assertion provides additional information about Manosque  -->
            <assert uri="geo:43.82883,5.78688">
                <type qcode="loctyp:City"/>
                <geoAreaDetails>
                    <position latitude="43.82883" longitude="5.78688"/>
                </geoAreaDetails>
            </assert>
            
        </newsItem>
    </itemSet>
</newsMessage>

Locations that are subject matter of the document may be provided by subject elements. Note that other entities such as persons, media topics, organizations and so on may also be conveyed using subject elements. To tell locations apart from these other entities, a type attribute is used. For locations its value, a QCode, is either cpnat:geoArea (resolving to http://cv.iptc.org/newscodes/cpnature/geoArea) or cpnat:poi (resolving to http://cv.iptc.org/newscodes/cpnature/poi). All these subjects share some common properties, such as optional type and afp:role attributes that are described in the section on subjects.

Additional information about these locations may be provided by assertions; an assertion is represented by an assert element. You can correlate assertions with specific locations using their concept URIs: the information provided by an assertion applies to the location whose concept URI is conveyed by the qcode or the uri attribute of the assertion. In the example above, there is a subject element whose qcode resolves to http://ref.afp.com/locations/2618 (in AFP documents, afplocation is a scheme alias for http://ref.afp.com/locations/). There is also an assert element whose qcode resolves to http://ref.afp.com/locations/2618. This means that the subject and the assertion convey information about the same location.

If you don't perform QCode resolution (cf. section on controlled vocabularies and qualified codes) then you can correlate QCode-based assertions with specific locations using their QCodes directly.
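
The correlation described above could look like the following Python sketch. It is an illustration only: the function names are hypothetical, and the alias table is hard-coded with the single afplocation mapping given in this section, whereas a real processor would obtain scheme aliases from the document itself. When an alias is unknown, the sketch falls back to comparing raw QCodes, as suggested in the previous paragraph.

```python
import xml.etree.ElementTree as ET

# Assumption: hard-coded alias table holding only the mapping quoted in this section.
SCHEME_ALIASES = {"afplocation": "http://ref.afp.com/locations/"}


def local_name(tag):
    """Drop any '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def concept_key(element):
    """Concept URI from the uri or qcode attribute; raw QCode if the alias is unknown."""
    uri = element.get("uri")
    if uri is not None:
        return uri
    qcode = element.get("qcode")
    if qcode is None:
        return None
    alias, _, code = qcode.partition(":")
    base = SCHEME_ALIASES.get(alias)
    return base + code if base is not None else qcode


def subjects_with_assertions(news_item):
    """Pair each subject with the assertion about the same concept, if any."""
    assertions = {}
    for child in news_item:
        if local_name(child.tag) == "assert":
            key = concept_key(child)
            if key is not None:
                assertions[key] = child
    pairs = []
    for element in news_item.iter():
        if local_name(element.tag) == "subject":
            key = concept_key(element)
            pairs.append((key, assertions.get(key)))
    return pairs


SAMPLE = """
<newsItem>
  <contentMeta>
    <subject qcode="afplocation:2618" type="cpnat:geoArea"><name>Beijing</name></subject>
    <subject uri="geo:43.82883,5.78688" type="cpnat:geoArea"><name>Manosque</name></subject>
    <subject><name>auction</name></subject>
  </contentMeta>
  <assert qcode="afplocation:2618">
    <type qcode="loctyp:City"/>
  </assert>
  <assert uri="geo:43.82883,5.78688">
    <type qcode="loctyp:City"/>
  </assert>
</newsItem>
"""

pairs = subjects_with_assertions(ET.fromstring(SAMPLE))
for key, assertion in pairs:
    print(key, "has assertion" if assertion is not None else "no assertion")
```

Subjects that carry only a natural language name have no concept URI and therefore never match an assertion.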

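A given assertion may convey several pieces of information about a location. In the example above, these are the type of the location (type element), a broader location it belongs to (broader element) and its geographic coordinates (position element inside a geoAreaDetails element).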

Locations of the event(s)

Some locations that are subject matter of the document also happen to be locations of event(s). A location of event is a place where an event the document is about happens or is foreseen to happen. Locations of event(s) are provided by subject elements with an attribute role in namespace http://www.afp.com/format/internal/ equal to http://cv.afp.com/subjectroles/locationOfEvent.

For example, in our document about the auction of the Pierre Bergé and Yves Saint-Laurent collection, we could have the city of Paris as a subject because the news story mentions that the auction takes place in Paris. We could also have the city of Beijing as a subject because the news story mentions China's claims that some objects in the auction were stolen in Beijing during the opium wars and therefore should be returned. In this case, both cities would appear in dedicated subject elements. The city of Paris could be tagged as being a location of event using the role attribute because the auction happens in Paris and in our example the auction is the event the story is about. Beijing would not be tagged as being a location of event because while it is a subject of the story it is not a location of the event the story is about.

There is no default value for the role attribute: if a subject element conveying a location does not have a role attribute with a value of http://cv.afp.com/subjectroles/locationOfEvent, it doesn't mean that it isn't a location of the event, but merely that the information regarding this matter isn't provided by the element.
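
The rule above, including the absence of a default value, can be sketched as follows in Python. The function name is hypothetical, and the sketch only looks at subjects explicitly typed as cpnat:geoArea or cpnat:poi. Locations without the role are reported as "unspecified" rather than as "not a location of the event".

```python
import xml.etree.ElementTree as ET

AFP_NS = "http://www.afp.com/format/internal/"
ROLE_ATTRIBUTE = "{%s}role" % AFP_NS
LOCATION_OF_EVENT = "http://cv.afp.com/subjectroles/locationOfEvent"
LOCATION_TYPES = {"cpnat:geoArea", "cpnat:poi"}


def local_name(tag):
    """Drop any '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def split_location_subjects(content_meta):
    """Return (event_locations, unspecified) as lists of qcode or uri values."""
    event_locations, unspecified = [], []
    for subject in content_meta:
        if local_name(subject.tag) != "subject":
            continue
        if subject.get("type") not in LOCATION_TYPES:
            continue  # not typed as a location: ignored by this sketch
        identifier = subject.get("qcode") or subject.get("uri")
        if subject.get(ROLE_ATTRIBUTE) == LOCATION_OF_EVENT:
            event_locations.append(identifier)
        else:
            # No role does not mean "not a location of the event".
            unspecified.append(identifier)
    return event_locations, unspecified


SAMPLE = """
<contentMeta xmlns:afp="http://www.afp.com/format/internal/">
  <subject qcode="afplocation:2618" type="cpnat:geoArea"><name>Beijing</name></subject>
  <subject qcode="afplocation:2500" type="cpnat:geoArea"
           afp:role="http://cv.afp.com/subjectroles/locationOfEvent"><name>Paris</name></subject>
  <subject qcode="medtop:01000000"><name>arts, culture and entertainment</name></subject>
</contentMeta>
"""

event_locations, unspecified = split_location_subjects(ET.fromstring(SAMPLE))
print(event_locations, unspecified)
```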

Products the document belongs to

All documents: products the document belongs to may be provided in the header of the news message.

<newsMessage>
    <header>
        <afp:headerExtension xmlns:afp="http://www.afp.com/format/internal/">
            <!-- The document belongs to this product -->
            <afp:product name="EAA" uri="http://products.afp.com/wires/EAA"></afp:product>
            
            <!-- The document also belongs to this other product -->
            <afp:product name="MAX" uri="http://products.afp.com/wires/MAX"></afp:product>
        </afp:headerExtension>
    </header>
</newsMessage>

The commercial relationship between AFP and its clients is often structured around the notion of product. A product is a subset of AFP's production a client can subscribe to. Each product is defined by several characteristics such as subject matters, media types, languages, etc.

The product elements, if present, are provided in the headerExtension inside the header of the newsMessage. The headerExtension element is an AFP specific extension and is defined in namespace http://www.afp.com/format/internal/.

Each product element identifies a product the document belongs to. It can be a product you have subscribed to but it is not necessarily the case: typically, all products the document belongs to are listed regardless of your specific subscriptions.

In your information system, a possible usage of the product elements is to automatically route documents to specific teams or workflows. For example you might want to automatically route documents of the "Economic & Business News" product to your economics specialists.

Each product is uniquely identified by a URI, provided by the uri attribute. You can ask your AFP representative for the URIs of the products you have subscribed to.

The name attribute provides the name of the product, meant to be used for display purposes.
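
A routing step based on product URIs might be sketched as below. The routing table and the destination name ("world-desk") are purely hypothetical; only the element names, the namespace and the product URIs come from the example above.

```python
import xml.etree.ElementTree as ET

AFP_NS = "http://www.afp.com/format/internal/"

# Hypothetical routing table: product URI -> name of an internal destination.
ROUTES = {
    "http://products.afp.com/wires/EAA": "world-desk",
}


def product_uris(news_message):
    """URIs of all products the document belongs to, in document order."""
    return [product.get("uri") for product in news_message.iter("{%s}product" % AFP_NS)]


def destinations(news_message, routes=ROUTES):
    """Destinations for the products that appear in the routing table."""
    return sorted({routes[uri] for uri in product_uris(news_message) if uri in routes})


SAMPLE = """
<newsMessage>
  <header>
    <afp:headerExtension xmlns:afp="http://www.afp.com/format/internal/">
      <afp:product name="EAA" uri="http://products.afp.com/wires/EAA"/>
      <afp:product name="MAX" uri="http://products.afp.com/wires/MAX"/>
    </afp:headerExtension>
  </header>
</newsMessage>
"""

message = ET.fromstring(SAMPLE)
print(product_uris(message))
print(destinations(message))
```

Matching on the uri attribute rather than on the name attribute follows from the text above: the URI is the unique identifier, while the name is meant for display.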

The following table provides examples of products.

Examples of products
Name Unique identifier Description
EAA http://products.afp.com/wires/EAA The World News (EAA) wire offers up-to-the-minute, complete English-language global news, sports and business coverage delivered specifically to suit the needs of clients in Europe, Africa and the Middle East. EAA also provides in-depth coverage of Europe for Europe.
MAX http://products.afp.com/wires/MAX The world news wire, MAX, carries AFP's entire English-language news production and is designed specifically for clients who demand comprehensive global coverage.
FRS http://public.products.afp.com/wires/FRS The FRS wire is the AFP news feed mainly for French customers. This French-language feed offers French and foreign sources of information on varied topics (general news, politics, economy, culture, social, sport and equestrian), with emphasis on in-depth coverage of France.
DAB http://public.products.afp.com/wires/DAB The DAB wire in French language is designed primarily for African customers. Produced in Paris by a specialized desk, which processes and translates the information gathered by the largest networks of all international agencies active in Africa, it is also powered by the four other regional centers of AFP (Hong Kong, Nicosia, Washington and Montevideo) to provide comprehensive coverage of world news round the clock and seven days a week.

Provider

Text, picture, still graphic, video and multimedia documents: the provider of the document is given in the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <provider qcode="afpprovider:AFP-TV">
                    <name>AFP-TV</name>
                    <broader qcode="nprov:AFP">
                        <name>AFP</name>
                    </broader>    
                </provider>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Live report indexes: the provider of the document is given in the item metadata section of the package item.

<newsMessage>
    <itemSet>
        <packageItem>
            <itemMeta>
                <provider qcode="afpprovider:AFP-TV">
                    <name>AFP-TV</name>
                    <broader qcode="nprov:AFP">
                        <name>AFP</name>
                    </broader>    
                </provider>
            </itemMeta>
        </packageItem>
    </itemSet>
</newsMessage>

The provider of a document is the party responsible for the management and the release of the document (i.e., the publisher of the document). It is given by the qcode attribute of the provider element. This element is always present. The QCode is part of one of the following schemes:

The name child element, if present, provides a natural language name for the provider.

The broader child element, if present, specifies a larger entity the provider is part of. This entity is identified by a qcode attribute, optionally completed by a natural language name in a name element.

In the example above, the document is provided by AFP-TV, a service inside AFP. The fact that this provider is part of AFP is expressed using the broader element.
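
For illustration, the provider and the larger entity it belongs to could be read as follows. The function name read_provider is hypothetical, and the sample is the provider element from the example above, written as well-formed XML.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def children(element, name):
    """Direct children of `element` whose local name is `name`."""
    return [child for child in element if local_name(child.tag) == name]


def name_of(element):
    """Text of the first name child, or None when there is none."""
    names = children(element, "name")
    return names[0].text.strip() if names and names[0].text else None


def read_provider(item):
    """Provider of a newsItem or packageItem, with the entities it is part of."""
    for meta in children(item, "itemMeta"):
        for provider in children(meta, "provider"):
            return {
                "qcode": provider.get("qcode"),
                "name": name_of(provider),
                "part_of": [
                    (broader.get("qcode"), name_of(broader))
                    for broader in children(provider, "broader")
                ],
            }
    return None


SAMPLE = """
<newsItem>
  <itemMeta>
    <provider qcode="afpprovider:AFP-TV">
      <name>AFP-TV</name>
      <broader qcode="nprov:AFP">
        <name>AFP</name>
      </broader>
    </provider>
  </itemMeta>
</newsItem>
"""

provider = read_provider(ET.fromstring(SAMPLE))
print(provider)
```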

Publishing Status

Text, picture, still graphic, video and multimedia documents: the publishing status is provided by the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <pubStatus qcode="QCode of scheme http://cv.iptc.org/newscodes/pubstatusg2/ specifying the publishing status"/>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Live report indexes: the publishing status is provided by the item metadata section of the package item.

<newsMessage>
    <itemSet>
        <packageItem>
            <itemMeta>
                <pubStatus qcode="QCode of scheme http://cv.iptc.org/newscodes/pubstatusg2/ specifying the publishing status"/>
            </itemMeta>
        </packageItem>
    </itemSet>
</newsMessage>

A document can be usable, withheld or canceled. The table below describes how this is specified in documents and what it means.

Publishing statuses
Status Representation Meaning
Usable No pubStatus element or a pubStatus element with a qcode attribute stat:usable resolving to http://cv.iptc.org/newscodes/pubstatusg2/usable The document is usable. Note that "usable" does not necessarily mean "publishable"; for example an embargo may prevent publication of an otherwise usable document.
Withheld A pubStatus element with a qcode attribute stat:withheld resolving to http://cv.iptc.org/newscodes/pubstatusg2/withheld The document and all its previous versions must not be used until further notice (except for a few metadata, as described below). This status is typically used when a serious problem with a document is suspected and is under investigation (e.g., important information in the document is suspected to be false).
In the meantime, any usage of the document must be prohibited, if needed by way of alerts. If the document has been published it must be rendered inaccessible until further notice. You must immediately remove it from all your online services and stop using it in any other fashion. People who may have viewed previous versions should be notified, whenever possible, that it is being retracted until further notice. If you have been authorized by AFP to distribute it to third parties, you must ensure that the same actions are carried out by them.

In a withheld document, only the following metadata can be considered reliable/usable: GUID, version number, publication status, general editorial note (in this version of the document only).
Canceled A pubStatus element with a qcode attribute stat:canceled resolving to http://cv.iptc.org/newscodes/pubstatusg2/canceled The document and all its previous versions must not be used, ever (except for a few metadata, as described below). This status is typically used when a serious problem with a document is detected (e.g., important information in the document has been found to be false) and the scope of the problem is wide enough to warrant a complete kill of the document instead of issuing a correction.
Any usage of the document must be prohibited, if needed by way of alerts. If the document has been published it must be rendered inaccessible. You must immediately remove it from all your online services, stop using it in any other fashion and delete it from your servers. People who may have viewed previous versions should be notified, whenever possible, that it is being retracted. If you have been authorized by AFP to distribute it to third parties, you must ensure that the same actions are carried out by them.

In a canceled document, only the following metadata can be considered reliable/usable: GUID, version number, publication status, general editorial note (in this version of the document only) and cancel-dedicated rendition(s). A cancel-dedicated rendition is specifically designed to be used in canceled documents, allowing you to publish something (e.g., a note about the cancellation) in place of the canceled content. It is conveyed by an inlineXML or remoteContent element and denoted through the rendition attribute by the QCode afprnd:cancel, resolving to http://cv.afp.com/renditions/cancel.

When a document is withheld or canceled, a general editorial note is often included to provide additional information and/or instructions.

The NewsML-G2 specification provides detailed information on how you must make use of this publishing status when processing documents.

For multimedia documents, the way publishing status is conveyed differs from standard NewsML-G2. In NewsML-G2 each G2 item carries its own publishing status, and a G2 item without a pubStatus element is defined as usable. In AFP's multimedia documents the only pubStatus element to consider is that of the main item. The pubStatus elements of non-main items must be ignored. You must process multimedia documents in a way that applies the publishing status provided in the main news item to the entire content of the document (i.e., to all items in the document).
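
The rules of this section can be sketched in Python as shown below. This is a minimal illustration, not a complete implementation of the processing required for withheld or canceled documents. It assumes that the main item of a multimedia document is the one carrying the crel:isa link to http://cv.afp.com/itemnatures/mmdMainComp seen in the multimedia examples of this guide, and that a document without such a link contains a single item. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: marker of the main item, taken from the multimedia examples in this guide.
MAIN_ITEM_NATURE = "http://cv.afp.com/itemnatures/mmdMainComp"
ITEM_NAMES = ("newsItem", "packageItem")


def local_name(tag):
    """Drop any '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def children(element, name):
    """Direct children of `element` whose local name is `name`."""
    return [child for child in element if local_name(child.tag) == name]


def item_status(item):
    """QCode of the item's pubStatus; an item without pubStatus is usable."""
    for meta in children(item, "itemMeta"):
        for status in children(meta, "pubStatus"):
            return status.get("qcode")
    return "stat:usable"


def is_main_item(item):
    """True when the item carries the link that marks the main item."""
    for meta in children(item, "itemMeta"):
        for link in children(meta, "link"):
            if link.get("rel") == "crel:isa" and link.get("href") == MAIN_ITEM_NATURE:
                return True
    return False


def document_status(news_message):
    """Status applying to the whole document: that of the main (or only) item."""
    items = [e for e in news_message.iter() if local_name(e.tag) in ITEM_NAMES]
    if not items:
        raise ValueError("no newsItem or packageItem found")
    main = next((item for item in items if is_main_item(item)), items[0])
    return item_status(main)


MULTIMEDIA = """
<newsMessage><itemSet>
  <newsItem>
    <itemMeta>
      <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
      <pubStatus qcode="stat:withheld"/>
    </itemMeta>
  </newsItem>
  <newsItem>
    <itemMeta><pubStatus qcode="stat:usable"/></itemMeta>
  </newsItem>
</itemSet></newsMessage>
"""

SINGLE = "<newsMessage><itemSet><newsItem><itemMeta/></newsItem></itemSet></newsMessage>"

print(document_status(ET.fromstring(MULTIMEDIA)))
print(document_status(ET.fromstring(SINGLE)))
```

In the multimedia sample the second item declares itself usable, but the sketch ignores it and reports the withheld status of the main item for the whole document.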

Subjects

Text, picture, still graphic and video documents: subjects of the document may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <!-- A subject represented by a natural language name  -->
                <subject>
                    <name>auction</name>
                </subject> 

                <!-- A subject represented by a QCode  -->
                <subject qcode="medtop:20000273"/>
                                
                <!-- A subject represented by a QCode and a natural language name  -->
                <subject qcode="medtop:01000000">
                    <name>arts, culture and entertainment</name>
                </subject> 

                <!-- A subject represented by a QCode, a natural language name, a type and a role -->
                <subject qcode="afplocation:2500" type="cpnat:geoArea" afp:role="http://cv.afp.com/subjectroles/locationOfEvent">
                    <name>Paris</name>
                </subject> 
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: subjects of the document as a whole may be provided in the content metadata section of the main news item. Subjects specific to an item may be provided in the content metadata section of this item.

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
            <contentMeta>
                <!-- This subject is in the main news item: 
                     it applies to the document as a whole -->
                <subject qcode="medtop:20000031" type="cpnat:abstract">
                    <name>visual art</name>
                </subject> 
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- This subject only applies to this news item -->
                <subject qcode="medtop:20000011" type="cpnat:abstract">
                    <name>fashion</name>
                </subject> 
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Live report indexes: subjects of the document may be provided in the content metadata section of the package item. In live report documents, subjects expressed using a controlled vocabulary are only media topics and event identifiers.

<newsMessage>
    <itemSet>
        <packageItem>
            <contentMeta>
                <!-- A subject represented by a natural language name  -->
                <subject>
                    <name>auction</name>
                </subject>

                <!-- A subject represented by a QCode  -->
                <subject qcode="medtop:20000273"/>
                                
                <!-- A subject represented by a QCode and a natural language name  -->
                <subject qcode="medtop:01000000">
                    <name>arts, culture and entertainment</name>
                </subject> 
            </contentMeta>
        </packageItem>
    </itemSet>
</newsMessage>

Subjects are important topics of the content; what the content is about. Some subjects of a document (and of individual items in the case of multimedia documents) may be provided by subject elements. Each subject element contains an indication on what the document's content (or item's content) is about.

Some subjects of the document may be described by keyword elements instead of subject elements. However, keywords may also be used for other purposes: while a keyword may describe a subject of the document, not all keywords do. See the Keywords section.

In AFP documents, a subject represented by a subject element is specified by either:

The URI space used to specify subjects through qcode and uri attributes is open and can evolve over time. Often used in AFP documents are QCodes identifying IPTC media topics [IPTCMediaTopics], a standard taxonomy for categorizing news content. Also often used are QCodes identifying events, in order to associate a document with the events it covers. The table below presents common schemes used in AFP documents to identify subjects. Note that this list is not exhaustive.

Common types of subjects used in AFP documents
Type | Scheme URI | Scheme alias | Comment
Media topics | http://cv.iptc.org/newscodes/mediatopic/ | medtop | Media topics is a standard IPTC taxonomy for categorizing news content. For example the concept URI http://cv.iptc.org/newscodes/mediatopic/01000000 identifies the category "arts, culture and entertainment", which is defined as "Matters pertaining to the advancement and refinement of the human mind, of interests, skills, tastes and emotions".
Events | http://eventmanager.afp.com/events/ | afpevent | An AFP specific scheme for identifying events. It is used to associate a document with the event it covers. For more on this topic see the section on event identifiers.
Persons | http://ref.afp.com/persons/ | afpperson | AFP specific scheme for identifying persons. For example the concept URI http://ref.afp.com/persons/193573 identifies Pierre Bergé.
Organizations | http://ref.afp.com/organizations/ | afporganization | AFP specific scheme for identifying organizations. For example the concept URI http://ref.afp.com/organizations/5308 identifies Christie's, the auction company.
Locations | http://ref.afp.com/locations/ | afplocation | AFP specific scheme for identifying locations. For example the concept URI http://ref.afp.com/locations/2500 identifies the city of Paris.
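
The relation between a QCode and its concept URI can be illustrated with a minimal sketch. It assumes that the scheme aliases are the ones listed in the table above; the SCHEMES dictionary and the resolve_qcode function are hypothetical names introduced for illustration. A real NewsML-G2 processor would normally read the alias-to-URI mapping from the catalog information of the document rather than hard-code it.

```python
# Alias-to-scheme-URI mapping copied from the table above.
# Assumption: hard-coded here for illustration only; a real processor
# should take this mapping from the catalog information of the document.
SCHEMES = {
    "medtop": "http://cv.iptc.org/newscodes/mediatopic/",
    "afpevent": "http://eventmanager.afp.com/events/",
    "afpperson": "http://ref.afp.com/persons/",
    "afporganization": "http://ref.afp.com/organizations/",
    "afplocation": "http://ref.afp.com/locations/",
}


def resolve_qcode(qcode, schemes=SCHEMES):
    """Return the concept URI for a QCode, or None if the alias is unknown."""
    alias, separator, code = qcode.partition(":")
    if not separator or alias not in schemes:
        return None
    # The concept URI is the scheme URI followed by the code.
    return schemes[alias] + code


print(resolve_qcode("medtop:01000000"))
print(resolve_qcode("afpperson:193573"))
```

Returning None for an unknown alias, rather than raising an error, reflects the fact that the URI space for subjects is open: a processor will probably meet schemes it does not know.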

A subject element can have a name child element. If present it provides a natural language name for the subject.

In a given item, the order of appearance of subject elements provides a hint about their relative importance (i.e., editorial significance) in the context of this item: a subject should be considered as having either the same or a lesser importance than subjects appearing before it in the item. Note that while AFP's documents currently don't rank subjects with rank attributes, that may change in the future. In order to be forward compatible, if your NewsML-G2 processor interprets such ranks, the relative importance they convey should take precedence over the relative importance conveyed by the order of appearance of subject elements in the item. The rank attribute is described in the NewsML-G2 specification.

Optional attributes (these attributes may or may not be present in a given subject element):

type: this attribute carries a QCode that specifies the type of the subject (i.e., person, organization, event, abstract concept, etc.). The value space for this attribute is open, but in AFP documents you'll typically find types defined in the standard IPTC "Nature of a concept" controlled vocabulary [IPTCCPNatures].

role (in namespace http://www.afp.com/format/internal/): some subjects have a specific role, which is conveyed by this attribute in the form of a URI. This attribute is not defined by the NewsML-G2 standard: it is an AFP-specific extension and is therefore defined in a specific namespace.

Currently the only possible value for this attribute when it is present is http://cv.afp.com/subjectroles/locationOfEvent. If a subject is tagged with this role then this subject is a location of the event(s) the editorial content is about. This usage is described in detail in the section "Locations that are subject matter of the document".
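
As an illustration, the following sketch selects the subjects that carry this role. It uses Python's standard xml.etree.ElementTree module and assumes the elements are in the NewsML-G2 namespace http://iptc.org/std/nar/2006-10-01/. The sample fragment and the event_locations function are hypothetical: the exact shape of location subjects in production documents may differ.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
# The role attribute lives in the AFP extension namespace.
AFP_ROLE = "{http://www.afp.com/format/internal/}role"
LOCATION_OF_EVENT = "http://cv.afp.com/subjectroles/locationOfEvent"


def event_locations(content_meta):
    """Return the subject elements tagged as a location of the event."""
    return [
        subject
        for subject in content_meta.findall(NAR + "subject")
        if subject.get(AFP_ROLE) == LOCATION_OF_EVENT
    ]


# Hypothetical sample, written for this sketch only.
SAMPLE = """
<contentMeta xmlns="http://iptc.org/std/nar/2006-10-01/"
             xmlns:afp="http://www.afp.com/format/internal/">
    <subject qcode="afplocation:2500"
             afp:role="http://cv.afp.com/subjectroles/locationOfEvent">
        <name>Paris</name>
    </subject>
    <subject qcode="medtop:01000000"/>
</contentMeta>
"""

for subject in event_locations(ET.fromstring(SAMPLE)):
    print(subject.get("qcode"), subject.findtext(NAR + "name"))
```

Note that the attribute is looked up by its namespace-qualified name, not by the afp prefix, since the prefix bound to the namespace may vary from one document to another.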

Titles & subtitles

Documents may contain various types of titles and multiple levels of subtitles.

Note that while NewsML-G2 allows for rich text by using some markup in the content of titles and subtitles, AFP's systems only output simple textual content not interspersed with markup.

Titles

Text, picture, still graphic, video and multimedia documents: titles may be provided in the content metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <!-- The main title of the document -->
                <headline>
                    YSL-Bergé collection sets new world record at auction 
                    for a private collection
                </headline>
                
                <!-- The short title of the document -->
                <headline role="afpheadlinerole:shorttitle">
                    YSL-Bergé collection: a new record at auction
                </headline>
                
                <!-- The long title of the document -->
                <headline role="afpheadlinerole:longtitle">
                    Yves Saint Laurent/Pierre Bergé collection sets new world record at 
                    auction for a private collection with more than 206 million euros
                </headline>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Live report indexes: A title may be provided in the content metadata section of the package item.

<newsMessage>
    <itemSet>
        <packageItem>
            <contentMeta>
                <!-- The title of the live report -->
                <headline>
                    YSL-Bergé auction live report
                </headline>
            </contentMeta>
        </packageItem>
    </itemSet>
</newsMessage>

All documents may contain a title. In addition, text, picture, still graphic, animated graphics, video and multimedia documents may include a short title and/or a long title. These titles, if present, are provided by headline elements located in the content metadata section of the first item. There is at most one title, one short title and one long title.

You can determine the type of a given title by looking for the presence and value of a role attribute, as described in the following table.

Title types
Type | Function | Identification
Title | The main title of the document: a short summary of the journalistic content. | No role attribute.
Short title | A shorter version of the title, suitable for displaying on space constrained surfaces (e.g., mobile handsets). | A role attribute whose value, the QCode afpheadlinerole:shorttitle, resolves to http://cv.afp.com/headlineroles/shorttitle
Long title | A longer version of the title. This is a short catch line, useful, for example, to display on a banner. | A role attribute whose value, the QCode afpheadlinerole:longtitle, resolves to http://cv.afp.com/headlineroles/longtitle
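
The identification rules of the table can be sketched as follows. This is a minimal example, assuming Python's standard xml.etree.ElementTree module and assuming that the QCode alias afpheadlinerole is used as shown in this guide (a stricter processor would compare resolved concept URIs instead). The TITLE_TYPES dictionary and the extract_titles function are names made up for this sketch.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"

# Maps the value of the role attribute (None when absent) to a title type.
TITLE_TYPES = {
    None: "title",
    "afpheadlinerole:shorttitle": "short title",
    "afpheadlinerole:longtitle": "long title",
}


def extract_titles(content_meta):
    """Return a dict of the titles found in a contentMeta element."""
    titles = {}
    for headline in content_meta.findall(NAR + "headline"):
        title_type = TITLE_TYPES.get(headline.get("role"))
        # Headlines with another role (subtitle, catchline...) are skipped.
        if title_type is not None:
            # Collapse the indentation and line breaks of the XML source.
            titles[title_type] = " ".join((headline.text or "").split())
    return titles


SAMPLE = """
<contentMeta xmlns="http://iptc.org/std/nar/2006-10-01/">
    <headline>
        YSL-Bergé collection sets new world record at auction
        for a private collection
    </headline>
    <headline role="afpheadlinerole:shorttitle">
        YSL-Bergé collection: a new record at auction
    </headline>
    <headline role="afpheadlinerole:subtitle" rank="0">
        Auction to continue Tuesday and Wednesday
    </headline>
</contentMeta>
"""

print(extract_titles(ET.fromstring(SAMPLE)))
```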

Subtitles

Text and multimedia documents: subtitles may be provided in the content metadata section of the news item (for multimedia documents: in the main news item). Subtitles are only provided for text and multimedia documents.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <headline role="afpheadlinerole:subtitle" rank="0">
                    Auction to continue Tuesday and Wednesday
                </headline>
                <headline role="afpheadlinerole:subtitle" rank="1">
                    Prestigious attendance noted on first day
                </headline>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

In addition to titles, text and multimedia documents may contain subtitles. Subtitles complement titles with additional information about the news content of the document. In current production there are at most two subtitles. Like titles, they are provided by headline elements in the content metadata section of the main news item. Their subtitle nature is denoted by a role attribute whose value, the QCode afpheadlinerole:subtitle, resolves to http://cv.afp.com/headlineroles/subtitle. A rank attribute may be present to specify the relative importance of subtitles. Ranks are nonnegative integers. Subtitles with a lower value for this attribute have a higher importance than subtitles with a higher value for this attribute, and subtitles without a rank attribute have a lower importance than subtitles with a rank attribute. See the NewsML-G2 specification for additional information on ranks and their processing model.
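
The ordering rule for subtitles can be sketched as below, using Python's standard xml.etree.ElementTree module. The ordered_subtitles function and the sample fragment are hypothetical; keeping document order between subtitles of equal importance is an assumption of this sketch, not a rule stated in this guide.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
SUBTITLE_ROLE = "afpheadlinerole:subtitle"


def ordered_subtitles(content_meta):
    """Return the subtitle texts of a contentMeta element, most important first."""
    subtitles = [
        headline
        for headline in content_meta.findall(NAR + "headline")
        if headline.get("role") == SUBTITLE_ROLE
    ]

    def importance(headline):
        rank = headline.get("rank")
        # Ranked subtitles sort before unranked ones; a lower rank wins.
        return (0, int(rank)) if rank is not None else (1, 0)

    # sorted() is stable: document order is kept between equal keys.
    ordered = sorted(subtitles, key=importance)
    return [" ".join((headline.text or "").split()) for headline in ordered]


# Hypothetical sample mixing ranked and unranked subtitles.
SAMPLE = """
<contentMeta xmlns="http://iptc.org/std/nar/2006-10-01/">
    <headline role="afpheadlinerole:subtitle">Unranked subtitle</headline>
    <headline role="afpheadlinerole:subtitle" rank="1">
        Prestigious attendance noted on first day
    </headline>
    <headline role="afpheadlinerole:subtitle" rank="0">
        Auction to continue Tuesday and Wednesday
    </headline>
</contentMeta>
"""

for text in ordered_subtitles(ET.fromstring(SAMPLE)):
    print(text)
```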

Type of document

An AFP NewsML-G2 document can be of one of the following types:

The type of a NewsML-G2 document defines important characteristics of the document such as the nature of its content, its XML structure, the metadata it provides as well as some elements of its processing model.

The overview section provides a description of these types.

To determine the type of a document, you first need to determine if it is a multimedia or non-multimedia document. A document is multimedia if the item set of the news message contains a news item whose item metadata section contains a link element with both:

a rel attribute whose value is the QCode crel:isa;

an href attribute whose value is http://cv.afp.com/itemnatures/mmdMainComp.

That is, a multimedia document contains the following:

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

In a non-multimedia document, the type is the item class of the item present in the item set of the news message.

For text, picture, still graphic, video and multimedia documents the item class is given by the qcode attribute of the itemClass element in the item metadata section of a news item, as shown here:

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <itemClass qcode="QCode specifying the type"/>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

For live reports the item class is given by the qcode attribute of the itemClass element in the item metadata section of a package item, as shown here:

<newsMessage>
    <itemSet>
        <packageItem>
            <itemMeta>
                <itemClass qcode="QCode specifying the type"/>
            </itemMeta>
        </packageItem>
    </itemSet>
</newsMessage>

The itemClass element is always present. For non-multimedia documents, its qcode attribute resolves to a concept URI that specifies the type of the document, as shown in the table below.

Item classes used in AFP documents
Type | QCode | Concept URI
Text | ninat:text | http://cv.iptc.org/newscodes/ninature/text
Picture | ninat:picture | http://cv.iptc.org/newscodes/ninature/picture
Video | ninat:video | http://cv.iptc.org/newscodes/ninature/video
Still graphic | ninat:graphic | http://cv.iptc.org/newscodes/ninature/graphic
Animated graphic | ninat:animated | http://cv.iptc.org/newscodes/ninature/animated
Interactive graphic | afpinat:interactive | http://cv.afp.com/itemnatures/interactive
Live report index | afpinat:liveReport | http://cv.afp.com/itemnatures/liveReport

The NewsML-G2 standard states that it is mandatory to use one of the IPTC News Item Nature NewsCodes schemes for item classes. AFP NewsML-G2 deviates from this rule by using an AFP specific scheme (whose URI is http://cv.afp.com/itemnatures/) in addition to the mandatory IPTC schemes.
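
The procedure described in this section can be summarized by the sketch below. It relies on Python's standard xml.etree.ElementTree module and assumes the QCode aliases (crel, ninat, afpinat) are used as shown in this guide; a stricter processor would resolve QCodes to concept URIs before comparing them. The document_type function and the two sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
MAIN_COMPONENT = "http://cv.afp.com/itemnatures/mmdMainComp"

# Item classes from the table above.
ITEM_CLASSES = {
    "ninat:text": "text",
    "ninat:picture": "picture",
    "ninat:video": "video",
    "ninat:graphic": "still graphic",
    "ninat:animated": "animated graphic",
    "afpinat:interactive": "interactive graphic",
    "afpinat:liveReport": "live report index",
}


def document_type(news_message):
    """Return the type of a document, or None if it cannot be determined."""
    item_set = news_message.find(NAR + "itemSet")
    items = list(item_set) if item_set is not None else []
    # Step 1: a document is multimedia if an item links to the main component nature.
    for item in items:
        for link in item.findall(f"{NAR}itemMeta/{NAR}link"):
            if link.get("rel") == "crel:isa" and link.get("href") == MAIN_COMPONENT:
                return "multimedia"
    # Step 2: otherwise the type is the item class of the (news or package) item.
    for item in items:
        item_class = item.find(f"{NAR}itemMeta/{NAR}itemClass")
        if item_class is not None:
            return ITEM_CLASSES.get(item_class.get("qcode"))
    return None


MULTIMEDIA = ET.fromstring("""
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/">
    <itemSet>
        <newsItem>
            <itemMeta>
                <itemClass qcode="ninat:text"/>
                <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>
""")

LIVE_REPORT = ET.fromstring("""
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/">
    <itemSet>
        <packageItem>
            <itemMeta>
                <itemClass qcode="afpinat:liveReport"/>
            </itemMeta>
        </packageItem>
    </itemSet>
</newsMessage>
""")

print(document_type(MULTIMEDIA))
print(document_type(LIVE_REPORT))
```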

Urgency

Text, picture, still graphic, animated graphic, video and multimedia documents: the urgency of the document may be provided in the content metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <urgency>1</urgency>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Live report indexes: the urgency of the document may be provided in the content metadata section of the package item.

<newsMessage>
    <itemSet>
        <packageItem>
            <contentMeta>
                <urgency>1</urgency>
            </contentMeta>
        </packageItem>
    </itemSet>
</newsMessage>

A document may include an indication of the editorial urgency of its content in an urgency element. The content of this element is an integer from 1 (highest urgency) to 9 (lowest urgency). Usually, AFP documents are tagged with urgencies from 1 to 4.

There is often a correlation between this property and the role in workflow of the document. In our documents, flashes are typically issued with the highest urgency (i.e., a value of 1), alerts with an urgency of 2, and urgents with an urgency of 3.

Data specific to text and multimedia documents

Some data appear only in text and multimedia documents. This section details these data elements.

Catchline

Text documents: a catchline may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <headline role="afpheadlinerole:introduction">
                    The Yves Saint Laurent and Pierre Bergé collection sets new world record at  
                    auction for a private collection on Monday, the first of three auction
                    days, with more than 206 million euros. Participants describe first day
                    as "surprising, moving, electric!".
                </headline>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: a catchline may be provided in the content metadata section of the main news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
            <contentMeta>
                <headline role="afpheadlinerole:catchline">
                    The Yves Saint Laurent and Pierre Bergé collection sets new world record at  
                    auction for a private collection on Monday, the first of three auction
                    days, with more than 206 million euros. Participants describe first day
                    as "surprising, moving, electric!".
                </headline>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

A catchline, if present, provides a clear and concise summary of the story that tells the reader what has happened in simple language. It is designed to attract the reader's attention. It gives an overview of all the main elements of the news. A catchline may be found at most once per document.

In text documents the catchline is provided by a headline element whose role attribute, the QCode afpheadlinerole:introduction, resolves to http://cv.afp.com/headlineroles/introduction. At the time of this writing a catchline may be provided only for text documents produced by SID (Sport-Informations-Dienst), an AFP subsidiary. To determine if the kind of text documents you are interested in might contain a catchline you are advised to discuss the matter with your AFP representative.

In multimedia documents the catchline is provided by a headline element whose role attribute is either afpheadlinerole:catchline (resolving to http://cv.afp.com/headlineroles/catchline) or afpheadlinerole:introduction (resolving to http://cv.afp.com/headlineroles/introduction).

While NewsML-G2 allows for rich text by using some markup in the content of a catch line, AFP's systems only output simple textual content not interspersed with markup.

In some documents you might observe that the content of the catchline is the same as the first paragraph of the main textual content of the document. Note however that this is not always the case and that sometimes an original catchline is provided.

Text documents and multimedia documents: the number of hypertext links to external resources present in textual or multimedia content may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" xmlns:afp="http://www.afp.com/format/internal/">
    <itemSet>
        <newsItem>
            <itemMeta>
                <afp:extension>
                    <afp:stats>
                        <afp:totalLinks>
                            3
                        </afp:totalLinks>
                    </afp:stats>
                </afp:extension>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

The HTML (in XML syntax) rendition of the textual or multimedia content can contain hypertext links to external resources, typically conveyed by <a> elements. External resources are resources that are not intrinsically part of the document; for example, in a multimedia document a link to one of the items of the document isn't a link to an external resource whereas a link to a Wikipedia page is.

As shown in the example above this number may be provided as an integer by a totalLinks element inside a stats element inside an extension element in the item metadata section of the (main) news item.

Note that the totalLinks, stats and extension elements are not standard NewsML-G2 vocabulary but part of an AFP-specific extension. They are defined in the XML namespace http://www.afp.com/format/internal/.
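
Reading this value requires handling both namespaces. The following sketch shows one way to do it with Python's standard xml.etree.ElementTree module; the total_links function and the nar and afp prefixes of the NS dictionary are choices made for this example only.

```python
import xml.etree.ElementTree as ET

# Prefixes are local to this sketch; only the namespace names matter.
NS = {
    "nar": "http://iptc.org/std/nar/2006-10-01/",
    "afp": "http://www.afp.com/format/internal/",
}


def total_links(news_item):
    """Return the number of external links, or None if it is not provided."""
    element = news_item.find("nar:itemMeta/afp:extension/afp:stats/afp:totalLinks", NS)
    if element is None or element.text is None:
        return None
    try:
        # The value may be surrounded by whitespace, as in the example above.
        return int(element.text.strip())
    except ValueError:
        return None


SAMPLE = """
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/"
             xmlns:afp="http://www.afp.com/format/internal/">
    <itemSet>
        <newsItem>
            <itemMeta>
                <afp:extension>
                    <afp:stats>
                        <afp:totalLinks>
                            3
                        </afp:totalLinks>
                    </afp:stats>
                </afp:extension>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>
"""

news_item = ET.fromstring(SAMPLE).find("nar:itemSet/nar:newsItem", NS)
print(total_links(news_item))
```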

Related production

Text and multimedia documents: mentions of the existence of related production may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <!-- The following signals that AFP is publishing/will publish related photo and video production -->
                <signal qcode="afpmedtype:Photo"/>
                <signal qcode="afpmedtype:Video"/>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Text and multimedia documents may contain mentions of the existence of related production, i.e., additional production covering the event(s) the document is about. For example, if AFP has released or plans to release photo(s) and video(s) of the Yves Saint Laurent auction then it may be mentioned in the metadata of a text or multimedia news story covering this auction, as shown in the example above. To that end, we use signal elements specifying which type of related production exists or is planned, using a controlled vocabulary defined by the scheme http://ref.afp.com/mediatypes/ (scheme alias: afpmedtype).

We provide only one signal per type of related production. For example, if there are several related photos, there is still only one <signal qcode="afpmedtype:Photo"/> element.

Note that signal elements are also used for other purposes (e.g., correction signal). Only signal elements in the scheme http://ref.afp.com/mediatypes/ are mentions of related production.
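
A minimal sketch of this filtering is given below, using Python's standard xml.etree.ElementTree module. It assumes the scheme alias afpmedtype is used as shown in this guide; a stricter processor would resolve each QCode and compare scheme URIs. The related_production function is a made-up name, and the example:other QCode in the sample is a placeholder standing for a signal used for another purpose.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
MEDIA_TYPE_ALIAS = "afpmedtype"


def related_production(item_meta):
    """Return the codes of the related production signals of an itemMeta element."""
    codes = []
    for signal in item_meta.findall(NAR + "signal"):
        alias, separator, code = (signal.get("qcode") or "").partition(":")
        # Signals from other schemes serve other purposes and are ignored.
        if separator and alias == MEDIA_TYPE_ALIAS:
            codes.append(code)
    return codes


SAMPLE = """
<itemMeta xmlns="http://iptc.org/std/nar/2006-10-01/">
    <signal qcode="afpmedtype:Photo"/>
    <signal qcode="afpmedtype:Video"/>
    <!-- Placeholder for a signal that is not a mention of related production -->
    <signal qcode="example:other"/>
</itemMeta>
"""

print(related_production(ET.fromstring(SAMPLE)))
```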

The table below provides the QCodes and concept URIs that are used in these signal elements. See the overview section for descriptions of the various types of news content this table refers to.

Types of related production
Concept URI | QCode | Description
http://ref.afp.com/mediatypes/Photo | afpmedtype:Photo | Related picture(s). For example, a picture of the Yves Saint Laurent auction.
http://ref.afp.com/mediatypes/PHOTOARCH | afpmedtype:PHOTOARCH | Related picture(s) from archive material. It is typically an archive picture of someone or something that plays an important role in the event(s). For example an archive picture of Yves Saint Laurent, or an archive picture of Christie's salerooms. When this mention is used, the related archive pictures are republished by AFP.
http://ref.afp.com/mediatypes/Video | afpmedtype:Video | Related video(s). For example, a video report about the Yves Saint Laurent auction.
http://ref.afp.com/mediatypes/LIVEVIDEO | afpmedtype:LIVEVIDEO | Related video(s) providing live coverage. For example a video of the Yves Saint Laurent auction broadcast live.
http://ref.afp.com/mediatypes/VIDEOARCH | afpmedtype:VIDEOARCH | Related video(s) from archive material. It is typically an archive video of someone or something that plays an important role in the event(s). For example an archive video of Yves Saint Laurent, or an archive video of Christie's salerooms. When this mention is used, the related archive videos are republished by AFP.
http://ref.afp.com/mediatypes/Sketch | afpmedtype:Sketch | Related courtroom sketch(es). A courtroom sketch is an artistic depiction of the proceedings in a court of law. In many jurisdictions, cameras are not allowed in courtrooms in order to prevent distractions and preserve privacy. Consequently we rely on sketch artists for illustrations of the proceedings.
http://ref.afp.com/mediatypes/Graphic | afpmedtype:Graphic | Related still graphic(s).
http://ref.afp.com/mediatypes/ANIGRAPHIC | afpmedtype:ANIGRAPHIC | Related animated graphic(s).
http://ref.afp.com/mediatypes/VIDEOGRAPHIC | afpmedtype:VIDEOGRAPHIC | Related videographic(s).
http://ref.afp.com/mediatypes/Multimedia | afpmedtype:Multimedia | Related multimedia document(s).
http://ref.afp.com/mediatypes/LIVEREPORT | afpmedtype:LIVEREPORT | Related live report(s).
http://ref.afp.com/mediatypes/INTERACTIVEGRAPHIC | afpmedtype:INTERACTIVEGRAPHIC | Related interactive graphic(s).

The mechanism described in this section is not the only one to deal with related production. As described in the section on event identifiers, we also provide you with correlation keys allowing you to identify documents covering the same events.

Role in workflow

Text and multimedia documents: a role in workflow may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <role qcode="QCode specifying the role in workflow"/>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Some text and multimedia documents carry an indication of their role in workflow (aka editorial role). This allows you to handle them in specific ways. This role, if present, is specified by the qcode attribute of the role element. The possible values for the role are taken from a controlled vocabulary provided by the IPTC (we do not use its whole value space, though). They are described in the table below, where the Concept URI column gives the URI the QCode resolves to.

Roles in workflow
Role | Description | QCode | Concept URI
Flash | A very short text – typically four or five words – on an event of exceptional importance. Flashes are rare. For example, only four events were reported by AFP with a flash in 2008: Kosovo’s declaration of independence; the opening of the Beijing Games; Russia’s recognition of South Ossetia and Abkhazia as independent states; and Barack Obama’s victory in the US presidential elections. A flash is usually followed within five minutes by an urgent providing more information. | erol:flash | http://cv.iptc.org/newscodes/edrole/flash
Alert | A very short text with high priority. An alert is usually followed within five minutes by an urgent providing more information. Fits in a single line. | erol:alert | http://cv.iptc.org/newscodes/edrole/alert
Urgent | A short text on a major development of a top story. An urgent is typically two paragraphs long, or longer when it provides a follow-up to multiple alerts. On a freshly breaking story, an urgent is typically followed within 10 minutes by a 200-250 word lead. | erol:urgent | http://cv.iptc.org/newscodes/edrole/urgent
Lead | A sum-up or a complete version of a developing story. | erol:lead | http://cv.iptc.org/newscodes/edrole/lead

When a document is updated, its role in workflow may be updated too. For example, it is typical for a breaking news story that deserves immediate distribution to start its life as an alert, then become an urgent, then a lead, as it gets refreshed and enriched with more content. All versions of the document share the same guid (see the section on identifiers).

[Figure: timeline of the successive roles of a story: Flash (only for events of exceptional importance), Alert, Urgent, Lead, Second lead, Third lead, Ninth lead]

Evolution over time of a developing story

Once a document is a lead, subsequent versions may be qualified as "second lead", "third lead" and so on up to a "ninth lead". However, this qualification is not done through the role in workflow property: this property uses the same concept URI of http://cv.iptc.org/newscodes/edrole/lead (QCode erol:lead) from the first lead through the ninth one. To convey what kind of lead the document is, we use a <genre> element (see the section on genres). For example, we typically convey that a document is a first lead by specifying a role in workflow with the concept URI http://cv.iptc.org/newscodes/edrole/lead and a genre with the concept URI http://ref.afp.com/editorialtypes/Lead (QCode afpedtype:Lead), as in the following example:

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <role qcode="erol:lead"/>
            </itemMeta>
            <contentMeta>
                <genre qcode="afpedtype:Lead" />
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

For a second lead, the role in workflow is still http://cv.iptc.org/newscodes/edrole/lead and a genre with a concept URI of http://ref.afp.com/editorialtypes/2ndlead (QCode afpedtype:2ndlead) is provided:

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <role qcode="erol:lead"/>
            </itemMeta>
            <contentMeta>
                <genre qcode="afpedtype:2ndlead" />
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

A document with a role in workflow of "lead" can also be qualified by the genre "general lead", whose meaning is described at the end of the table below. Typically a general lead has a different guid than the various documents it consolidates. A document cannot be both a general lead and "first lead" or "second lead" etc.

The following table describes the various genres used to qualify a lead.

Genres used to qualify a lead
Genre | Description | QCode | Concept URI
Lead (typically used to mean "first lead") | A sum-up or a complete version of a developing story. | afpedtype:Lead | http://ref.afp.com/editorialtypes/Lead
Second lead | A sum-up or a complete version of a developing story. For a given story, common usage is that a second lead is published only if a lead is already out. It provides a refreshed and/or enriched version of that story. | afpedtype:2ndlead | http://ref.afp.com/editorialtypes/2ndlead
Third lead | A sum-up or a complete version of a developing story. For a given story, common usage is that a third lead is published only if a second lead is already out. It provides a refreshed and/or enriched version of that story. | afpedtype:3rdlead | http://ref.afp.com/editorialtypes/3rdlead
Fourth lead | A sum-up or a complete version of a story. For a given story, common usage is that a fourth lead is published only if a third lead is already out. It provides a refreshed and/or enriched version of that story. | afpedtype:4thlead | http://ref.afp.com/editorialtypes/4thlead
Fifth lead | A sum-up or a complete version of a story. For a given story, common usage is that a fifth lead is published only if a fourth lead is already out. It provides a refreshed and/or enriched version of that story. | afpedtype:5thlead | http://ref.afp.com/editorialtypes/5thlead
Sixth lead | A sum-up or a complete version of a developing story. For a given story, common usage is that a sixth lead is published only if a fifth lead is already out. It provides a refreshed and/or enriched version of that story. | afpedtype:6thlead | http://ref.afp.com/editorialtypes/6thlead
Seventh lead | A sum-up or a complete version of a developing story. For a given story, common usage is that a seventh lead is published only if a sixth lead is already out. It provides a refreshed and/or enriched version of that story. | afpedtype:7thlead | http://ref.afp.com/editorialtypes/7thlead
Eighth lead | A sum-up or a complete version of a developing story. For a given story, common usage is that an eighth lead is published only if a seventh lead is already out. It provides a refreshed and/or enriched version of that story. | afpedtype:8thlead | http://ref.afp.com/editorialtypes/8thlead
Ninth lead | A sum-up or a complete version of a developing story. For a given story, common usage is that a ninth lead is published only if an eighth lead is already out. It provides a refreshed and/or enriched version of that story. | afpedtype:9thlead | http://ref.afp.com/editorialtypes/9thlead
General lead | A large sum-up or a complete version of a story. A general lead regroups, hierarchizes and develops all available elements of a developing story, including elements that were previously published under a number of different documents, each one focusing on specific facets of the more general story. | afpedtype:LeadGeneral | http://ref.afp.com/editorialtypes/LeadGeneral
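
Combining the role in workflow with the genre can be sketched as follows, using Python's standard xml.etree.ElementTree module. The lead_kind function and the labels it returns are hypothetical, and the sketch compares QCodes with the aliases erol and afpedtype as shown in this guide rather than resolved concept URIs. Returning the plain label "lead" when no lead genre is found is also an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"

# Genre QCodes from the table above, mapped to labels chosen for this sketch.
LEAD_GENRES = {
    "afpedtype:Lead": "first lead",
    "afpedtype:2ndlead": "second lead",
    "afpedtype:3rdlead": "third lead",
    "afpedtype:4thlead": "fourth lead",
    "afpedtype:5thlead": "fifth lead",
    "afpedtype:6thlead": "sixth lead",
    "afpedtype:7thlead": "seventh lead",
    "afpedtype:8thlead": "eighth lead",
    "afpedtype:9thlead": "ninth lead",
    "afpedtype:LeadGeneral": "general lead",
}


def lead_kind(news_item):
    """Return the kind of lead of a news item, or None if it is not a lead."""
    role = news_item.find(f"{NAR}itemMeta/{NAR}role")
    if role is None or role.get("qcode") != "erol:lead":
        return None
    # A document may carry several genres; look for one that qualifies the lead.
    for genre in news_item.findall(f"{NAR}contentMeta/{NAR}genre"):
        kind = LEAD_GENRES.get(genre.get("qcode"))
        if kind is not None:
            return kind
    return "lead"


SAMPLE = """
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
    <itemMeta>
        <role qcode="erol:lead"/>
    </itemMeta>
    <contentMeta>
        <genre qcode="afpedtype:2ndlead"/>
    </contentMeta>
</newsItem>
"""

print(lead_kind(ET.fromstring(SAMPLE)))
```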

Word count

Text and multimedia documents: the word count is provided in the inline XML rendition of the content of the news item (for multimedia documents: in the main news item).

<newsMessage>
    <itemSet>
        <newsItem>
            <contentSet>
                <inlineXML wordcount="450">
                </inlineXML>
            </contentSet>
        </newsItem>
    </itemSet>
</newsMessage>

The word count gives an approximation of the size of the textual content of the document (not including textual content provided in metadata). That size is provided as an approximate count of words: when it is computed, an individual word does not necessarily count for one, as short words count for less than one and long words count for more than one.

The word count is provided by the wordcount attribute of the inlineXML element of the news item. It is a non-negative integer. It is present in all text and multimedia documents.

Data specific to text documents

Some data is specific to text documents. This section details these data elements.

Textual content

Text documents: the textual content is provided in the content set of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentSet>
                <inlineXML contenttype="application/xhtml+xml">
                    <html xmlns="http://www.w3.org/1999/xhtml">
                        <head>
                            <title>
                                YSL-Bergé collection sets new world record at auction 
                                for a private collection
                            </title>
                        </head>
                        <body>
                            <p>The Yves Saint Laurent and Pierre Bergé collection sets 
                            new world record at auction for a private collection. 
                            Hundreds of art treasures amassed by late fashion designer
                            Yves Saint Laurent and his companion Pierre Berge over half
                            a century are being auctioned.</p>
                            <p>Bids hit 206 million euros (261 million dollars) on February
                            23, 2009 making it the biggest private collection ever 
                            auctioned with two days of sales still left to run.</p>
                            ...
                            ...
                            <!-- A hypertext link -->
                            The <a class="ignorableTextFalse" href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">
                            wikipedia page about Yves Saint-Laurent</a> claims that ...
                            ...
                        </body>
                    </html>
                </inlineXML>
            </contentSet>
        </newsItem>
    </itemSet>
</newsMessage>

The textual content of the document is the main journalistic text of the document. It is provided by an inlineXML element. It is expressed using the XML syntax of HTML. This is explicitly denoted by a contenttype attribute with a value of application/xhtml+xml.

The textual content can also contain links to entities that aren't logically part of the document, such as other NewsML-G2 documents, Web pages (as shown in the example above), etc. The sections below describe how these links are represented.

Note that text items of multimedia documents can also contain similar data, but with additional information such as links to visual content. This is described in section "Data specific to multimedia documents".

Hypertext links to other resources

The HTML can contain hypertext links to other resources such as Web pages. They may be provided by a elements. For example, here is a link to a Wikipedia page:

<a class="ignorableTextFalse"
   href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)" >wikipedia page about Yves Saint-Laurent</a>

The class attribute, if present, may be used to specify either the class name "ignorableTextFalse" or "ignorableTextTrue". These class names are meant to assist you if you need to remove hypertext links from the HTML content (this is a common need for some of our clients).

ignorableTextFalse

ignorableTextFalse means that if you process the HTML in order to remove links, then keeping the text associated with this link produces a better result.

For example, suppose that the HTML contains the following fragment before removing the hypertext links:

Pierre Bergé quoted the 
<a class="ignorableTextFalse" 
   href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">wikipedia page about Yves Saint-Laurent</a>
to illustrate...

After removing hypertext links the fragment should be:

Pierre Bergé quoted the wikipedia page about Yves Saint-Laurent to illustrate...

ignorableTextTrue

ignorableTextTrue means that if you process the HTML in order to remove links then also removing the text associated with this link will produce a better result.

For example, suppose that the HTML contains the following fragment before removing the hypertext links:

Some text before.
<a class="ignorableTextTrue" 
   href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">
   This Web page provides additional information.
</a> 
 Some text after.   

After removing hypertext links the fragment should be:

Some text before. Some text after.
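
The link-removal rules described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the names local, strip_links and plain_text are hypothetical, a link that carries neither class name is assumed to keep its text, and any markup nested inside a link is flattened to plain text.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Return an element name without its XML namespace, if any."""
    return tag.rsplit("}", 1)[-1]


def strip_links(parent):
    """Remove every a element below parent, in place.

    Assumption: the link text is dropped only when the class attribute
    contains "ignorableTextTrue"; otherwise the text is kept.
    """
    previous = None  # last sibling kept so far
    for child in list(parent):
        if local(child.tag) != "a":
            strip_links(child)
            previous = child
            continue
        classes = (child.get("class") or "").split()
        kept = "" if "ignorableTextTrue" in classes else "".join(child.itertext())
        kept += child.tail or ""
        # Re-attach the kept text where the link used to be.
        if previous is None:
            parent.text = (parent.text or "") + kept
        else:
            previous.tail = (previous.tail or "") + kept
        parent.remove(child)


def plain_text(element):
    """Return the text of element with runs of white space collapsed."""
    return " ".join("".join(element.itertext()).split())


fragment = ET.fromstring(
    '<p>Some text before. '
    '<a class="ignorableTextTrue" href="http://example.com/a">More information.</a> '
    'See the <a class="ignorableTextFalse" href="http://example.com/b">reference page</a>'
    ' for details.</p>'
)
strip_links(fragment)
print(plain_text(fragment))
```

The sketch collapses white space when producing plain text, because removing a link together with its text usually leaves two adjacent spaces behind.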

Links to other NewsML-G2 documents

The HTML can contain links to other NewsML-G2 documents managed by AFP. Such links are associated with a part of the textual content. We represent these links using the g2document microformat. It consists of a span element with a class attribute that contains "g2document". In addition, we provide another class name denoting the type of the referenced document: "g2picture", "g2video", etc. Finally, we may provide a class name that gives a hint on how the link could be removed gracefully. For example:

<span class="g2document g2text ignorableTextFalse">
    <a style="display: none" href="http://doc.afp.com/7W37U"></a>
    <a style="display: none" href="otherDocument.xml"></a>
    some text
</span>

The content of the span element is organized as follows:

The following table lists the class names used to specify the type of a referenced NewsML-G2 document. See the overview section for a presentation of the various document types.

Types of referenced NewsML-G2 document
Class name | Type
g2text | Text
g2multimedia | Multimedia
g2picture | Picture
g2graphic | Still graphic
g2animated | Animated graphic
g2video | Video
g2liveReport | Live report index
g2interactive | Interactive graphic

The class attribute may also be used to specify "ignorableTextFalse" or "ignorableTextTrue". These class names are meant to assist you if you need to remove links from the HTML content (this is a common need for some of our clients).

ignorableTextFalse

ignorableTextFalse means that if you process the HTML in order to remove links, then keeping the text associated with this link produces a better result.

For example, suppose that the HTML contains the following fragment before removing the links:

Pierre Bergé quoted  
<span class="g2document g2text ignorableTextFalse">
    <a style="display: none" href="http://doc.afp.com/7W37U"></a>
    <a style="display: none" href="otherDocument.xml"></a>
    a recent AFP news story
</span>
to illustrate...

After removing links the fragment should be:

Pierre Bergé quoted a recent AFP news story to illustrate...

ignorableTextTrue

ignorableTextTrue means that if you process the HTML in order to remove links then also removing the text associated with this link will produce a better result.

For example, suppose that the HTML contains the following fragment before removing the links:

Some text before.
<span class="g2document g2text ignorableTextTrue">
    <a style="display: none" href="http://doc.afp.com/7W37U"></a>
    <a style="display: none" href="otherDocument.xml"></a>
    This AFP news story provides additional information.
</span>
Some text after.   

After removing links the fragment should be:

Some text before. Some text after.
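
As an illustration of how the g2document microformat could be read, the following Python sketch collects every such span together with its document type, its link targets and its removal hint. The names local and find_g2_links and the keys of the returned dictionaries are hypothetical; the sketch simply returns all href values found in the span, in document order, without interpreting them.

```python
import xml.etree.ElementTree as ET

# Class names and document types, taken from the table above.
G2_TYPES = {
    "g2text": "Text",
    "g2multimedia": "Multimedia",
    "g2picture": "Picture",
    "g2graphic": "Still graphic",
    "g2animated": "Animated graphic",
    "g2video": "Video",
    "g2liveReport": "Live report index",
    "g2interactive": "Interactive graphic",
}


def local(tag):
    """Return an element name without its XML namespace, if any."""
    return tag.rsplit("}", 1)[-1]


def find_g2_links(root):
    """List the g2document spans found in root and its descendants."""
    links = []
    for el in root.iter():
        classes = (el.get("class") or "").split()
        if local(el.tag) != "span" or "g2document" not in classes:
            continue
        links.append({
            "type": next((G2_TYPES[c] for c in classes if c in G2_TYPES), None),
            "hrefs": [a.get("href") for a in el
                      if local(a.tag) == "a" and a.get("href")],
            "text": " ".join("".join(el.itertext()).split()),
            # True when removing the link should also remove its text.
            "drop_text": "ignorableTextTrue" in classes,
        })
    return links


fragment = ET.fromstring(
    '<p>Pierre Berge quoted '
    '<span class="g2document g2text ignorableTextFalse">'
    '<a style="display: none" href="http://doc.afp.com/7W37U"></a>'
    '<a style="display: none" href="otherDocument.xml"></a>'
    ' a recent AFP news story</span> to illustrate...</p>'
)
links = find_g2_links(fragment)
print(links)
```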

Data specific to visual content

Some data is associated with visual content. It may be present in picture, video, still graphic and animated graphic documents. It may also be present in picture, video, still graphic and animated graphic items of multimedia documents. This section details these data elements.

Caption

Picture, video, still graphic, animated graphic documents: a caption may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <description role="afpdescRole:contentDescription">
                    French businessman and head of Sidaction organisation Pierre Berge
                    attends at Marigny theater in Paris.
                </description>
                <description role="afpdescRole:contextDescription">
                    This is the first of the four auction days led by Christie's of 
                    Yves Saint-Laurent and Pierre Berge collection, which profit will 
                    fund campaigns against HIV-AIDS.
                </description>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: a caption may be provided in the content metadata section of each news item conveying picture, video, still graphic or animated graphic content.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <!-- Caption for the content of this item -->
                <description role="afpdescRole:contentDescription">
                    French businessman and head of Sidaction organisation Pierre Berge
                    attends at Marigny theater in Paris. This is the first of the four auction days led by Christie's of 
                    Yves Saint-Laurent and Pierre Berge collection, which profit will 
                    fund campaigns against HIV-AIDS.
                </description>
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- Caption for the content of this other item -->
                <description role="afpdescRole:contentDescription">
                    Christie's auctioneer François de Ricqles proceeds with the auction 
                    of a rabbit head, a Chinese imperial bronze on February 25, 2009 
                    at the Grand Palais in Paris. This object is part of a prized art collection assembled by 
                    Yves Saint Laurent and his partner Pierre Berge over half a 
                    century. One of the world's great private collections, it takes
                    in masterpieces by Picasso, Mondrian and Matisse, old masters, Art
                    Deco gems, bronzes, enamels and antiques. Two looted Chinese bronzes
                    sold for 15.7 million euros (20.3 million dollars) each to anonymous
                    telephone bidders at the Yves Saint Laurent art sale on Wednesday, 
                    despite protests from Beijing.
                </description>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

In picture, video, still graphic or animated graphic documents, the caption, if present, is provided in two parts. The content description is a concise textual description of what is shown in the visual content. The context description provides background information (e.g., context, meaning, etc.) about what is shown.

The content description may be provided in the associated news item by a description element whose role attribute, the QCode afpdescRole:contentDescription, resolves to http://cv.afp.com/descriptionRoles/contentDescription. The context description may be provided by a description element whose role attribute, the QCode afpdescRole:contextDescription, resolves to http://cv.afp.com/descriptionRoles/contextDescription.

In multimedia documents, the captions of visual components are provided in one part, as shown in the example above.

There is no caption for text content. In picture, video, still graphic and animated graphic documents, there is a single news item, which, consequently, is the one that may provide a caption. For multimedia documents, the caption of each picture, video, still graphic and animated graphic may appear in each corresponding news item. There is at most one caption per news item.

Note that while NewsML-G2 allows for rich text by using some markup in the content of a caption, AFP's systems only output simple textual content not interspersed with markup.

From time to time the AFP NewsML-G2 format evolves, but you may still want to correctly process older documents that make use of previous versions of the format.

In older documents, captions are represented in a different way. In some documents the content description may be provided in the associated news item by a description element whose role attribute, the QCode afpdescRole:captionContentDescription, resolves to http://cv.afp.com/descriptionRoles/captionContentDescription. The context description may be provided by a description element whose role attribute, the QCode afpdescRole:captionContext, resolves to http://cv.afp.com/descriptionRoles/captionContext.

In even older documents, the content description and context description may not be provided as separate elements but instead in a single description element whose role attribute, the QCode drol:caption, resolves to http://cv.iptc.org/newscodes/descriptionrole/caption.
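
A reader that must accept current and older documents alike could normalize the three caption representations as in the Python sketch below. It is only an illustration: the function names and the ALIASES mapping are assumptions (real documents declare their scheme aliases in catalogs, which this sketch does not read), and a legacy single-part caption is returned as the content description with no context description.

```python
import xml.etree.ElementTree as ET

# Alias-to-URI mapping assumed for this sketch, consistent with the
# QCode resolutions given in the text above.
ALIASES = {
    "afpdescRole": "http://cv.afp.com/descriptionRoles/",
    "drol": "http://cv.iptc.org/newscodes/descriptionrole/",
}
AFP = "http://cv.afp.com/descriptionRoles/"
CONTENT_ROLES = {AFP + "contentDescription", AFP + "captionContentDescription"}
CONTEXT_ROLES = {AFP + "contextDescription", AFP + "captionContext"}
SINGLE_ROLE = "http://cv.iptc.org/newscodes/descriptionrole/caption"


def local(tag):
    """Return an element name without its XML namespace, if any."""
    return tag.rsplit("}", 1)[-1]


def resolve(qcode, aliases):
    """Resolve a QCode to a concept URI, or None if the alias is unknown."""
    alias, _, code = qcode.partition(":")
    base = aliases.get(alias)
    return None if base is None else base + code


def caption_parts(news_item, aliases=ALIASES):
    """Return (content description, context description) of a news item."""
    content = context = None
    for meta in news_item:
        if local(meta.tag) != "contentMeta":
            continue
        for el in meta:
            if local(el.tag) != "description":
                continue
            uri = resolve(el.get("role") or "", aliases)
            # Captions are plain text, so the element text is enough.
            text = " ".join((el.text or "").split())
            if uri in CONTENT_ROLES or uri == SINGLE_ROLE:
                content = text
            elif uri in CONTEXT_ROLES:
                context = text
    return content, context


item = ET.fromstring(
    '<newsItem><contentMeta>'
    '<description role="afpdescRole:contentDescription">Pierre Berge in Paris.</description>'
    '<description role="afpdescRole:contextDescription">First of four auction days.</description>'
    '</contentMeta></newsItem>'
)
print(caption_parts(item))
```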

Copyright Notice

Picture, video, still graphic, animated graphic documents: a copyright notice may be provided in the rights information of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <rightsInfo>
                <copyrightNotice>Copyright AFP or licensors</copyrightNotice>
            </rightsInfo>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: a copyright notice may be provided in the rights information of each news item conveying picture, video, still graphic or animated graphic content.

<newsMessage>
    <itemSet>
        <newsItem>
            <!-- A copyright notice for this item -->
            <rightsInfo>
                <copyrightNotice>Copyright AFP or licensors</copyrightNotice>
            </rightsInfo>
        </newsItem>
        <newsItem>
            <!-- A copyright notice for this item -->
            <rightsInfo>
                <copyrightNotice>Copyright AFP or licensors</copyrightNotice>
            </rightsInfo>
        </newsItem>
    </itemSet>
</newsMessage>

Note that while NewsML-G2 allows for rich text by using some markup in the content of a copyright notice, AFP's systems only output simple textual content not interspersed with markup.
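
As a small illustration, the copyright notice of each news item could be collected as follows. The function names are hypothetical, and the sketch matches elements by local name only, which is a simplification.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Return an element name without its XML namespace, if any."""
    return tag.rsplit("}", 1)[-1]


def copyright_notices(news_message):
    """Return one entry per news item: its copyright notice, or None."""
    notices = []
    for item in news_message.iter():
        if local(item.tag) != "newsItem":
            continue
        notice = None
        for rights in item:
            if local(rights.tag) != "rightsInfo":
                continue
            for el in rights:
                if local(el.tag) == "copyrightNotice":
                    notice = " ".join((el.text or "").split())
        notices.append(notice)
    return notices


message = ET.fromstring(
    '<newsMessage><itemSet>'
    '<newsItem><rightsInfo>'
    '<copyrightNotice>Copyright AFP or licensors</copyrightNotice>'
    '</rightsInfo></newsItem>'
    '<newsItem/>'
    '</itemSet></newsMessage>'
)
print(copyright_notices(message))
```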

Visual content

Basic format

Picture, video, still graphic, animated graphic documents: one or multiple links to visual content may be provided in the content set of the news item.

<newsMessage>
    <itemSet>
        <!-- A visual item with three different renditions of the same visual content -->
        <newsItem>
            <contentSet>
                <remoteContent href="pictureItem/image1.jpg"/>
                <remoteContent href="pictureItem/image2.jpg"/>
                <remoteContent href="ftp://example.com/image3.gif"/>
            </contentSet>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: one or multiple links to visual content may be provided in the content set of each news item conveying picture, video, still graphic or animated graphic content.

<newsMessage>
    <itemSet>
        <!-- A visual item with three different renditions of the same visual content -->
        <newsItem>
            <contentSet>
                <remoteContent href="pictureItem/image1.jpg"/>
                <remoteContent href="pictureItem/image2.jpg"/>
                <remoteContent href="ftp://example.com/image3.gif"/>
            </contentSet>
        </newsItem>
        
        <!-- Another visual item with two renditions of some other visual content -->
        <newsItem>
            <contentSet>
                <remoteContent href="videoItem/video1.mp4"/>
                <remoteContent href="http://example.com/video2.mp4"/>
            </contentSet>
        </newsItem>        
    </itemSet>
</newsMessage>

Links to the actual visual content (e.g., bitmaps, vector graphics, video frames, etc.) are provided by href attributes of remoteContent elements. The value of each href attribute is a URI reference (while NewsML-G2 allows for IRI references, AFP NewsML-G2 documents use only URI references). See section "Accessing visual content through URI references" for additional directions on how to use these links.

Each picture, video, still graphic and animated graphic news item carries information about one visual content (i.e., one picture, video or graphic). However, this content may be available in multiple renditions (e.g., low resolution, high resolution, JPEG format, TIFF format, etc.). Each rendition is described by a remoteContent element in the content set of the item.

In standard NewsML-G2 "Each rendition [in the content set of a given news item] MUST represent the same visual content, differentiated only by physical properties such as content type and format. [Renditions in the content set of a given news item are] different technical representations of the same logical content". AFP renditions for picture and graphic content do not always abide by this rule: in addition to providing different technical representations of the same logical content, our renditions may also consist of crops or other alterations of the content provided by other renditions of the same news item.

Additional properties of renditions

For each rendition, some information may be provided by attributes on remoteContent elements. These attributes are described below.

Rendition type

To aid selecting renditions, the type of a rendition may be provided by a rendition attribute in the remoteContent element describing the rendition, as in this example:

<!-- Three descriptions of renditions of different types -->
<remoteContent rendition="rnd:lowRes"    href="pictureItem/image1.jpg"/>
<remoteContent rendition="rnd:highRes"   href="pictureItem/image2.jpg"/>
<remoteContent rendition="rnd:thumbnail" href="pictureItem/image3.gif"/>

At the time of writing, some remoteContent elements may be delivered with no rendition attribute. For instance, this is the case for renditions of still graphics in PostScript or PDF format. Such renditions nevertheless have a contenttype attribute identifying the format, as detailed in the section "Media type and format".

The rendition attribute provides a QCode whose possible values are taken from an IPTC controlled vocabulary and from AFP controlled vocabularies. The following tables provide examples of such values.

Examples of rendition types for picture documents
Concept URI | QCode | Description
http://cv.iptc.org/newscodes/rendition/highRes | rnd:highRes | High resolution image
http://cv.iptc.org/newscodes/rendition/preview | rnd:preview | Preview resolution image
http://cv.iptc.org/newscodes/rendition/thumbnail | rnd:thumbnail | A very small rendition of an image, giving only a general idea of its content

Examples of rendition types for still graphic documents
Concept URI | QCode | Description
http://cv.afp.com/renditions/AIcs11 | afprnd:AIcs11 | Rendition in Adobe Creative Suite 11 format
http://cv.iptc.org/newscodes/rendition/highRes | rnd:highRes | High resolution image
http://cv.afp.com/renditions/jpeg_retina | afprnd:jpeg_retina | A JPEG image in retina resolution. Typically, it contains four times more pixels than the jpeg_standard rendition.
http://cv.afp.com/renditions/jpeg_standard | afprnd:jpeg_standard | A JPEG image in standard resolution
http://cv.afp.com/renditions/png_retina | afprnd:png_retina | A PNG image in retina resolution. Typically, it contains four times more pixels than the png_standard rendition.
http://cv.afp.com/renditions/png_standard | afprnd:png_standard | A PNG image in standard resolution
http://cv.iptc.org/newscodes/rendition/preview | rnd:preview | Preview resolution image
http://cv.iptc.org/newscodes/rendition/thumbnail | rnd:thumbnail | A very small rendition of an image, giving only a general idea of its content

Examples of rendition types for visual components in multimedia documents
Concept URI | QCode | Description
http://cv.iptc.org/newscodes/rendition/fullSize | afprnd:fullSize | Documentation forthcoming
http://cv.afp.com/renditions/highDef | afprnd:highDef | Rendition of the highest definition of a visual component in a multimedia document
http://cv.afp.com/renditions/ipad | afprnd:ipad | Content intended to appear on iPad
http://cv.iptc.org/newscodes/rendition/mobile | rnd:mobile | Content intended to appear on a mobile or handheld device
http://cv.afp.com/renditions/squaredThumbnail | afprnd:squaredThumbnail | A small squared rendition of an image
http://cv.iptc.org/newscodes/rendition/thumbnail | rnd:thumbnail | A very small rendition of an image, giving only a general idea of its content
http://cv.iptc.org/newscodes/rendition/web | rnd:web | Content intended to appear on a web page

Examples of rendition types for interactive documents
Concept URI | QCode | Description
http://cv.afp.com/renditions/interactive | afprnd:interactive | The interactive rendition
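
Selecting a rendition by type could look like the Python sketch below. It is a simplified illustration: the function names are hypothetical, and QCodes are compared as plain strings on the assumption that the rnd and afprnd aliases are bound as in the tables above, whereas a robust reader would first resolve each QCode to its concept URI.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Return an element name without its XML namespace, if any."""
    return tag.rsplit("}", 1)[-1]


def pick_rendition(news_item, preferred):
    """Return the href of the first available rendition type in preferred.

    preferred is a list of QCodes in decreasing order of preference.
    A None entry matches a remoteContent without a rendition attribute.
    """
    by_type = {}
    for el in news_item.iter():
        if local(el.tag) == "remoteContent" and el.get("href"):
            # Keep the first rendition seen for each type.
            by_type.setdefault(el.get("rendition"), el.get("href"))
    for qcode in preferred:
        if qcode in by_type:
            return by_type[qcode]
    return None


item = ET.fromstring(
    '<newsItem><contentSet>'
    '<remoteContent rendition="rnd:thumbnail" href="pictureItem/image3.gif"/>'
    '<remoteContent rendition="rnd:highRes" href="pictureItem/image2.jpg"/>'
    '<remoteContent contenttype="application/pdf" href="graphicItem/chart.pdf"/>'
    '</contentSet></newsItem>'
)
print(pick_rendition(item, ["rnd:preview", "rnd:highRes"]))
```

Passing an ordered list of preferences lets a client fall back gracefully when a given rendition type is not delivered for an item.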

Media type and format

The media type of a rendition may be provided by a contenttype attribute on the remoteContent element describing the rendition, as in this example:

<!-- Three descriptions of renditions, each one with a media type -->
<remoteContent contenttype="image/jpeg" href="pictureItem/image1.jpg"/>
<remoteContent contenttype="image/jpeg" href="pictureItem/image2.jpg"/>
<remoteContent contenttype="image/gif"  href="pictureItem/image3.gif"/>

The value of the contenttype attribute is an IANA MIME media type name [MediaTypes].

The contenttype attribute may be complemented by a format attribute to refine information about the data format of the rendition. For example:

<!-- Three descriptions of renditions, each one with a media type complemented by a format -->
<remoteContent contenttype="image/jpeg" format="example:JPEG_Baseline"    
               href="pictureItem/image1.jpg"/>
<remoteContent contenttype="image/jpeg" format="example:JPEG_Progressive" 
               href="pictureItem/image2.jpg"/>
<remoteContent contenttype="image/gif"  format="example:GIF87a"
               href="pictureItem/image3.gif"/>

Visual dimensions

The width and height of a rendition may be provided by width and height attributes (whose values are non-negative integers) on the remoteContent element describing the rendition. The units in which these dimensions are expressed may be provided by widthunit and heightunit attributes. These attributes provide QCodes whose possible values are in the controlled vocabulary defined by IPTC for dimension units (cf. [IPTCDimUnits]). For example:

<remoteContent width ="640" widthunit ="dimensionunit:pixels" 
               height="400" heightunit="dimensionunit:pixels" href="pictureItem/image1.jpg"/>

This fragment states that the visual content at pictureItem/image1.jpg is 640 pixels wide and 400 pixels high (in this example, we suppose that dimensionunit is a scheme alias for the controlled vocabulary defined by IPTC for dimension units).

The possible dimension units are a subset of the IPTC dimension units controlled vocabulary. They are provided in the table below, where the "Concept URI" column gives the URI to which the heightunit and/or widthunit attributes resolve.

Dimension units
Unit | QCode | Concept URI
Pixel | dimensionunit:pixels | http://cv.iptc.org/newscodes/dimensionunit/pixels
Typographic Point | dimensionunit:points | http://cv.iptc.org/newscodes/dimensionunit/points
Millimeter | dimensionunit:mm | http://cv.iptc.org/newscodes/dimensionunit/mm

If a width and/or a height attribute is present but the corresponding dimension unit attribute is missing, then you must assume that the width and/or height is expressed in the default unit for that dimension. The default dimension units, which are specified by NewsML-G2, are given in the table below.

Default dimension units
Type of visual content | Default height unit | Default width unit
Picture | pixels | pixels
Graphic (still or animated) | points | points
Digital video | pixels | pixels

Size

The size in bytes of a rendition may be provided by a size attribute on the remoteContent element describing the rendition, as in this example:

<remoteContent size="253476" href="pictureItem/image1.jpg"/>

In this example, the size attribute asserts that the representation of the resource identified by pictureItem/image1.jpg weighs 253476 bytes.

The value of the size attribute is a non-negative integer.
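
The dimension and size attributes described in this section, including the default units that apply when a unit attribute is missing, could be read as in the following Python sketch. The function name, the keys "picture", "graphic" and "video", and the shape of the returned dictionary are assumptions made for this illustration. The sketch keeps only the code part of the unit QCode, which assumes that the alias is bound to the IPTC dimension units vocabulary.

```python
import xml.etree.ElementTree as ET

# Default units per type of visual content, as listed in the
# "Default dimension units" table. The keys are names chosen for this sketch.
DEFAULT_UNIT = {"picture": "pixels", "graphic": "points", "video": "pixels"}


def rendition_geometry(remote_content, kind):
    """Return width and height as (value, unit) pairs, and size in bytes."""

    def dimension(value_attr, unit_attr):
        raw = remote_content.get(value_attr)
        if raw is None:
            return None
        unit = remote_content.get(unit_attr)
        # Keep only the code part of the QCode, e.g. "pixels".
        unit = unit.partition(":")[2] if unit else DEFAULT_UNIT[kind]
        return int(raw), unit

    size = remote_content.get("size")
    return {
        "width": dimension("width", "widthunit"),
        "height": dimension("height", "heightunit"),
        "size": None if size is None else int(size),
    }


element = ET.fromstring(
    '<remoteContent width="640" widthunit="dimensionunit:pixels" '
    'height="400" size="253476" href="graphicItem/chart.png"/>'
)
print(rendition_geometry(element, "graphic"))
```

In this usage example the height has no unit attribute, so for a graphic it is reported in points, while the width keeps its explicit unit.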

Data specific to picture and still graphic content

Some data is only present in picture and still graphic documents, and in picture and still graphic items of multimedia documents. This section describes these data elements.

Note that picture and still graphic documents/items also contain data common to visual content (see section "Data specific to visual content") and, of course, data common to all kinds of content (see section "Common data").

Additional data about visual content

As described in the section "Visual content", a given visual may have multiple renditions, each one described by a remoteContent element. This section describes additional data that may be used to describe a picture or still graphic rendition.

Orientation

The "orientation" of a rendition is an indication of orientation change from the original digital image. It may be provided by an orientation attribute on the remoteContent element describing the rendition. The value of this attribute is an integer in the range of 1 to 8 (inclusive). For example:

<remoteContent orientation="5" href="pictureItem/image1.jpg"/>

This fragment states that the image at pictureItem/image1.jpg has been flipped about the vertical axis and rotated 90 degrees counterclockwise with regard to the original image. See the NewsML-G2 specification for a comprehensive description of the meaning of each value.

If no orientation attribute is present, you should assume a value of 1, which means "upright, no flip, no rotation" (i.e., the visual top of the original image is at the top, the visual left side of the original image is on the left, etc.).

Illustration images (aka previews or thumbnails)

Small illustration images may be provided as part of the content set through remoteContent elements, just like other renditions. They are distinguished by the value of their rendition attribute; e.g., http://cv.iptc.org/newscodes/rendition/thumbnail, http://cv.afp.com/renditions/squaredThumbnail. See the section on visual content for detailed information.

Note that illustration images for video or animated graphics are provided in a different way, as described in the section on icons.

Data specific to video and animated graphic content

Some data is only present in video and animated graphic documents, and in video and animated graphic items of multimedia documents. This section describes these data elements.

Note that video and animated graphic documents/items also contain data common to visual content (see section "Data specific to visual content") and, of course, data common to all kinds of content (see section "Common data").

Additional data about visual content

As described in the section "Visual content", a given visual may have multiple renditions, each one described by a remoteContent element. This section describes additional data that may be used to describe a video or animated graphic rendition.

Duration

The duration of a rendition may be provided by a duration attribute (a non-negative integer) on the remoteContent element describing the rendition. The unit in which the duration is expressed may be provided by a durationunit attribute. This attribute provides a QCode whose possible values are in a subset of the controlled vocabulary for time units defined by IPTC [IPTCTimeUnits]. For example:

<remoteContent duration="120" durationunit="timeunit:seconds" 
               href="http://example.com/video2.mp4"/>

This fragment states that the content at http://example.com/video2.mp4 lasts 120 seconds (in this example, we suppose that timeunit is a scheme alias for the controlled vocabulary defined by IPTC for time units).

Possible time units are given in the table below, where the "Concept URI" column gives the concept URI to which the QCode provided by durationunit resolves.

Time units for video or animated graphic duration
Unit | QCode | Concept URI
Edit Unit | timeunit:editUnit | http://cv.iptc.org/newscodes/timeunit/editUnit
Second | timeunit:seconds | http://cv.iptc.org/newscodes/timeunit/seconds
Millisecond | timeunit:milliseconds | http://cv.iptc.org/newscodes/timeunit/milliseconds

If a duration attribute is present without a durationunit attribute, then you must assume that the duration is expressed in seconds.
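
A client that wants every duration in seconds could normalize the values as in the Python sketch below. The function name and the edit_units_per_second parameter are hypothetical: converting edit units needs a rate that this section does not specify, so the sketch returns None for them unless the caller supplies one.

```python
import xml.etree.ElementTree as ET


def duration_seconds(remote_content, edit_units_per_second=None):
    """Return the duration of a rendition in seconds, or None if unknown."""
    raw = remote_content.get("duration")
    if raw is None:
        return None
    value = int(raw)
    # A missing durationunit attribute means seconds.
    qcode = remote_content.get("durationunit") or "timeunit:seconds"
    unit = qcode.partition(":")[2]
    if unit == "seconds":
        return float(value)
    if unit == "milliseconds":
        return value / 1000.0
    if unit == "editUnit" and edit_units_per_second:
        return value / edit_units_per_second
    return None


video = ET.fromstring(
    '<remoteContent duration="120" durationunit="timeunit:seconds" '
    'href="http://example.com/video2.mp4"/>'
)
print(duration_seconds(video))
```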

Icon (aka illustration or preview image)

Basic format

Video and animated graphic documents: icon renditions may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <!-- A visual item with two icons -->
        <newsItem>
            <contentMeta>
                <icon href="http://example.com/img1.jpg"/>
                <icon href="icons/img2.tiff"/>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: icon renditions may be provided in the content metadata section of each news item conveying video or animated graphic content.

<newsMessage>
    <itemSet>
        <!-- A video or animated graphic item with two icon renditions -->
        <newsItem>
            <contentMeta>
                <icon href="http://example.com/img1.jpg"/>
                <icon href="icons/img2.tiff"/>
            </contentMeta>
        </newsItem>
        <!-- A video or animated graphic item with one icon rendition -->
        <newsItem>
            <contentMeta>
                <icon href="ftp://example.com/img3.jpg"/>
            </contentMeta>
        </newsItem>        
    </itemSet>
</newsMessage>

An icon is an image illustrating a video or an animated graphic (in NewsML-G2, an icon can also be associated with pictures or still graphics, but AFP documents do not use this feature). An icon is typically a keyframe of the visual content, but it can also be a logo or any other illustration.

Each video or animated graphic document, and each video or animated graphic item of a multimedia document may have at most one logical visual content as its icon. However, this content may be available in multiple renditions (e.g., low resolution, high resolution, JPEG format, TIFF format, etc.). Each rendition is described by an icon element in the content metadata section of the news item.

Links to the actual icon renditions are provided by href attributes of icon elements. The value of each href attribute is a URI reference (while NewsML-G2 allows for IRI references, AFP systems only output URI references). See section "Accessing visual content through URI references" for additional directions on how to use these links.

In standard NewsML-G2 "Each [icon] rendition [in the content metadata section of a given news item] MUST represent the same visual content, differentiated only by physical properties such as content type and format". AFP icon renditions do not always abide by this rule: in addition to providing different technical representations of the same visual content, our icon renditions may also consist of crops or other alterations of the content provided by other icon renditions.

For each icon rendition, some information might be provided by attributes on icon elements. These attributes are described below.

Icon rendition type

To aid selecting icon renditions, the type of a rendition may be provided by a rendition attribute in the icon element describing the rendition, as in this example:

<!-- Two icon renditions of different types -->
<icon rendition="rnd:thumbnail"  href="icons/img1.jpg"/>
<icon rendition="afprnd:squaredThumbnail" href="icons/img2.tiff"/>

The rendition attribute provides a QCode whose possible values are taken from an IPTC controlled vocabulary and from AFP controlled vocabularies. Typical values are shown below.

Icon rendition types
QCode | Concept URI | Description
rnd:thumbnail | http://cv.iptc.org/newscodes/rendition/thumbnail | A very small rendition of an image, giving only a general idea of its content
afprnd:squaredThumbnail | http://cv.afp.com/renditions/squaredThumbnail | A small squared rendition of an image

Media type and format

The media type of an icon rendition may be provided by a contenttype attribute on the icon element describing the rendition, as in this example:

<!-- Two descriptions of icon renditions of different types -->
<icon contenttype="image/jpeg" href="icons/img1.jpg"/>
<icon contenttype="image/tiff" href="icons/img2.tiff"/>

The value of the contenttype attribute is an IANA MIME media type name [MediaTypes].

The contenttype attribute may be complemented by a format attribute to refine information about the data format of the icon rendition. For example:

<!-- Two descriptions of icon renditions,
     each one with a media type complemented by a format -->
<icon contenttype="image/jpeg" format="example:JPEG_Baseline" href="icons/img1.jpg"/>
<icon contenttype="image/tiff" format="example:NSK-TIFF"      href="icons/img2.tiff"/>

Visual dimensions

The width and height of an icon rendition may be provided by width and height attributes (whose values are non-negative integers) on the icon element describing the rendition. The units for these dimensions may be provided by widthunit and heightunit attributes. These attributes provide QCodes whose possible values are in a subset of the controlled vocabulary for dimension units defined by IPTC [IPTCDimUnits]. For example:

<icon width ="640" widthunit ="dimensionunit:pixels" 
      height="400" heightunit="dimensionunit:pixels" href="icons/img1.jpeg"/>

This fragment states that the visual content at icons/img1.jpeg is 640 pixels wide and 400 pixels high (in this example, we suppose that dimensionunit is a scheme alias for the controlled vocabulary defined by IPTC for dimension units).

The possible dimension units are a subset of the IPTC dimension units controlled vocabulary. They are provided in the table below, where the "Concept URI" column gives the URI to which the heightunit and/or widthunit attributes resolve. Currently, AFP always expresses icon dimensions in pixels.

Dimension units
Unit | QCode | Concept URI
Pixels | dimensionunit:pixels | http://cv.iptc.org/newscodes/dimensionunit/pixels

If a width and/or a height attribute is present but the corresponding dimension unit attribute is missing, then you can assume that the width and/or height is expressed in pixels.

Size

The size in bytes of an icon rendition may be provided by a size attribute on the icon element describing the rendition, as in this example:

<icon size="253476" href="icons/img1.jpeg"/>

In this example, the size attribute asserts that the representation of the resource identified by icons/img1.jpeg weighs 253476 bytes.

The value of the size attribute is a non-negative integer.
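
To illustrate how the icon attributes described above can drive a choice between icon renditions, the following Python sketch lists the icons of a news item and picks the smallest one that is at least a given number of pixels wide. The function names and dictionary keys are hypothetical, and widths are assumed to be expressed in pixels, in line with the statement above that AFP currently always uses pixels for icons.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Return an element name without its XML namespace, if any."""
    return tag.rsplit("}", 1)[-1]


def icons(news_item):
    """List the icon renditions found in the content metadata section."""
    found = []
    for meta in news_item:
        if local(meta.tag) != "contentMeta":
            continue
        for el in meta:
            if local(el.tag) == "icon" and el.get("href"):
                width = el.get("width")
                found.append({
                    "href": el.get("href"),
                    "rendition": el.get("rendition"),
                    "contenttype": el.get("contenttype"),
                    "width": None if width is None else int(width),
                })
    return found


def smallest_icon_at_least(news_item, min_width):
    """Return the href of the narrowest icon whose width >= min_width."""
    sized = [i for i in icons(news_item)
             if i["width"] is not None and i["width"] >= min_width]
    if not sized:
        return None
    return min(sized, key=lambda i: i["width"])["href"]


item = ET.fromstring(
    '<newsItem><contentMeta>'
    '<icon rendition="rnd:thumbnail" width="160" href="icons/img1.jpg"/>'
    '<icon width="640" contenttype="image/tiff" href="icons/img2.tiff"/>'
    '<icon href="http://example.com/img3.jpg"/>'
    '</contentMeta></newsItem>'
)
print(smallest_icon_at_least(item, 200))
```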

Script (aka verbatim or transcript)

Video and animated graphic documents: a script may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <description role="afpdescRole:script">
                    A rare glimpse of the art behind the label. 
                    What Yves Saint Laurent earned in the fashion industry he spent on 
                    masterpieces. At Christie’s auction house in London, a treasure trove of
                    paintings, sculpture, furniture and jewellery amassed by the fashion 
                    icon and his lover and business partner Pierre Bergé -- over a 50 year 
                    partnership.

                    SOUNDBITE 1: Thomas Seydoux, International Co-Head of Department, 
                    Christie’s Europe [English, 13 sec]:
                    "It's unprecedented - I mean we've never sold a collection in recent 
                    memory of that sort of outstanding quality throughout and I think it's
                    going to be most welcome by collectors who don't have that often a 
                    chance to acquire pieces of such quality"

                    Following the death of Yves Saint Laurent last year, Bergé chose to sell
                    the couple’s entire collection, which adorned their apartments in Paris.

                    For him, the sale is about finding some degree of closure: 

                    SOUNDBITE 2: Pierre Bergé, co-founder Yves Saint Laurent Couture house 
                    [French, 16 sec]: "C’est le jour ou le dernier objet sera passé sous le 
                    marteau d'un commissaire priseur que à mon sens – a mon sens - cette 
                    collection pourra écrire le mot fin."

                    "Only on the day that the last piece goes under the hammer of an 
                    auctioneer – in my view – will the last word of this collection be 
                    written"

                    In spite of the global economic slowdown, Christie’s hopes the 
                    collection will fetch around 400 million dollars when it goes up for 
                    sale in Paris at the end of February.

                    A cubist-era Picasso – valued at 40 million dollars – and a rare 
                    selection of Mondrians are among the highlights. But for Yves Saint
                    Laurent and Pierre Bergé, it was not about the price tags – more the
                    enjoyment of living amongst beautiful art.

                    SOUNDBITE 3: Jonathan Rendell, Deputy Chairman, Christie’s Americas 
                    [English, 19 sec]: "There was a great sense of everything being in the
                    right place - nothing dominating -and no trophies. I think it is a 
                    collection that's formed by two incredibly intelligent people working 
                    completely in concert with eachother - that's very unusual."

                    But it’s an unusual bond that is soon to be broken up amongst 
                    collectors, dealers and museums – the end of a long reign for 
                    the king of fashion.
                </description>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: a script may be provided in the content metadata section of each news item conveying video or animated graphic content.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <!-- A script for the content of this item -->
                <description role="afpdescRole:script">
                    A rare glimpse of the art behind the label. 
                    What Yves Saint Laurent earned in the fashion industry he spent on 
                    masterpieces. At Christie’s auction house in London, a treasure trove of
                    paintings, sculpture, furniture and jewellery amassed by the fashion 
                    icon and his lover and business partner Pierre Bergé -- over a 50 year 
                    partnership.

                    SOUNDBITE 1: Thomas Seydoux, International Co-Head of Department, 
                    Christie’s Europe [English, 13 sec]:
                    "It's unprecedented - I mean we've never sold a collection in recent 
                    memory of that sort of outstanding quality throughout and I think it's
                    going to be most welcome by collectors who don't have that often a 
                    chance to acquire pieces of such quality"
                    ...
                    ...
                 </description>
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- A script for the content of this item -->
                <description role="afpdescRole:script">
                    Hundreds of art buyers and lovers from around the world came for the
                    biggest private collection ever up for auction.
                    
                    SOUNDBITE 1: Vox pop (woman) (english, 3 sec)
                    "I arrived two days ago to attend the sale." 

                    SOUNDBITE 2: Vox pop (man) (English, 4 sec)
                    "I came especially for the exhibition. Going back to New York very
                    shortly."
                    ...
                    ...
                </description>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

A script, if present, provides the transcript of voices that can be heard in the video. This may include voices recorded when the video was shot as well as audio commentary written and voiced by a journalist, which is added to the images and recounts the events of the story. It may also contain indications of significant sounds (e.g., "the sound of an explosion"). These elements are provided in their order of occurrence in the video or animated graphic.

A script is provided by a description element whose role attribute, the QCode afpdescRole:script, resolves to http://cv.afp.com/descriptionRoles/script. It may appear at most once per item.

Note that in some documents, the content of a description element whose role attribute resolves to http://cv.afp.com/descriptionRoles/script isn't a voice/sound transcript or isn't only a voice/sound transcript:

Shot lists have their dedicated slots in this XML format (see section "Shot list"), but in some documents they appear in the slots for scripts. For example, here is a description element that contains both a script and a shot list (we show only partial content):

<description role="afpdescRole:script">
    Script:
    Hundreds of art buyers and lovers from around the world came for the biggest 
    private collection ever up for auction.
    
    SOUNDBITE 1: Vox pop (woman) (english, 3 sec)
    "I arrived two days ago to attend the sale."
    ...
    ...
    
    Shotlist: (shot Feb 23, 2009)
    -wide of auctioneer
    -painting on screen
    -Berge arriving at auction
    -SOUNDBITE 1: Vox pop (woman) (english, 3 sec)
    -SOUNDBITE 2: Vox pop (man) (English, 4 sec)
    -close up of Matisse
    ...
    ...
</description>

Note that while NewsML-G2 allows for rich text by using some markup in the content of a script, AFP's systems only output simple textual content not interspersed with markup.
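
As an illustration, here is a minimal sketch of how a script could be read and how a concatenated shot list might be separated from it. It assumes that the scheme alias afpdescRole is used as in the examples above, and that a concatenated shot list starts on a line beginning with "Shotlist:" as in the previous example. That marker is only a heuristic observed in this guide, not a guaranteed convention, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NAR = "http://iptc.org/std/nar/2006-10-01/"


def descriptions_with_role(content_meta, role_qcode):
    """Return the text of each description child carrying the given role QCode."""
    return [
        "".join(d.itertext())
        for d in content_meta.findall(f"{{{NAR}}}description")
        if d.get("role") == role_qcode
    ]


def split_script(text):
    """Heuristically split off a shot list appended after a 'Shotlist:' line.

    Returns (script, shot_list); shot_list is None when no marker is found.
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    for i, line in enumerate(lines):
        if line.lower().startswith("shotlist:"):
            return "\n".join(lines[:i]).strip(), "\n".join(lines[i:]).strip()
    return "\n".join(lines), None


sample = f"""<contentMeta xmlns="{NAR}">
  <description role="afpdescRole:script">
    Script:
    Hundreds of art buyers and lovers from around the world came.

    Shotlist: (shot Feb 23, 2009)
    -wide of auctioneer
    -painting on screen
  </description>
</contentMeta>"""

texts = descriptions_with_role(ET.fromstring(sample), "afpdescRole:script")
script, shot_list = split_script(texts[0])
```

A more robust integration would resolve the QCode to its concept URI before comparing, and would keep the unsplit text available in case the heuristic does not apply.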

Shot list

Video and animated graphic documents: a shot list may be provided in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <description role="afpdescRole:shotList">
                    -Member of Christie's staff walking in front of paintings
                    -Photographers
                    -Tilt of YSL poster
                    -VAR Christie's member of staff with metal art works
                    -VAR Theodore Gericault painting
                    -Thomas Seydoux, International Co-Head of Department, Christie’s Europe 
                    -PAN of photo of YSL's flat in Paris
                    -SOUNDBITE 2: Pierre Bergé, co-founder Yves Saint Laurent Couture house
                    -Paintings on wall
                    -VAR Ferdinand Leger painting
                    -Picasso painting
                    -Woman looking at painting
                    -VAR Frans Hals portrait
                    -SOUNDBITE 3: Jonathan Rendell, Deputy Chairman, Christie’s Americas 
                    -People walking through gallery
                    -Tilt to poster of YSL
                </description>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: a shot list may be provided in the content metadata section of each news item conveying video or animated graphic content.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <!-- A shot list for the content of this item -->
                <description role="afpdescRole:shotList">
                    -Member of Christie's staff walking in front of paintings
                    -Photographers
                    -Tilt of YSL poster
                    ...
                    ...
               </description>
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- A shot list for the content of this item -->
                <description role="afpdescRole:shotList">
                    -wide of auctioneer
                    -painting on screen
                    -Berge arriving at auction
                    -SOUNDBITE 1: Vox pop (woman) (english, 3 sec)
                    -SOUNDBITE 2: Vox pop (man) (English, 4 sec)
                    -close up of Matisse
                    ...
                    ...
                </description>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

A shot list, if present, provides a concise description of each sequence. These elements are provided in their order of occurrence in the video or animated graphic.

A shot list is provided by a description element whose role attribute, the QCode afpdescRole:shotList, resolves to http://cv.afp.com/descriptionRoles/shotList. It may appear there at most once per item.

In some documents, the shot list isn't provided in this way but appears concatenated to the script (see section "Script" for an example).

The exact format of a shot list may not be the same for all kinds of documents and may also vary according to local journalistic practices.

Note that while NewsML-G2 allows for rich text by using some markup in the content of a shot list, AFP's systems only output simple textual content not interspersed with markup.

Speakers heard during audio or film recording (aka synthe)

Video and animated graphic documents: Speakers heard during audio or film recording may be described in the content metadata section of the news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <description role="afpdescRole:synthe">
                    -Thomas Seydoux (man), International Co-Head of Department,
                     Christie’s  Europe 
                    -Pierre Bergé (man), co-founder Yves Saint Laurent Couture house
                    -Jonathan Rendell (man), Deputy Chairman, Christie’s Americas                
                </description>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Multimedia documents: Speakers heard during audio or film recording may be described in the content metadata section of each news item conveying video or animated graphic content.

<newsMessage>
    <itemSet>
        <newsItem>
            <contentMeta>
                <!-- Speakers heard during recording the content of this item -->
                <description role="afpdescRole:synthe">
                    -Thomas Seydoux (man), International Co-Head of Department,
                     Christie’s  Europe 
                    -Pierre Bergé (man), co-founder Yves Saint Laurent Couture house
                    -Jonathan Rendell (man), Deputy Chairman, Christie’s Americas                
               </description>
            </contentMeta>
        </newsItem>
        <newsItem>
            <contentMeta>
                <!-- Speakers heard during recording the content of this item -->
                <description role="afpdescRole:synthe">
                    -Vox pop woman
                    -Vox pop man
                    -Pierre Berge (man), Yves Saint Laurent's partner
                </description>
            </contentMeta>
        </newsItem>
    </itemSet>
</newsMessage>

Specific information may be provided about speakers heard during audio or film recording when much of the value of the clip lies in what is said. In most clips these speakers appear in the images, but that may not always be the case.

This information may be provided by a description element whose role attribute, the QCode afpdescRole:synthe, resolves to http://cv.afp.com/descriptionRoles/synthe. It may appear at most once per item. This information is provided in the order of occurrence of speakers in the video or animated graphic.

This information typically includes each speaker's name and function. It can be used, for example, to add captions accompanying speakers' appearances in the video.

Note that while NewsML-G2 allows for rich text by using some markup in description elements, AFP's systems only output simple textual content not interspersed with markup.
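
If you want to build captions from this information, the text has to be split into one entry per speaker. The sketch below assumes the layout seen in the examples above (one entry per line starting with a hyphen, with possible continuation lines). This layout is an observation on the examples, not a guaranteed format, and parse_speakers is a hypothetical name.

```python
def parse_speakers(text):
    """Split the text of a synthe description into one string per speaker.

    A line starting with '-' opens a new entry; any other non-empty line
    is treated as the continuation of the previous entry.
    """
    speakers = []
    for raw in text.splitlines():
        line = " ".join(raw.split())  # collapse runs of whitespace
        if not line:
            continue
        if line.startswith("-") or not speakers:
            speakers.append(line.lstrip("-").strip())
        else:
            speakers[-1] += " " + line
    return speakers


synthe = """
    -Thomas Seydoux (man), International Co-Head of Department,
     Christie's  Europe
    -Pierre Berge (man), co-founder Yves Saint Laurent Couture house
    -Jonathan Rendell (man), Deputy Chairman, Christie's Americas
"""
speakers = parse_speakers(synthe)
```

The entries come out in the order in which the speakers occur in the video, since the function preserves the order of the text.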

Data specific to multimedia documents

Some data is specific to multimedia documents. This section details these data elements.

Number of non-main items by nature

Multimedia documents: the number of non-main items broken down by item natures may be provided in the item metadata section of the main news item.

<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" xmlns:afp="http://www.afp.com/format/internal/">
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
                <afp:extension>
                    <afp:stats>
                        <afp:totalComponentsOfType qcode="ninat:graphic" total="1" />
                        <afp:totalComponentsOfType qcode="ninat:picture" total="3" />
                    </afp:stats>
                </afp:extension>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

As shown above each totalComponentsOfType element provides the number of non-main items of a given nature present in the document. The qcode attribute specifies the nature as described in the following table:

Natures of multimedia non-main items
Type QCode Concept URI
Picture ninat:picture http://cv.iptc.org/newscodes/ninature/picture
Video ninat:video http://cv.iptc.org/newscodes/ninature/video
Still graphic ninat:graphic http://cv.iptc.org/newscodes/ninature/graphic
Animated graphic ninat:animated http://cv.iptc.org/newscodes/ninature/animated

The total attribute provides the number of items of the given nature, as a strictly positive integer. If the stats element is present, the absence of a totalComponentsOfType element for a given nature means that no non-main item of that nature is present in the document.

The totalComponentsOfType elements appear inside a stats element inside an extension element in the item metadata section of the main news item. Note that the totalComponentsOfType, stats and extension elements are not standard NewsML-G2 vocabulary but part of an AFP-specific extension. They are defined in an XML namespace whose name is http://www.afp.com/format/internal/.

Therefore, here is how to interpret the example given at the beginning of this section: the document contains one still graphic and three pictures as non-main items, and no video or animated graphic.

The extension and stats elements are optional (i.e., they may or may not be present). When they are present they appear at most once per document.
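
The rules above can be sketched as follows. The function name is hypothetical and the ninat scheme alias is assumed to be the one used in the table above. The sketch returns None when no stats element is present, since in that case nothing can be concluded about the number of non-main items.

```python
import xml.etree.ElementTree as ET

NAR = "http://iptc.org/std/nar/2006-10-01/"
AFP = "http://www.afp.com/format/internal/"
NATURES = ("ninat:picture", "ninat:video", "ninat:graphic", "ninat:animated")


def non_main_item_counts(item_meta):
    """Return {qcode: count} for the four natures, or None without a stats element."""
    stats = item_meta.find(f"{{{AFP}}}extension/{{{AFP}}}stats")
    if stats is None:
        return None
    # A nature with no totalComponentsOfType element counts as zero.
    counts = dict.fromkeys(NATURES, 0)
    for element in stats.findall(f"{{{AFP}}}totalComponentsOfType"):
        counts[element.get("qcode")] = int(element.get("total"))
    return counts


item_meta = ET.fromstring(f"""<itemMeta xmlns="{NAR}" xmlns:afp="{AFP}">
  <afp:extension>
    <afp:stats>
      <afp:totalComponentsOfType qcode="ninat:graphic" total="1" />
      <afp:totalComponentsOfType qcode="ninat:picture" total="3" />
    </afp:stats>
  </afp:extension>
</itemMeta>""")
counts = non_main_item_counts(item_meta)
```

With the example of this section, counts maps ninat:graphic to 1, ninat:picture to 3, and the two other natures to 0.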

Multimedia content expressed using the XML syntax of HTML

Multimedia documents: the multimedia content is provided using the XML syntax of HTML in the content set of the main news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
            </itemMeta>
            <contentSet>
                <inlineXML contenttype="application/xhtml+xml">
                    <html xmlns="http://www.w3.org/1999/xhtml">
                        <head>
                            <title>
                                YSL-Bergé collection sets new world record at auction 
                                for a private collection
                            </title>
                        </head>
                        <body>
                            <p>
                                The Yves Saint Laurent and Pierre Bergé collection sets 
                                new world record at auction for a private collection. 
                                Hundreds of art treasures amassed by late fashion designer
                                Yves Saint Laurent and his companion Pierre Berge over half
                                a century are being auctioned.
                            </p>
                            <p>
                                <!-- Embedded content from a picture item -->
                                <span class="g2item g2picture">
                                    <a style="display: none" href="urn:newsml:afp.com:20100101:7a0846c9-e341-45dc-a3a2"></a>
                                    <img src="image1.jpeg" style="float: left;" 
                                         generator-unable-to-provide-required-alt="" height="163" width="245" />
                                </span>
                            </p>    
                            <p>
                                Bids hit 206 million euros (261 million dollars) on February
                                23, 2009 making it the biggest private collection ever 
                                auctioned with two days of sales still left to run.
                            </p>
                            <p>
                                <!--  Embedded content from a video item -->
                                <span class="g2item g2video">
                                    <a style="display: none" href="urn:newsml:afp.com:20100101:7633a15b-a990-4db6-9052"></a>
                                    <video style="float: right;" controls="controls" height="138" width="245"
                                           poster="keyframe1.jpeg">
                                        <source src="video1.mp4" type="video/mp4" />
                                    </video>
                                </span>
                            </p>
                            <p>
                                <!-- An hypertext link to an external resource -->
                                The <a class="ignorableTextFalse" href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">
                                wikipedia page about Yves Saint-Laurent</a> claims that ...
                            </p>
                        </body>
                    </html>
                </inlineXML>
            </contentSet>
        </newsItem>
        <newsItem guid="urn:newsml:afp.com:20100101:7a0846c9-e341-45dc-a3a2">
            ...
        </newsItem>
        <newsItem guid="urn:newsml:afp.com:20100101:7633a15b-a990-4db6-9052">
            ...
        </newsItem>
    </itemSet>
</newsMessage>

The multimedia content expressed using the XML syntax of HTML is the main journalistic content of the document. It is provided by an inlineXML element. A contenttype attribute with a value of application/xhtml+xml explicitly denotes the usage of the XML syntax of HTML.

The multimedia content contains the main textual content intermingled with links and audiovisual content. As shown in this figure, some parts of this content (e.g., pictures, videos, etc.) may be described by their own news items. These parts are referred to as "components". The news items describing them are themselves part of the NewsML-G2 document.

You can see in the example above that we use a microformat [Microformat] to denote a component and the reference to the news item that describes it. This allows us to provide displayable information (e.g., an img tag) along with semantic markup (e.g., the reference to the news item) which can be machine-processed by your system.

This microformat consists of a span element with a class attribute that contains "g2item". In addition, we provide another class name denoting the type of the referenced item (e.g., "g2picture", "g2video", etc.).

The first child element of such a span is always the reference to the news item that describes the component. It is represented as an a tag whose href attribute provides the GUID of the news item. This element is marked as non displayable as it is not meant to be directly displayed. Following this element, additional HTML markup defines embedded content for displaying a default rendition of this component. For example, a document may contain an img element displaying a picture.

This microformat is called the g2item microformat. Another microformat called the g2document microformat is used to represent links to other NewsML-G2 documents. It is described in its dedicated section below.

The following sections detail how various types of components and links are represented.

Picture

The class name "g2item" signals that we use the g2item microformat: the span represents a component along with a reference to the associated news item. The class name "g2picture" denotes that the referenced news item provides picture content. Inside the span, the first element provides the guid of that news item. The second element defines embedded content for displaying a default rendition of the picture, using a standard HTML img tag. For example:

<span class="g2item g2picture">
    <a style="display: none" href="urn:newsml:afp.com:20100101:7a0846c9-e341-45dc-a3a2"></a>
    <img src="image1.jpeg" style="float: left;" 
         generator-unable-to-provide-required-alt="" height="163" width="245" />
</span>

Still graphic

Embedded still graphic is defined like embedded picture except that in the span element we use the class name g2graphic instead of g2picture. For example:

<span class="g2item g2graphic">
    <a style="display: none" href="urn:newsml:afp.com:20100101:7a123456-a542-76fg-ab6a"></a>
    <img src="image1.jpeg" style="float: left;" 
         generator-unable-to-provide-required-alt="" height="163" width="245"/>
</span>

Video

For embedded video we also use the g2item microformat. The class name g2video denotes that the referenced news item provides video content. Inside the span, the first element provides the guid of that news item. The embedded video is then defined using a standard HTML video tag. An illustration image may be provided by a poster attribute, and additional attributes such as autoplay, loop, etc. may be used as well. For example:

<span class="g2item g2video">
    <a style="display: none" href="urn:newsml:afp.com:20100101:7633a15b-a990-4db6-9052"></a>
    <video style="float: right;" controls="controls" height="138" width="245"
           poster="keyframe1.jpeg">
        <source src="video1.mp4" type="video/mp4" />
    </video>
</span>
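
To show how the g2item microformat can be machine-processed, here is a minimal sketch that lists the components found in the HTML content, with their type class name and the GUID of the news item describing them. The function name is hypothetical. The sketch relies on the rule that the first child element of the span is the reference to the news item.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"


def find_components(html_root):
    """Return (type_class, guid) pairs for each g2item span, in document order."""
    components = []
    for span in html_root.iter(f"{{{XHTML}}}span"):
        classes = span.get("class", "").split()
        if "g2item" not in classes:
            continue
        # The first child element is the hidden link carrying the GUID.
        guid = span[0].get("href") if len(span) else None
        types = [c for c in classes if c != "g2item"]
        components.append((types[0] if types else None, guid))
    return components


html = ET.fromstring(f"""<body xmlns="{XHTML}">
  <p><span class="g2item g2picture">
    <a style="display: none" href="urn:newsml:afp.com:20100101:7a0846c9-e341-45dc-a3a2"></a>
    <img src="image1.jpeg" height="163" width="245" />
  </span></p>
  <p><span class="g2item g2video">
    <a style="display: none" href="urn:newsml:afp.com:20100101:7633a15b-a990-4db6-9052"></a>
    <video controls="controls" poster="keyframe1.jpeg">
      <source src="video1.mp4" type="video/mp4" />
    </video>
  </span></p>
</body>""")
components = find_components(html)
```

Each GUID can then be matched against the guid attribute of the newsItem elements of the same document to reach the metadata and renditions of the component.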

Hypertext links to other resources

The HTML can contain hypertext links to other resources such as Web pages. They may be provided by a elements. For example, here is a link to a Wikipedia page:

<a class="ignorableTextFalse"
   href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)" >wikipedia page about Yves Saint-Laurent</a>

The class attribute, if present, may be used to specify either the class name "ignorableTextFalse" or "ignorableTextTrue". These class names are meant to assist you if you need to remove hypertext links from the HTML content (this is a common need for some of our clients).

ignorableTextFalse

ignorableTextFalse means that if you process the HTML in order to remove links then not removing the text associated with this link will produce a better result.

For example, suppose that the HTML contains the following fragment before removing the hypertext links:

Pierre Bergé quoted the 
<a class="ignorableTextFalse" 
   href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">wikipedia page about Yves Saint-Laurent</a>
to illustrate...

After removing hypertext links the fragment should be:

Pierre Bergé quoted the wikipedia page about Yves Saint-Laurent to illustrate...

ignorableTextTrue

ignorableTextTrue means that if you process the HTML in order to remove links then also removing the text associated with this link will produce a better result.

For example, suppose that the HTML contains the following fragment before removing the hypertext links:

Some text before.
<a class="ignorableTextTrue" 
   href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">
   This Web page provides additional information.
</a> 
 Some text after.   

After removing hypertext links the fragment should be:

Some text before. Some text after.
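
The two behaviours can be sketched as follows. This is one possible approach under stated assumptions, not a prescribed algorithm: the function names are hypothetical, a link without any of the two class names keeps its text, inline markup inside a link is flattened to text, and the hidden a elements found inside g2item and g2document spans are left untouched since they are references rather than hypertext links.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
A_TAG = f"{{{XHTML}}}a"
MICROFORMATS = {"g2item", "g2document"}


def remove_links(root):
    """Remove hypertext links in place, keeping or dropping their text per class hint."""
    for parent in list(root.iter()):
        if MICROFORMATS & set(parent.get("class", "").split()):
            continue  # references inside microformat spans are left alone
        for child in list(parent):
            if child.tag != A_TAG:
                continue
            classes = child.get("class", "").split()
            kept = "" if "ignorableTextTrue" in classes else "".join(child.itertext())
            text = kept + (child.tail or "")
            # Re-attach the text where the link used to be.
            index = list(parent).index(child)
            if index == 0:
                parent.text = (parent.text or "") + text
            else:
                previous = parent[index - 1]
                previous.tail = (previous.tail or "") + text
            parent.remove(child)


def plain(element):
    """Text content with whitespace collapsed, for comparison purposes."""
    return " ".join("".join(element.itertext()).split())


keep = ET.fromstring(f"""<p xmlns="{XHTML}">Pierre Bergé quoted the
<a class="ignorableTextFalse"
   href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">wikipedia page about Yves Saint-Laurent</a>
to illustrate...</p>""")
drop = ET.fromstring(f"""<p xmlns="{XHTML}">Some text before.
<a class="ignorableTextTrue"
   href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">
   This Web page provides additional information.
</a>
 Some text after.</p>""")
remove_links(keep)
remove_links(drop)
```

After processing, neither paragraph contains an a element any more, and their text (once whitespace is collapsed) matches the two expected results given above.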

Links to other NewsML-G2 documents

The HTML can contain links to other NewsML-G2 documents managed by AFP. Such links are associated with a part of the textual content. We represent these links using the g2document microformat. It consists of a span element with a class attribute that contains "g2document". In addition, we provide another class name denoting the type of the referenced document: "g2picture", "g2video", etc. Finally, we may provide a class name that provides a hint on how a link could be removed gracefully. For example:

<span class="g2document g2text ignorableTextFalse">
    <a style="display: none" href="http://doc.afp.com/7W37U"></a>
    <a style="display: none" href="otherDocument.xml"></a>
    some text
</span>

The content of the span element is organized as follows:

The following table lists the class names used to specify the type of a referenced NewsML-G2 document. See the overview section for a presentation of the various document types.

Types of referenced NewsML-G2 document
Class name Type
g2text Text
g2multimedia Multimedia
g2picture Picture
g2graphic Still graphic
g2animated Animated graphic
g2video Video
g2liveReport Live report index
g2interactive Interactive graphic

The class attribute may also be used to specify "ignorableTextFalse" or "ignorableTextTrue". These class names are meant to assist you if you need to remove links from the HTML content (this is a common need for some of our clients).

ignorableTextFalse

ignorableTextFalse means that if you process the HTML in order to remove links then not removing the text associated with this link will produce a better result.

For example, suppose that the HTML contains the following fragment before removing the links:

Pierre Bergé quoted  
<span class="g2document g2text ignorableTextFalse">
    <a style="display: none" href="http://doc.afp.com/7W37U"></a>
    <a style="display: none" href="otherDocument.xml"></a>
    a recent AFP news story
</span>
to illustrate...

After removing links the fragment should be:

Pierre Bergé quoted a recent AFP news story to illustrate...

ignorableTextTrue

ignorableTextTrue means that if you process the HTML in order to remove links then also removing the text associated with this link will produce a better result.

For example, suppose that the HTML contains the following fragment before removing the links:

Some text before.
<span class="g2document g2text ignorableTextTrue">
    <a style="display: none" href="http://doc.afp.com/7W37U"></a>
    <a style="display: none" href="otherDocument.xml"></a>
    This AFP news story provides additional information.
</span>
Some text after.   

After removing links the fragment should be:

Some text before. Some text after.
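
As an illustration, the following sketch reads a g2document span and derives the type of the referenced document, the link targets, the associated text, and the text to leave in place if the link is removed. The function name and the keys of the returned dictionary are hypothetical. The sketch assumes that a span without any hint class keeps its text.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
TYPE_CLASSES = {"g2text", "g2multimedia", "g2picture", "g2graphic",
                "g2animated", "g2video", "g2liveReport", "g2interactive"}


def read_g2document(span):
    """Summarize a g2document span, or return None for any other span."""
    classes = span.get("class", "").split()
    if "g2document" not in classes:
        return None
    text = " ".join("".join(span.itertext()).split())
    return {
        "type": next((c for c in classes if c in TYPE_CLASSES), None),
        "hrefs": [a.get("href") for a in span.findall(f"{{{XHTML}}}a")],
        "text": text,
        # Text to leave in place if the link is removed.
        "text_if_removed": "" if "ignorableTextTrue" in classes else text,
    }


span = ET.fromstring(f"""<span xmlns="{XHTML}" class="g2document g2text ignorableTextFalse">
    <a style="display: none" href="http://doc.afp.com/7W37U"></a>
    <a style="display: none" href="otherDocument.xml"></a>
    some text
</span>""")
info = read_g2document(span)
```

To remove such a link, the whole span can be replaced by the value of text_if_removed, which yields the two results shown in the examples of this section.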

Data specific to Live report posts

Live report posts are represented by multimedia documents. They can contain additional dedicated metadata, as described in this section.

Live report intertitle

Live report posts: the indication that a post is an intertitle is provided in the item metadata section of the main news item.

<newsMessage>
    <itemSet>
        <newsItem>
            <itemMeta>
                <!-- This link element tells that this news item is the main item of the multimedia document  -->
                <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
                
                <!-- This link element tells that this multimedia document represents an intertitle in a live report  -->
                <link rel="crel:isa" href="http://cv.afp.com/itemnatures/liveReportIntertitle"/>
            </itemMeta>
        </newsItem>
     </itemSet>
</newsMessage>

While most posts carry a news bit about the ongoing event being reported, some differ as they represent intertitles. An intertitle typically provides some text describing a phase of the ongoing event, or another grouping of a subset of posts. An intertitle is identified by the presence of a specific element in the item metadata section of its main item: a link element whose rel attribute conveys the concept URI http://cv.iptc.org/newscodes/conceptrelation/isA (using the QCode crel:isA) and whose href attribute is the URI http://cv.afp.com/itemnatures/liveReportIntertitle.
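
This check can be sketched as follows. The function name is hypothetical and the crel scheme alias is assumed to be the one used in the example above. Because the QCode is spelled crel:isa in the example and crel:isA in the text, the sketch compares it case-insensitively. This is a cautious choice of this sketch, not a documented rule.

```python
import xml.etree.ElementTree as ET

NAR = "http://iptc.org/std/nar/2006-10-01/"
INTERTITLE_URI = "http://cv.afp.com/itemnatures/liveReportIntertitle"


def is_intertitle(item_meta):
    """True if itemMeta holds an 'is a' link to the live report intertitle nature."""
    for link in item_meta.findall(f"{{{NAR}}}link"):
        rel = link.get("rel", "")
        # Case-insensitive comparison: an assumption of this sketch.
        if rel.lower() == "crel:isa" and link.get("href") == INTERTITLE_URI:
            return True
    return False


intertitle_meta = ET.fromstring(f"""<itemMeta xmlns="{NAR}">
  <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
  <link rel="crel:isa" href="http://cv.afp.com/itemnatures/liveReportIntertitle"/>
</itemMeta>""")
regular_meta = ET.fromstring(f"""<itemMeta xmlns="{NAR}">
  <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>""")
```

Calling is_intertitle on the first item metadata section returns True, and on the second it returns False.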

Timestamp in Live Report

Live report posts: the timestamp in live report is provided in the item metadata section of the main news item.

<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" xmlns:afp="http://www.afp.com/format/internal/">
    <itemSet>
        <newsItem>
            <itemMeta>
                <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
                <afp:extension>
                    <afp:timestampInLiveReport>
                        <afp:date>2016-07-09T15:30:33.928Z</afp:date>
                        <afp:label>15h30</afp:label>
                    </afp:timestampInLiveReport>
                </afp:extension>
            </itemMeta>
        </newsItem>
    </itemSet>
</newsMessage>

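The timestamp in live report is provided for multimedia documents that represent posts in live reports. Each post is associated with a timestamp. This timestamp is provided by a timestampInLiveReport element in an extension element inside the item metadata section. It is made of a date element, which holds the date and time (2016-07-09T15:30:33.928Z in the example above), and a label element, which holds a short label for that time (15h30 in the example above).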

These extension, timestampInLiveReport, date and label elements are in the XML namespace http://www.afp.com/format/internal/.
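
Here is a minimal sketch that reads this timestamp. It assumes that the date element holds an ISO 8601 date-time ending with "Z", as in the example above. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NAR = "http://iptc.org/std/nar/2006-10-01/"
AFP = "http://www.afp.com/format/internal/"


def live_report_timestamp(item_meta):
    """Return (datetime, label) from timestampInLiveReport, or None if absent."""
    ts = item_meta.find(f"{{{AFP}}}extension/{{{AFP}}}timestampInLiveReport")
    if ts is None:
        return None
    date_text = ts.findtext(f"{{{AFP}}}date")
    label = ts.findtext(f"{{{AFP}}}label")
    when = None
    if date_text:
        # Older Python versions do not accept a trailing "Z" in fromisoformat().
        when = datetime.fromisoformat(date_text.strip().replace("Z", "+00:00"))
    return when, label


item_meta = ET.fromstring(f"""<itemMeta xmlns="{NAR}" xmlns:afp="{AFP}">
  <afp:extension>
    <afp:timestampInLiveReport>
      <afp:date>2016-07-09T15:30:33.928Z</afp:date>
      <afp:label>15h30</afp:label>
    </afp:timestampInLiveReport>
  </afp:extension>
</itemMeta>""")
when, label = live_report_timestamp(item_meta)
```

The returned datetime is timezone-aware, so it can be compared with other timestamps or converted to a local time zone for display, while the label can be displayed as is.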

Data specific to live report indexes

Some data is specific to live report indexes. This section details these data elements.

Lead

Live report indexes: a lead of the live report may be provided in the content metadata section of the package item.

<newsMessage>
    <itemSet>
        <packageItem>
            <contentMeta>
                <description role="afpdescRole:lead">
                    <html:html xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
                        <head />
                        <body>
                            <p>Live inside Christie's auction of Yves Saint-Laurent/Bergé collection.</p>
                            <p>Auction sparks huge interest. Follow our report and analysis live.</p>
                        </body>
                    </html:html>
                </description>
            </contentMeta>
        </packageItem>
    </itemSet>
</newsMessage>

A "lead" for the live report may be provided by a description element whose role attribute, the QCode afpdescRole:lead, resolves to http://cv.afp.com/descriptionRoles/lead. Inside this element the lead is provided using the XML syntax of HTML in an html element in namespace http://www.w3.org/1999/xhtml.

When present, the lead contains a short description (typically around one hundred words) of what the live report is about.

List of posts

Live report indexes: the list of posts of the live report is provided in the groupSet section of the package item.

<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" xmlns:afp="http://www.afp.com/format/internal/">
    <itemSet>
        <packageItem>
            <groupSet>
                <group role="afpgroup:elements">
                    <!-- An example of a live report index with three posts.
                    As a story develops, real live reports can include tens or hundreds of posts. -->
                    <itemRef href="d-oc1ku.xml">
                        <afp:iteminfo>
                            <headline>Auction opens</headline>
                        </afp:iteminfo>
                    </itemRef>
                    <itemRef href="d-oc02w.xml">
                        <afp:iteminfo>
                            <headline>Christie's shows ten most intriguing pieces</headline>
                        </afp:iteminfo>
                    </itemRef>
                    <itemRef href="d-ob2p7.xml">
                        <afp:iteminfo>
                            <headline>Press conference scheduled at 7 PM</headline>
                        </afp:iteminfo>
                    </itemRef>                    
                </group>
            </groupSet>
        </packageItem>
    </itemSet>
</newsMessage>

The list of posts is provided as a list of links to the NewsML-G2 documents that represent individual posts. These links are provided inside the group set of the package item, in a group element whose role attribute, the QCode afpgroup:elements, resolves to http://cv.afp.com/grouproles/elements. Each link is provided by an itemRef element, through an href attribute (see the NewsML-G2 documentation [G2Doc] for more information about the itemRef construct).

Inside each itemRef, an itemInfo element in the XML namespace http://www.afp.com/format/internal/ may provide a title for the post in a headline element.

The list is ordered chronologically, most recent first: the first itemRef links to the most recent post, the second itemRef links to the second most recent, etc.
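As a minimal sketch, the list of posts could be read as follows with Python's standard library. The function name list_posts and the shortened sample document are assumptions made for this example. The title container is matched under the name iteminfo, spelled as in the XML example above, and the role attribute is compared with the literal QCode afpgroup:elements rather than resolved.

```python
import xml.etree.ElementTree as ET

NS = {
    "nar": "http://iptc.org/std/nar/2006-10-01/",
    "afp": "http://www.afp.com/format/internal/",
}

# Shortened sample document, assumed for illustration only.
SAMPLE = """<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/"
             xmlns:afp="http://www.afp.com/format/internal/">
    <itemSet>
        <packageItem>
            <groupSet>
                <group role="afpgroup:elements">
                    <itemRef href="d-oc1ku.xml">
                        <afp:iteminfo>
                            <headline>Auction opens</headline>
                        </afp:iteminfo>
                    </itemRef>
                    <itemRef href="d-oc02w.xml" />
                </group>
            </groupSet>
        </packageItem>
    </itemSet>
</newsMessage>"""


def list_posts(xml_text):
    """Return (href, headline) pairs in document order; headline may be None."""
    root = ET.fromstring(xml_text)
    posts = []
    for group in root.iterfind(".//nar:groupSet/nar:group", NS):
        if group.get("role") != "afpgroup:elements":
            continue
        for item_ref in group.iterfind("nar:itemRef", NS):
            # The title is optional, so a missing headline yields None.
            headline = item_ref.findtext(
                "afp:iteminfo/nar:headline", default=None, namespaces=NS
            )
            posts.append((item_ref.get("href"), headline))
    return posts
```

Because the document already lists the most recent post first, the returned pairs can be used in that order without further sorting. Each href is a URI reference that still has to be resolved before the post can be retrieved.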

Accessing visual content through URI references

In a document, a number of elements provide links to actual visual content in formats such as JPEG, MPEG-4, etc. Some of these elements are defined by NewsML-G2 while others are defined by HTML, as AFP text and multimedia documents can contain HTML (in XML syntax) embedded right into NewsML-G2. For example, such links can be provided by an href attribute or a src attribute.

A link of this type is a URI reference as defined by [RFC3986]. This means it is either a URI or a relative-ref (colloquially referred to as a "relative URI").

At some point when dealing with a NewsML-G2 document, you'll typically want to retrieve the actual visual content, in order to process or display it.

If the link is a (non-relative) URI per [RFC3986], you can dereference it directly, using standard software components, to retrieve the actual visual content. Typically, the schemes used for such URIs depend on the specific delivery architecture established between you and AFP. Examples of commonly used schemes are http, ftp and cid.

If the link is a relative-ref, then you need to resolve it to its target URI. You can then dereference the target URI to retrieve the actual visual content.

Note that with most standard libraries providing URI reference resolution, resolving a (non-relative) URI is the identity operation. That way, you don't have to determine whether you have been handed a (non-relative) URI or a relative-ref: you can just resolve the URI reference and then dereference it to retrieve the actual visual content.

Section 5 of [RFC3986] defines the process of resolving a URI reference. To carry out this process, you need the URI reference itself (as stated earlier, it is provided in the document, for example in an href attribute, src attribute, etc.) and a base URI. Typically the base URI is the URI that allows retrieving the NewsML-G2 document.

For example, if AFP delivers a package to you that contains both an AFP NewsML-G2 document and data files for the associated visual content, the base URI is the URI that allows accessing the NewsML-G2 document after delivery. Suppose AFP delivers content to your file system in the directory "/deliverySpace/internet-journal/topnews/", producing the following file structure:

/deliverySpace/internet-journal/topnews/
    doc.afp.com-9719Z-2.xml
    5b9c11cbf6871cb93696bebab8bdbc2c16afc44b-highDef.jpg
    5ee2fa63ea5f30a817a2f420ec2187c8c15ad99f-fullSize.jpg
    27d7f92514d3bf5e3ee998cebf9c24dfb4c7ccc7-web.jpg
    f06f2bc314823c7c40f984224cb846d387b7137e-thumbnail.jpg
    V001_MMV851686_TFR.1920x1080.mp4
    V001_MMV851941_TFR.jpg

Sample delivery structure

In this context, the base URI is the URI that allows accessing the NewsML-G2 document after delivery. If your NewsML-G2 processor accesses the NewsML-G2 document at file:///deliverySpace/internet-journal/topnews/doc.afp.com-9719Z-2.xml, then this is the base URI. The URI references linking to the visual content can be resolved relative to this base URI. For example, the URI reference 5b9c11cbf6871cb93696bebab8bdbc2c16afc44b-highDef.jpg would resolve to file:///deliverySpace/internet-journal/topnews/5b9c11cbf6871cb93696bebab8bdbc2c16afc44b-highDef.jpg, which can then be dereferenced to access that particular visual content.

Several libraries provide URI reference resolution. For instance, in Java, one could use the resolve() method of the java.net.URI class.
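In Python, the standard library function urllib.parse.urljoin plays a similar role. The sketch below applies it to the sample delivery described above. The helper name resolve_reference and the http URI are assumptions made for this example, and the base URI is built on the assumption that the document is read from the local file system.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin

# Location of the delivered NewsML-G2 document in the sample delivery.
DOCUMENT_PATH = PurePosixPath(
    "/deliverySpace/internet-journal/topnews/doc.afp.com-9719Z-2.xml"
)


def resolve_reference(base_uri, reference):
    """Resolve a URI reference found in the document against the base URI."""
    return urljoin(base_uri, reference)


# The base URI is the URI used to access the NewsML-G2 document.
base_uri = DOCUMENT_PATH.as_uri()

# A relative-ref is resolved against the base URI.
high_def = resolve_reference(
    base_uri, "5b9c11cbf6871cb93696bebab8bdbc2c16afc44b-highDef.jpg"
)

# A (non-relative) URI comes back unchanged (hypothetical URI).
remote = resolve_reference(base_uri, "http://example.com/photos/sample.jpg")
```

Here high_def is the file URI of the high definition image next to the document, while remote is identical to the URI that was passed in, so the same call can be applied to every link without first checking whether it is relative.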

Release Notes

August 2021

The section Role in workflow has been enhanced to show that a flash can be followed by an urgent but not by an alert.

The section on caption has been thoroughly rewritten to explain that captions may be provided in two parts, the content description and the context description.

The new concept of renditions dedicated to cancelled documents has been documented in the section on publishing status.

The section on subjects has been extended to explain that some subjects are identified by a uri attribute. The section on locations that are subject matter of the document has been extended to show how a location can be specified using a geo URI.

In the section on locations from which the content originates, the entry about graphics has been corrected.

The section on mandatory processing has been enhanced.

The section on catchlines now states that a multimedia document may provide a catchline identified by the role http://cv.afp.com/headlineroles/introduction.

The section on subtitles now states that subtitles are only provided for text and multimedia documents and that usually there are at most two subtitles.

The XML syntax for HTML was formerly referred to as "XHTML". As the latest versions of the HTML living standard no longer use that term, this document no longer uses that term either.

This version also includes a number of editorial improvements.

July 2019

A section about mandatory processing has been added.

The sections about visual content rendition types and icon renditions types have been thoroughly updated.

A section about the copyright notice metadata has been added.

The section on content creation date now states that for photo combos, the content creation date we provide is the date of creation of the combo (instead of a shooting date).

A convergence effort between the metadata models of text and multimedia documents is underway in our production system. As a result the Related production and Role in workflow metadata may now be provided on multimedia documents. The documentation has been updated to reflect this change.

The section on publishing status, including information about cancelling documents, has been thoroughly rewritten to provide additional and more precise information.

Update about content warnings: our editorial system now makes use of the newly standardized content warning for "suffering". This documentation has been updated to reflect it.

The section about Visual Dimensions now states that the "millimeters" dimension unit may be used in AFP NewsML-G2 documents.

"Related interactive graphic" has been added to the section about related production.

This version also includes a number of editorial improvements.

March 2018

Major update for multimedia documents, including initial documentation of our HTML microformats.

The documentation now states that a location of origin of content can be a "point of interest", in addition to already documented types (city, country area, country). See section Locations From Which The Content Originates.

The documentation provides a more accurate description of the "synthe" metadata, now stating that it concerns speakers heard during audio or film recording where an important value of the clip consists of what is said. In previous versions it was described as applying only to visible speakers. See section Speakers heard during audio or film recording (aka synthe).

Tables listing the main languages used in AFP production and their corresponding BCP 47 codes are now provided. See sections Language of the content and Language of metadata.

Various editorial improvements.

August 2016

The documentation has been updated thoroughly to allow processing AFP NewsML-G2 documents without resolving QCodes.

The documentation now states that along with event identifiers, the names of the events may be provided.

The documentation now states that posts in live report indexes are ordered chronologically (therefore it is no longer your responsibility to sort them).

The description of the "Timestamp in live report" metadata has been improved to include documentation for the label element.

The documentation of live reports now covers the notion of intertitle.

A number of improvements and clarifications have been made.

July 2016

The documentation for live reports has been added.

This document is now entirely self-contained in one file, which makes it easier to distribute and use.

An important correction has been made: in previous versions of this documentation the concept URI for the "forbyline" role (cf. section on creators and contributors) was incorrectly specified as http://cv.afp.com/creatorroles/forbyline. This has been corrected; the correct concept URI is: http://cv.afp.com/contributorroles/forbyline.

A section on mentions of related production has been added.

An example has been added to the section on textual content of text document showing that the content can contain hypertext links.

A number of improvements and clarifications have been made.

February 2016

This documentation has been updated thoroughly for text documents.

February 2014

Documentation updated thoroughly in preparation for public delivery of NewsML-G2 documents.

January 2012

Initial version.

References

[G2Doc] "NewsML-G2 Documentation". IPTC. Available at https://iptc.org/standards/newsml-g2/using-newsml-g2/
[MediaTypes] MIME Media Types. Available at http://www.iana.org/assignments/media-types/index.html
[IPTCCPNatures] The IPTC controlled vocabulary for basic natures of concepts. Available at http://cv.iptc.org/newscodes/cpnature/
[IPTCDimUnits] The IPTC controlled vocabulary for dimension units. Available at http://cv.iptc.org/newscodes/dimensionunit/
[IPTCGenres] The IPTC controlled vocabulary for genres. Available at http://cv.iptc.org/newscodes/genre/
[IPTCLocTypes] The IPTC controlled vocabulary for location types. Available at http://cv.iptc.org/newscodes/location/
[IPTCMediaTopics] The IPTC controlled vocabulary for media topics. Available at http://cv.iptc.org/newscodes/mediatopic/
[IPTCNProviders] The IPTC controlled vocabulary for news providers. Available at http://cv.iptc.org/newscodes/newsprovider/
[IPTCTimeUnits] The IPTC controlled vocabulary for time units. Available at http://cv.iptc.org/newscodes/timeunit/
[IPTCCWarn] The IPTC controlled vocabulary for content warnings. Available at http://cv.iptc.org/newscodes/contentwarning/
[ISO3166] ISO 3166 Maintenance Agency. Available at http://www.iso.org/iso/country_codes.htm
[HTTPURI] "RFC 2616, section 3.2: Uniform Resource Identifiers". R. Fielding et al. June 1999. Available at http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.2
[RFC3085bis] "URN Namespace for news-related resources". M. Steidl and J. Lorenzen. July 2009. Draft available at http://tools.ietf.org/html/draft-steidl-newsml-urn-rfc3085bis-00
[RFC3986] "Uniform Resource Identifier (URI): Generic Syntax". T. Berners-Lee, R. Fielding and L. Masinter. January 2005. Available at http://tools.ietf.org/html/rfc3986
[RFC3987] "Internationalized Resource Identifiers (IRIs)". M. Duerst and M. Suignard. January 2005. Available at http://www.ietf.org/rfc/rfc3987
[RFC5646] "Tags for Identifying Languages". A. Phillips and M. Davis. September 2009. Available at http://tools.ietf.org/html/rfc5646
[RFC5870] "A Uniform Resource Identifier for Geographic Locations ('geo' URI)". A. Mayrhofer and C. Spanring. June 2010. Available at http://tools.ietf.org/html/rfc5870
[TagCloud] Wikipedia article on tag cloud. Available at http://en.wikipedia.org/wiki/Tag_Cloud
[XMLSchemaDataTypes] XML Schema Part 2: Datatypes. Available at http://www.w3.org/TR/xmlschema-2/
[XMLSpec] "Extensible Markup Language (XML) 1.0". Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, François Yergeau. Available at http://www.w3.org/TR/xml/
[Microformat] Wikipedia article on microformats. Available at http://en.wikipedia.org/wiki/Microformat
[HTMLSpec] HTML Living Standard. Available at https://html.spec.whatwg.org

Prepared and written by Philippe Mougin

Copyright © 2012-2021 AFP