1. Introduction
AFP delivers information through various channels tailored to its clients’ needs. One such channel is NewsML‑G2, an industry‑driven format and processing model that enables a rich, machine‑readable representation of news content.
This document serves as a technical guide to AFP’s NewsML‑G2 documents. It is intended for use when implementing systems that receive and process AFP NewsML‑G2 content. Here, you will find explanations of how NewsML‑G2 building blocks are combined within AFP documents to convey news content and associated metadata—such as titles, genres, subjects, embargo information, and more. This guide is meant to be used alongside the IPTC’s official NewsML‑G2 documentation [G2Doc], with which it assumes familiarity.
AFP’s NewsML‑G2 documents build on the NewsML‑G2 format and processing model defined by the IPTC (International Press Telecommunications Council) as part of the News Architecture (NAR). NewsML‑G2 itself is an XML‑based standard and relies on XML Schema. AFP extends this by embedding textual content using the XML syntax of HTML [HTMLSpec] (formerly known as “XHTML 5”), allowing HTML structures to be seamlessly included within NewsML‑G2 documents. Handling AFP NewsML‑G2 content therefore requires working with all of these technologies.
Further sections provide an overview of AFP documents’ structure. They describe the information a document conveys and how to use it.
2. Mandatory processing
NewsML‑G2 documents carry a wide range of metadata. In many cases,
you may choose which metadata elements to use and which to ignore.
For instance, you may rely on IPTC Media Topics—or decide not to.
Some metadata fields, however, are mandatory and must always be processed,
such as embargo instructions.
In addition to your NewsML‑G2 integration, your workflows may also make use
of AFP metadata provided through other channels. For example, in video production
we also supply metadata in the form of a human‑readable dopesheet sent by email.
Regardless of how the metadata is delivered—whether through NewsML‑G2 or
another medium—it must be handled correctly.
The following metadata elements must be processed:
-
Identifier and version number for basic document management including updates and corrections.
-
The publishing status, which allows canceling or withholding documents.
-
The embargo instructions.
-
The "for byline" role when your system uses creator or contributor metadata.
-
The correction signal and/or the general editorial note in a way that ensures important corrections are processed correctly.
-
The general editorial note (again), which must be visible to your editorial team, as it may contain critical instructions. If your system cannot support an appropriate workflow for handling these notes, you may instead choose to discard the document and all associated content—including previous versions—whenever a general editorial note is present.
If you encounter any questions or difficulties while implementing these mandatory processes, please contact your AFP representative for assistance.
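The first two requirements above (identifier and version tracking, and the publishing status) can be illustrated with a minimal sketch. It is not an AFP-provided implementation: the ItemStore class and its method names are hypothetical, and the policy of ignoring a delivery whose version is not higher than the stored one is an assumption you should adapt to your own workflow. Embargoes, bylines, corrections and editorial notes are not covered here.

```python
from dataclasses import dataclass

# Concept URIs of the two publishing statuses quoted in this guide.
USABLE = "http://cv.iptc.org/newscodes/pubstatusg2/usable"
CANCELED = "http://cv.iptc.org/newscodes/pubstatusg2/canceled"


@dataclass
class StoredItem:
    version: int
    status_uri: str


class ItemStore:
    """Hypothetical helper: remembers the latest version seen for each guid."""

    def __init__(self):
        self._items = {}

    def offer(self, guid, version, status_uri):
        """Record an incoming item; return True if it replaced the stored one."""
        current = self._items.get(guid)
        # Assumption: a delivery that is not newer than what we hold is ignored.
        if current is not None and version <= current.version:
            return False
        self._items[guid] = StoredItem(version, status_uri)
        return True

    def is_publishable(self, guid):
        """True only when the latest known version has the 'usable' status."""
        current = self._items.get(guid)
        return current is not None and current.status_uri == USABLE
```

For example, after `store.offer(guid, 3, USABLE)` a later `store.offer(guid, 5, CANCELED)` replaces the stored entry, and `store.is_publishable(guid)` then returns False.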
3. Undocumented features
AFP-delivered NewsML-G2 documents may contain undocumented features, such as XML elements and attributes that are described neither here nor in the official NewsML-G2 specification. Such features are subject to change without notice and often carry information that cannot be interpreted reliably. Do not rely on them in your implementation unless your AFP representative explicitly instructs you to.
4. Overview
An AFP NewsML‑G2 document provides structured metadata describing content published by AFP. For example, when AFP distributes a picture, it also supplies a corresponding NewsML‑G2 document containing metadata about that image—such as its caption, the photographer’s name, the location of the event, and more. Depending on the nature of the content, the NewsML‑G2 document may be delivered separately from the content itself (e.g., a JPEG photo accompanied by a NewsML‑G2 file) or may embed the content directly (e.g., a textual news story included within the NewsML‑G2 document).
AFP uses NewsML‑G2 for eight main categories of news content:
-
Text story
The content is a textual news article expressed using the XML
syntax of HTML (formerly “XHTML 5”). It may include structural markup
and hypertext links. The NewsML‑G2 document includes this textual
content along with metadata such as creation date, version number,
media topics, and content warnings. NewsML-G2 documents of this kind
are classified as type "text". See section
Document walk-through for an example of such a
document.
-
Picture
The content is a non‑animated visual depiction of a physical scene.
While typically a photograph, the content may also be an illustration—such as
a courtroom drawing—or a composite image (photo combo). Pictures are delivered
in formats such as JPEG, often at multiple resolutions. The accompanying NewsML‑G2
document provides metadata such as creation date, version number, content warnings,
and media topics. These documents are classified as type "picture".
-
Still graphic
The content is a non-animated graphic composition of elements such
as text, images, symbolic shapes and data, providing news coverage and/or
contextual information. For example, the content can be charts, diagrams or
maps, and may also incorporate real photos: typically a photo overlaid with other
elements of the composition. Still graphic content can be delivered in various
formats including PDF, Adobe Illustrator Artwork, JPEG and PNG.
Vector-based formats support translation and customization,
while bitmap formats are provided at multiple resolutions. The NewsML-G2
document associated with this visual content provides metadata such as creation
date, version number, media topics and content warnings. Such NewsML-G2 documents
are said to be of type "still graphic".
-
Animated graphic
The content is an animated symbolic visual representation. It often
includes text labels. For example, it can be animated charts, diagrams,
or maps. It may include audio content. An animated graphic is
delivered in a ZIP package that contains, among other things, a Flash
media file. Along with this visual content, a NewsML-G2 document
provides metadata such as creation date, version number, media topics,
content warning, etc. Such NewsML-G2 documents are said to be of type
"animated graphic". Note that such content is legacy: no new content of
this kind is created by AFP. It has been replaced by videographics,
described below.
-
Interactive graphic
The content is an interactive graphic composition of various elements
such as text, images, symbolic shapes, data, etc. Interactive controls
allow the viewer to explore, drill down and filter information.
Technically an interactive graphic is implemented as an interactive HTML
document hosted by AFP and usually embedded into client applications via
an <iframe> tag. Along with this interactive content, a NewsML-G2
document provides metadata such as creation date, version number, media
topics, content warning, etc. Such NewsML-G2 documents are said to be of
type “interactive graphic”.
-
Video
AFP supplies NewsML-G2 metadata for two types of video content:
-
Moving visual representation of one or more physical scene(s), possibly accompanied by audio such as natural sound, commentary or music. This content is typically made from digital video material.
-
Animated graphic and audio composition of elements such as computer rendered 3D scenes, text, images, photos, videos, music and audio commentaries. It typically describes or explains an event, a phenomenon, a situation (historical, political, economic, etc.), a technique, or a piece of scientific or medical information. We call it videographic content.
These videos are delivered in several formats such as H.264, MPEG-2, Windows Media Video, Flash Video. Illustration images are also provided as JPEG data. The audio and, for videographics, textual content, can be delivered separately allowing translation and customization.
Along with this audiovisual content, a NewsML-G2 document provides metadata such as creation date, version number, media topics, content warning, shotlist, script, synthe, etc. Such NewsML-G2 documents are said to be of type "video".
In addition to this machine processable metadata, a textual summary of metadata, called the dopesheet, can be automatically sent to you by email.
-
Multimedia
The content is made of a textual news story intermingled with audiovisual components such as pictures, videos, graphics, etc. A NewsML-G2 document, said to be of type "multimedia", provides the multimedia content using the XML syntax of HTML. It also provides various metadata such as creation date, version number, media topics, content warning, etc. The visual components are delivered alongside this NewsML-G2 document in their respective formats (JPEG, PDF, MPEG, etc.).
-
Live report
A live report provides live coverage of an ongoing event, delivering news bits as the story develops. In the context of live reports, these news bits are called "posts". The posts are organized chronologically, as each one is tagged with a timestamp. They contain real-time coverage in text, photo, video, graphics, tweets, etc., including contributions from AFP journalists on the ground. Each post is represented by a NewsML-G2 multimedia document (see description above), which provides access to the news content and associated metadata, including the timestamp. Another NewsML-G2 document, the "index", provides metadata about the live report itself (title, media topics, etc.) and the collection of posts in the form of a list of links to the NewsML-G2 documents representing posts. This NewsML-G2 document is said to be of type "live report index". See the presentation of live reports top level structure for more information.
The type of a NewsML-G2 document defines important characteristics such as the nature of its content, its XML structure, the metadata it provides as well as some elements of its processing model. As you can see, the type of a NewsML-G2 document is named after the type of news content with which the NewsML-G2 document is associated.
AFP NewsML-G2 documents of type text, picture, video, still graphic, animated graphic and interactive graphic have the same top-level structure: a NewsML-G2 element called "news message". This news message is an envelope that contains one "news item". This news item represents some news content which can be either a news story in textual form, a photo, a video, a still graphic, an animated graphic or an interactive graphic.
AFP NewsML-G2 documents of type multimedia also have a news message as the top-level structure. This news message is an envelope that contains one or more news item(s): a main item with the multimedia content in the XML syntax of HTML and additional items for photos, videos, etc.
AFP NewsML-G2 documents of type live report index also have a news message as the top-level structure. This news message is an envelope that contains a "package item" providing metadata about the live report as a whole and links to NewsML-G2 documents representing the individual posts of the live report.
Section "Type of document" describes how to determine the type of a document. The following sections provide an overview of the structure of documents.
4.1. Text documents
Text documents have only one news item. This item contains metadata and textual news content. The content is represented by some HTML (in its XML syntax) embedded right into the news item.
4.2. Picture and still graphic documents
Picture and still graphic documents contain only one news item, which represents a single piece of visual content (e.g., one photo). However, this content may be available in multiple renditions (e.g., different formats, resolutions, etc.). In addition to metadata about the picture or still graphic, the news item includes links to the actual visual content (e.g., JPEG resources) for each rendition. The visual content for each rendition is not included in the NewsML‑G2 document itself but is provided through external resources (e.g., accompanying files, web resources, etc.).
4.3. Video and animated graphic documents
Video and animated graphic documents have only one news item that represents a single logical visual content (e.g., one video, one animated graphic). However, this content may be available in different renditions (e.g., different formats, resolutions, etc.). In addition to metadata about the video or animated graphic, the news item includes links to the actual visual content (e.g., MPEG files) for each rendition. The visual content for each rendition isn’t provided in the NewsML‑G2 document itself, but through external resources (e.g., accompanying files, web resources, etc.).
The news item may also contain links to renditions of an icon (also known as an “illustration” or “preview image”). The renditions of the icon aren’t provided in the NewsML‑G2 document itself, but through external resources (e.g., accompanying files, web resources, etc.).
4.4. Multimedia documents
Multimedia documents contain one or more news items. One of these items is the main news item. It is always present and provides the multimedia content using the XML syntax of HTML. It also provides metadata about the document, much like the news item in a text document, and includes links to the other items in the document. These additional items contain information about visual content such as pictures, videos, or graphics. They are similar in structure and purpose to the items found in picture, video, or graphic documents.
The figure below provides an example of a multimedia document with one main item, a picture item, and a video item.
The main news item is identified by the presence of a specific element
in its item metadata section: a link element whose rel attribute
conveys the concept URI
http://cv.iptc.org/newscodes/conceptrelation/isA
(using the QCode crel:isA) and whose href attribute, a URI, is
equal to http://cv.afp.com/itemnatures/mmdMainComp.
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<!-- This link element indicates that this news item is the main item of the multimedia document -->
<link rel="crel:isA" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
</newsItem>
<!-- Additional, non-main items -->
<newsItem></newsItem>
<newsItem></newsItem>
</itemSet>
</newsMessage>
More information about QCodes can be found in section Controlled vocabularies and qualified codes.
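The rule for identifying the main news item can be sketched as follows, using Python's standard xml.etree.ElementTree module. The function names are hypothetical. The sketch matches on element local names so that it works both on real documents (which use the NAR namespace) and on the simplified, namespace-free examples of this guide. As a precaution, it compares the rel QCode case-insensitively; this tolerance is an assumption of the sketch, since QCodes are normally compared exactly.

```python
import xml.etree.ElementTree as ET

MAIN_ITEM_HREF = "http://cv.afp.com/itemnatures/mmdMainComp"


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def find_main_item(news_message):
    """Return the newsItem flagged as the main item, or None if there is none."""
    for item in news_message.iter():
        if local_name(item.tag) != "newsItem":
            continue
        for meta in item:
            if local_name(meta.tag) != "itemMeta":
                continue
            for link in meta:
                if (local_name(link.tag) == "link"
                        and link.get("rel", "").lower() == "crel:isa"
                        and link.get("href") == MAIN_ITEM_HREF):
                    return item
    return None
```

A typical call is `find_main_item(ET.fromstring(xml_text))`; the other news items of the message are then the additional picture, video or graphic items.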
4.5. Live reports
A live report is represented by multiple NewsML‑G2 documents:
-
A document called index, which provides metadata about the live report and contains links to the various posts. The index is structured as a package item embedded in a news message. A package item is a standard NewsML‑G2 construct used to represent collections. See the section on list of posts for a detailed description of how the links to the posts are represented. See also the NewsML‑G2 specification [G2Doc] for a general description of package items.
-
A set of multimedia documents, each one representing a post of the live report.
The figure below shows the top‑level structure of a live report. You can see the index on the left and the various posts on the right.
4.6. Document walk-through
Below is an example of a simple text document, containing only a few metadata elements and some textual content. Using this example, we will walk through some structural elements that are common to all types of AFP NewsML‑G2 documents.
Note that while the XML in this example is formatted for readability, the actual documents you receive will usually be in a compact form (e.g., all XML on a single line).
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" xmlns:afp="http://www.afp.com/format/internal/">
<header>
<sent>2009-02-23T20:44:07+02:00</sent>
</header>
<itemSet>
<newsItem standard="NewsML-G2" standardversion="2.28" conformance="power"
guid="http://doc.afp.com/863OC" version="3" xml:lang="en">
<catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_26.xml"/>
<catalogRef href="http://cv.afp.com/std/catalog/catalog.AFP-IPTC-G2-V2_4.xml"/>
<itemMeta>
<itemClass qcode="ninat:text"/>
<provider qcode="nprov:AFP">
<name>AFP </name>
</provider>
<versionCreated>2009-02-23T20:43:00+02:00</versionCreated>
<pubStatus qcode="stat:usable"/>
</itemMeta>
<contentMeta>
<headline>
YSL-Bergé collection sets new world record at auction
for a private collection
</headline>
<subject qcode="medtop:20000031" type="cpnat:abstract">
<name>visual art</name>
</subject>
<subject qcode="medtop:20000011" type="cpnat:abstract">
<name>fashion</name>
</subject>
</contentMeta>
<contentSet>
<inlineXML contenttype="application/xhtml+xml" wordcount="70">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>
YSL-Bergé collection sets new world record at auction
for a private collection
</title>
</head>
<body>
<p>The Yves Saint Laurent and Pierre Bergé collection sets
new world record at auction for a private collection.
Hundreds of art treasures amassed by late fashion designer
Yves Saint Laurent and his companion Pierre Berge over half
a century are being auctioned.</p>
<p>Bids hit 206 million euros (261 million dollars) on February
23, 2009 making it the biggest private collection ever
auctioned with two days of sales still left to run.</p>
</body>
</html>
</inlineXML>
</contentSet>
</newsItem>
</itemSet>
</newsMessage>
Some notes about this structure:
-
News message
The newsMessage element conveys the document. It includes attributes providing namespace declarations and other information such as the schema location. This information is automatically interpreted by standard XML document processors (e.g., parsers, validators, etc.).
The newsMessage element has two children: a header, which provides a transmission date (and possibly some additional information), and an item set which, in this example, contains one news item. In a multimedia document, the item set typically contains multiple items.
-
News item
The newsItem element provides the journalistic content along with metadata about this content and other information useful for processing. It has attributes indicating the name, version and conformance level of the NAR standard used in the item. AFP NewsML-G2 documents use NewsML-G2 version 2.10 or higher at the power conformance level.
The guid attribute is a persistent and globally unique identifier for this news item, in the form of an IRI [RFC3987].
The version attribute, if present, provides the version number of the item. It is incremented (not necessarily by one) when the document is updated.
-
Catalog information
The news item then carries catalog information using catalogRef and catalog elements (only the former is shown in the example above). This information specifies mappings between scheme aliases and scheme URIs. It allows you to resolve qualified codes found, for instance, in qcode attributes elsewhere in the item, to full URIs (i.e., unambiguous identifiers). In the example above, we reference a standard IPTC-provided catalog at http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_26.xml and an AFP specific catalog at http://cv.afp.com/std/catalog/catalog.AFP-IPTC-G2-V2_4.xml. In actual documents you may find references to other catalogs. The section Controlled vocabularies and qualified codes provides more information about qualified codes resolution.
-
Item meta data
The itemMeta element contains information about the news item itself. It always specifies the class of the item (e.g., text, picture, video, etc.), the provider of the item (that will be AFP or a specific AFP service), and the date of creation of this version of the item. It may also contain additional information, such as an embargo directive, a publication status, editorial notes, etc.
-
Content meta data
The contentMeta element contains information about the journalistic content of the item (e.g., title, subjects, genres, language, etc.).
-
Content set
The contentSet element contains the primary journalistic content of the item.
In text documents this content is provided inline, in an inlineXML element as shown in this example. See the section "Data specific to text documents" for more information.
In picture, video, still graphic and animated graphic documents, this content is provided by reference: the content set contains links to the actual visual content (e.g., JPEG files, MPEG files, etc.). See the section "Visual content" for more information.
In multimedia documents, the content set of the main item contains the multimedia content of the document expressed using the XML syntax of HTML. Within this content, picture, video, still graphic, and animated graphic elements are included by reference through links to other news items. In addition, default renditions are provided using standard HTML elements such as <img> or <video>. See the section "Data specific to multimedia documents" for more information.
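To make the walk-through concrete, here is a minimal sketch that reads the structural elements described above from a text document, using Python's standard xml.etree.ElementTree module. The function name and the shape of the returned dictionary are hypothetical. The sketch assumes that a missing version attribute means version 1, and it reports the item class and publishing status as raw QCodes without resolving them.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the example document above.
NS = {
    "nar": "http://iptc.org/std/nar/2006-10-01/",
    "h": "http://www.w3.org/1999/xhtml",
}


def _squash(text):
    """Collapse runs of whitespace, since real documents may be reformatted."""
    return " ".join((text or "").split())


def _qcode(parent, path):
    """Return the qcode attribute of the element at 'path', or None if absent."""
    element = parent.find(path, NS)
    return None if element is None else element.get("qcode")


def summarize_text_document(xml_text):
    """Extract a few structural fields from a text-type news message."""
    root = ET.fromstring(xml_text)
    item = root.find("nar:itemSet/nar:newsItem", NS)
    body = item.find("nar:contentSet/nar:inlineXML/h:html/h:body", NS)
    paragraphs = [] if body is None else body.findall("h:p", NS)
    return {
        "sent": root.findtext("nar:header/nar:sent", namespaces=NS),
        "guid": item.get("guid"),
        # Assumption: an absent version attribute is treated as version 1.
        "version": int(item.get("version", "1")),
        "item_class": _qcode(item, "nar:itemMeta/nar:itemClass"),
        "pub_status": _qcode(item, "nar:itemMeta/nar:pubStatus"),
        "headline": _squash(item.findtext(
            "nar:contentMeta/nar:headline", default="", namespaces=NS)),
        "paragraphs": [_squash("".join(p.itertext())) for p in paragraphs],
    }
```

Applied to the example document of this section, the function would report the guid http://doc.afp.com/863OC, version 3, the item class ninat:text and two body paragraphs.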
4.7. Controlled vocabularies and qualified codes
4.7.1. Concepts are identified with concept URIs
Items can make use of several controlled vocabularies to convey information, including IPTC NewsCodes, Wikidata, Geonames, and others. A controlled vocabulary is a managed collection of unambiguous and stable identifiers for concepts. In accordance with Semantic Web principles, these identifiers take the form of URIs [RFC3986].
For example, an item can be either usable or canceled; this is referred to as the item publishing status.
-
The status usable is identified by:
http://cv.iptc.org/newscodes/pubstatusg2/usable -
The status canceled is identified by:
http://cv.iptc.org/newscodes/pubstatusg2/canceled
These identifiers are called concept URIs. Together, they form a controlled vocabulary.
All concept URIs in this vocabulary share the same prefix:
http://cv.iptc.org/newscodes/pubstatusg2/
This prefix is known as a scheme URI, and it identifies the controlled vocabulary itself.
Note that although concept URIs look like dereferenceable HTTP URLs, they do not need to be. Their primary purpose is to provide unambiguous identifiers for concepts. An item may contain a pubStatus element that includes the concept URI representing its publishing status. This allows one to compare the value of this concept URI with the possible values listed above to determine whether the document is usable.
4.7.2. Concept URIs may be represented by QCodes
In NewsML-G2 documents, some concepts are not expressed directly as concept URIs using the full URI syntax. Instead, they are conveyed as QCodes (short for "Qualified Codes"). A QCode consists of two parts separated by a colon. The leftmost part (before the colon) is called the scheme alias, and the part to the right of the colon is called the code.
In some ways, a QCode can be seen as a compressed form of a concept URI. In practice, it is slightly more than a simple abbreviation, as it also identifies the controlled vocabulary to which the concept URI belongs. Determining the concept URI that a QCode represents is called resolving the QCode. The procedure for resolving QCodes will be described at the end of this section.
4.7.3. Why QCode resolution to concept URIs is useful
When processing NewsML-G2 documents, it is useful to resolve QCodes to concept URIs and then work in terms of concept URIs. This is because QCodes are not universally unambiguous identifiers whereas concept URIs are.
For example, in one document the publishing status usable may be
expressed by the QCode: stat:usable (see it in situ in
section Document walk-through). In
another document the same status might instead be expressed by the QCode
pst:usable. Although these two QCodes differ, they both resolve to the same
concept URI:
http://cv.iptc.org/newscodes/pubstatusg2/usable.
Furthermore, although this does not happen in AFP production, it is
possible in NewsML-G2 documents in general for the QCode stat:usable to
denote the publishing status "usable" in one document while representing
something entirely different in another. In such cases, the resolution process
will correctly yield
http://cv.iptc.org/newscodes/pubstatusg2/usable
in the context of the first document and a different concept URI in the
context of the second.
Important design principle: QCode resolution shields you from variations at the QCode level and from accidental homonymies, and provides you with unambiguous identifiers to work with.
4.7.4. How to proceed when QCode resolution is not supported by your tool chain
Depending on your tool chain, QCode resolution may be difficult to implement.
For example, standard XML tools such as XPath processors cannot easily integrate
QCode resolution. If you are in such a situation, you can bypass the QCode
resolution step and work directly with QCodes when processing AFP’s production,
because we ensure that in our NewsML‑G2 documents QCodes are unambiguous.
For instance, in all AFP documents the QCode stat:usable always represents the
publishing status usable.
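As an illustration of working directly with QCodes, the following sketch tests for a given QCode with the limited XPath support of Python's standard xml.etree.ElementTree module. The function name is hypothetical, and the approach relies on the guarantee stated above that QCodes are unambiguous in AFP documents. It only reports whether an explicit element with that QCode is present in the item metadata; it does not decide what an absent element means.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}


def item_meta_has_qcode(news_item, element_name, qcode):
    """True if itemMeta holds an element with this name and this exact QCode."""
    path = f"nar:itemMeta/nar:{element_name}[@qcode='{qcode}']"
    return news_item.find(path, NS) is not None
```

For example, `item_meta_has_qcode(item, "pubStatus", "stat:usable")` tells you whether the item explicitly declares the publishing status usable.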
In this documentation, we specify both concept URIs and QCodes wherever needed.
Unless stated otherwise, for IPTC‑standardized NewsML‑G2 schemes we use the
IPTC‑recommended QCodes, which you can look up in the corresponding IPTC documentation.
For example, if you navigate with your Web browser to the resource identified by
the concept URI for the publishing status "usable" (click this link:
http://cv.iptc.org/newscodes/pubstatusg2/usable),
you will see that the IPTC‑recommended QCode for this publishing status is stat:usable.
However, when possible, it is advisable to resolve QCodes. This provides the following benefits:
-
Interoperability with other providers: Your system will more easily process NewsML‑G2 documents from other providers (e.g., Reuters, AP, etc.), who may use their own sets of QCodes to represent the same concept URIs.
-
Compatibility with tools and services based on concept URIs: Some tools, APIs, or services you may wish to use operate exclusively in terms of concept URIs. You will not be able to interoperate with such tools if you work at the QCode level. For example, the IPTC exposes standard NewsML‑G2 controlled vocabularies via Web resources whose URLs are the concept URIs of the vocabulary elements. If you know the concept URI of an element, you can fetch information about it from the IPTC servers. If you only have the QCode and cannot resolve it to the corresponding concept URI, you will be unable to retrieve this information.
4.7.5. How QCode resolution is performed
The resolution process is described precisely in the NewsML-G2
documentation
([G2Doc]).
In short, it consists of resolving the scheme-alias part of the QCode
to a scheme URI using the catalog information provided in the document
at the item level, and then concatenating that scheme URI with the code
part of the QCode. In our example, the QCode stat:usable has a scheme
alias stat and a code usable. It is resolved to
http://cv.iptc.org/newscodes/pubstatusg2/usable,
because the catalog information of the enclosing news item contains the
following element:
<scheme alias="stat" uri="http://cv.iptc.org/newscodes/pubstatusg2/"/>
This catalog information may appear inline in the item, inside catalog
elements, or in an external resource referenced by the item through a
catalogRef element, as in the following example borrowed from the
section Document walk-through:
<catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_26.xml"/>
Resolving a QCode produces a concept URI that unambiguously identifies a
given concept on a global scale. In our example, the concept identified by
http://cv.iptc.org/newscodes/pubstatusg2/usable
is: the publishing status "usable".
Within the context of NewsML-G2 schemes, two logically different concepts are never assigned the same concept URI, even when the documents originate from different systems managed by different organizations.
An editorial system whose internal model is based on concept URIs performs the inverse operation when generating NewsML‑G2 items: it derives QCodes from concept URIs.
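The resolution procedure summarized in this section can be sketched as follows. This is a simplified illustration and not a replacement for the procedure defined in [G2Doc]: the function names are hypothetical, only inline catalog elements are read, and external catalogs referenced through catalogRef are not fetched. The last function sketches the inverse operation under the assumption that the longest matching scheme URI is the one to use.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def inline_schemes(news_item):
    """Map scheme aliases to scheme URIs from the item's inline catalogs."""
    schemes = {}
    for catalog in news_item:
        if local_name(catalog.tag) != "catalog":
            continue
        for scheme in catalog:
            if local_name(scheme.tag) == "scheme":
                schemes[scheme.get("alias")] = scheme.get("uri")
    return schemes


def resolve_qcode(qcode, schemes):
    """Concatenate the scheme URI of the alias with the code part."""
    alias, separator, code = qcode.partition(":")
    if not separator or alias not in schemes:
        raise ValueError(f"cannot resolve QCode {qcode!r}")
    return schemes[alias] + code


def to_qcode(concept_uri, schemes):
    """Inverse operation: derive a QCode from a concept URI."""
    matches = [(uri, alias) for alias, uri in schemes.items()
               if concept_uri.startswith(uri)]
    if not matches:
        raise ValueError(f"no scheme covers {concept_uri!r}")
    # Assumption: prefer the longest scheme URI that prefixes the concept URI.
    uri, alias = max(matches, key=lambda match: len(match[0]))
    return f"{alias}:{concept_uri[len(uri):]}"
```

With a catalog that maps stat to http://cv.iptc.org/newscodes/pubstatusg2/, `resolve_qcode("stat:usable", schemes)` yields http://cv.iptc.org/newscodes/pubstatusg2/usable, and a catalog that maps pst to the same scheme URI makes pst:usable resolve to the same concept URI.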
4.8. How to read the examples
The following sections of this document are dedicated to answering questions of the form "Where is data X in an AFP NewsML-G2 document (and how can I make use of it)?". For example: "Where is the title of the document?", "Where is the textual content?", "Where is the caption?", "Where is the visual content?" and so on.
For each type of data, XML examples are provided. These examples are not complete documents: they are high-level representations of the format, omitting many aspects and focusing only on the data in question.
For instance, here is the example we provide for the "word count" metadata in text documents (the word count gives an estimate of the size of the textual content):
<newsMessage>
<itemSet>
<newsItem>
<contentSet>
<inlineXML wordcount="450">
</inlineXML>
</contentSet>
</newsItem>
</itemSet>
</newsMessage>
As you can see, this example omits many elements. You can compare it with the example of a complete document provided in the Document walk-through. What this simplified example does provide, however, is a clear indication of where the word‑count information is located and what it looks like. Some examples contain XML comments. For example:
<!-- A subject represented by a QCode -->
<subject qcode="medtop:20000273"/>
These comments won't appear in real documents; they are annotations specific to this documentation.
5. Common data
Some data elements are common across most document types. For example, a creation date or a content warning may appear in any document—whether it is text, a picture, a still graphic, an animated graphic, a video, multimedia, or a live report. This section describes these common data elements. The following sections explain the data associated with specific document types.
5.1. Creators & Contributors
Text, picture, still graphic and video documents: creators and contributors may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<creator role="afpcrrol:writer afpctrol:forbyline">
<name>
John Doe
</name>
</creator>
<contributor role="afpctrol:editor afpctrol:validator">
<name>
Jeanne Dupont
</name>
</contributor>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: creators and contributors for the multimedia document as a whole may be provided in the content metadata section of the main news item. Creators and contributors specific to an individual item may be provided in the content‑metadata section of that item.
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isA" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
<contentMeta>
<!-- The creators and contributors to the multimedia document as a whole -->
<creator role="afpcrrol:writer afpctrol:forbyline">
<name>
John Doe
</name>
</creator>
<contributor role="afpctrol:forbyline">
<name>
Jeanne Dupont
</name>
</contributor>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- The creators and contributors specific to this item -->
<creator role="afpcrrol:photographer afpctrol:forbyline">
<name>
Al Dente
</name>
</creator>
<contributor>
<name>
Annie Mall
</name>
</contributor>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: creators may be specified in the content-metadata section of the package item. Note that contributors are not included in live-report indexes.
<newsMessage>
<itemSet>
<packageItem>
<contentMeta>
<creator role="afpcrrol:writer afpctrol:forbyline">
<name>
John Doe
</name>
</creator>
<creator role="afpcrrol:writer">
<name>
Walter Melon
</name>
</creator>
</contentMeta>
</packageItem>
</itemSet>
</newsMessage>
Creators and contributors may be provided through the creator and contributor elements. Creators are persons who produced the document or parts of it. Contributors are persons who modified or enhanced the document or parts of it. A news item may include any number of creators and contributors.
For each creator or contributor, a name is provided in the name element,
and optionally a list of roles, expressed as a QCode list, in the role
attribute. The table below presents some of the roles frequently used in
AFP documents.
Creator and contributor roles |
||
Role |
QCode |
Concept URI |
Writer |
|
|
Photographer |
|
|
Graphic designer |
|
|
For byline |
|
|
The "for byline" role has a special meaning: the names of creators and contributors without this role must not be published. These names may be used for internal purposes, such as contacting the journalist for questions, but they must not be displayed publicly in association with the document’s content.
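The "for byline" rule can be sketched in Python as follows. This is an illustrative sketch, not an AFP-provided implementation: the function names are hypothetical, the QCode afpctrol:forbyline is taken from the examples above, and QCodes are compared as literal strings (a stricter implementation would resolve them to concept URIs). Elements are matched on their local name, so the sketch works whether or not the document declares an XML namespace.

```python
import xml.etree.ElementTree as ET

FOR_BYLINE = "afpctrol:forbyline"  # QCode as written in the examples above


def local_name(element):
    """Return the tag name without any XML namespace part."""
    return element.tag.rsplit("}", 1)[-1]


def publishable_names(content_meta):
    """Names of creators/contributors carrying the "for byline" role."""
    names = []
    for party in content_meta:
        if local_name(party) not in ("creator", "contributor"):
            continue
        roles = party.get("role", "").split()  # QCode list: space-separated
        if FOR_BYLINE not in roles:
            continue  # no "for byline" role: internal use only
        for child in party:
            if local_name(child) == "name" and child.text:
                names.append(child.text.strip())
    return names


SAMPLE = """
<contentMeta>
  <creator role="afpcrrol:writer afpctrol:forbyline"><name>John Doe</name></creator>
  <contributor role="afpctrol:editor afpctrol:validator"><name>Jeanne Dupont</name></contributor>
</contentMeta>
"""

print(publishable_names(ET.fromstring(SAMPLE)))  # ['John Doe']
```

Names that are filtered out here may still be kept in your internal systems; the sketch only selects what may be displayed publicly.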
5.2. Content warning
Text, picture, still graphic, video and multimedia documents: A content warning may be provided in the item metadata section of the news item (for multimedia documents, this applies to the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<signal qcode="sig:cwarn"/>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: a content warning may be provided in the item metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<itemMeta>
<signal qcode="sig:cwarn"/>
</itemMeta>
</packageItem>
</itemSet>
</newsMessage>
A document may include a warning about its content when it might be perceived
as offensive. In such cases, you should review the document’s content to decide
how to use it. This warning is expressed by a signal element with the
QCode sig:cwarn, which resolves to
http://cv.iptc.org/newscodes/signal/cwarn.
When a content warning is present, we often provide a set of
exclAudience elements indicating the reason(s) for the warning.
For example, in a document whose content contains
potentially offensive violence and language:
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<signal qcode="sig:cwarn"/>
</itemMeta>
<contentMeta>
<exclAudience qcode="cwarn:violence"/>
<exclAudience qcode="cwarn:language"/>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
In a live report index, the exclAudience elements are provided in
the package item rather than in a news item:
<newsMessage>
<itemSet>
<packageItem>
<itemMeta>
<signal qcode="sig:cwarn"/>
</itemMeta>
<contentMeta>
<exclAudience qcode="cwarn:violence"/>
<exclAudience qcode="cwarn:language"/>
</contentMeta>
</packageItem>
</itemSet>
</newsMessage>
Used in this way, each exclAudience element identifies an audience
that may be offended or distressed by a specific characteristic of the
content (e.g., “violence”). To specify these characteristics, the IPTC’s
Content Warnings vocabulary
[IPTCCWarn] must
be used.
At the time of writing, AFP uses the following content warnings from the standard IPTC scheme: death, language, nudity, sexuality, violence and suffering.
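As an illustration, the content warning and its reasons could be read as sketched below. The function names are hypothetical and QCodes are compared as literal strings using the aliases shown in the examples above (sig and cwarn); a production system would more likely resolve QCodes to concept URIs first.

```python
import xml.etree.ElementTree as ET


def local_name(element):
    """Return the tag name without any XML namespace part."""
    return element.tag.rsplit("}", 1)[-1]


def content_warnings(item):
    """Return (has_warning, reasons) for a newsItem or packageItem element."""
    has_warning = False
    reasons = []
    for element in item.iter():
        name = local_name(element)
        if name == "signal" and element.get("qcode") == "sig:cwarn":
            has_warning = True
        elif name == "exclAudience":
            qcode = element.get("qcode", "")
            if qcode.startswith("cwarn:"):
                reasons.append(qcode.split(":", 1)[1])
    return has_warning, reasons


SAMPLE = """
<newsItem>
  <itemMeta><signal qcode="sig:cwarn"/></itemMeta>
  <contentMeta>
    <exclAudience qcode="cwarn:violence"/>
    <exclAudience qcode="cwarn:language"/>
  </contentMeta>
</newsItem>
"""

print(content_warnings(ET.fromstring(SAMPLE)))  # (True, ['violence', 'language'])
```

The same function can be applied to a package item, since it only relies on the signal and exclAudience elements found below the element it is given.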
5.3. Correction signal
Text, picture, still graphic, video and multimedia documents: a correction signal may be specified in the item metadata section of the news item (for multimedia documents, this applies to the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<signal qcode="sig:correction"/>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
One particular type of update that can occur to a document is a
correction. A correction is issued when an error has been identified
in a document and a corrected version is published. In such cases, you receive
a new version of the document (i.e., a document with the same
guid but a new version number) which
includes a correction signal. This signal is represented by a signal
element with a qcode attribute sig:correction which resolves to
http://cv.iptc.org/newscodes/signal/correction.
At AFP, this mechanism is used only for corrections of significant importance. For example, fixing a typo that does not alter the meaning of the news story should not be marked as a correction; it may simply be released as a standard update.
When a serious error affects a key piece of information and makes the document unusable, the document is typically canceled rather than corrected. A document is canceled by issuing a version with the "canceled" publishing status, as discussed in section Publishing Status.
The correction signal itself does not provide details about the nature of
the correction (e.g., what the error was, where it occurred, or how it was
fixed). Such information is usually included in the general editorial note
provided by an edNote element with a role attribute afpnoteRole:client
which resolves to http://cv.afp.com/ednoteroles/client
(see the section on the general editorial
note). For example:
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<edNote role="afpnoteRole:client">
CORRECTS the first sentence of the answer of the auctioneer, which was incorrectly translated.
</edNote>
<signal qcode="sig:correction"/>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
Handling a correction properly is critically important and can be a complex process (you likely already have procedures in place). For example, you may wish to have someone review the item—along with its previous versions and the accompanying editorial note—to fully understand the nature of the error. You may then ensure that the correction is applied to any published material that contains the original mistake. This may include notifying recipients of the earlier content and providing them with the corrected information.
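A first automated step could be to detect the correction signal and route the item, together with its editorial note, to a human reviewer. The sketch below only illustrates the detection part; the function name is hypothetical and QCodes are compared as literal strings using the aliases shown in the examples above.

```python
import xml.etree.ElementTree as ET

CORRECTION = "sig:correction"
CLIENT_NOTE = "afpnoteRole:client"


def local_name(element):
    """Return the tag name without any XML namespace part."""
    return element.tag.rsplit("}", 1)[-1]


def correction_note(item):
    """Return None if the item is not a correction; otherwise return the
    general editorial note ('' when no note is present)."""
    is_correction = False
    note = ""
    for element in item.iter():
        name = local_name(element)
        if name == "signal" and element.get("qcode") == CORRECTION:
            is_correction = True
        elif name == "edNote" and element.get("role") == CLIENT_NOTE:
            note = " ".join((element.text or "").split())  # normalize whitespace
    return note if is_correction else None


SAMPLE = """
<newsItem>
  <itemMeta>
    <edNote role="afpnoteRole:client">
      CORRECTS the first sentence of the answer of the auctioneer, which was incorrectly translated.
    </edNote>
    <signal qcode="sig:correction"/>
  </itemMeta>
</newsItem>
"""

note = correction_note(ET.fromstring(SAMPLE))
if note is not None:
    print("Correction received, needs review:", note)
```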
5.4. Dates
Two date formats are used in this specification:
-
Full date and time. This format follows the XML Schema dateTime datatype as defined in [XMLSchemaDataTypes]. It includes the additional requirement that the time zone must be specified.
-
Truncated date and time. This format is based on the XML Schema dateTime datatype but allows truncated forms (e.g. "2014-08"). The NewsML-G2 specification defines it as follows:
The date has an optional time part: it is optionally possible to omit one to many less significant components, from right to left. “From right to left” means starting from the least significant component (i.e., fraction of a second) and to continue with the full time part, the day part and the month part. The year part MUST NOT be omitted. If the time part is present the time zone SHOULD NOT be omitted.
In addition to the description provided below, consult the NewsML-G2 specification for details on the processing model for these date formats.
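The sketch below shows one possible way to read both formats while keeping track of the precision of a truncated value. It is a simplification under stated assumptions: the function name and the returned precision labels are hypothetical, years are assumed to have four digits, a time zone on a date without a time part is not handled, and the time part is delegated to Python's datetime.fromisoformat, which may reject some valid XML Schema values on older Python versions (for example unusual fractional-second lengths).

```python
import re
from datetime import datetime

# year, then optional month, day and time part, omitted from right to left
_PATTERN = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2})(?:T(.+))?)?)?$")


def parse_truncated(value):
    """Return (precision, parsed) where precision is one of
    'year', 'month', 'day' or 'datetime'."""
    value = value.strip()
    match = _PATTERN.match(value)
    if match is None:
        raise ValueError(f"unsupported date value: {value!r}")
    year, month, day, time_part = match.groups()
    if time_part is not None:
        if value.endswith("Z"):  # older Python versions do not accept 'Z'
            value = value[:-1] + "+00:00"
        return "datetime", datetime.fromisoformat(value)
    if day is not None:
        return "day", (int(year), int(month), int(day))
    if month is not None:
        return "month", (int(year), int(month))
    return "year", (int(year),)


print(parse_truncated("2014-08"))                    # ('month', (2014, 8))
print(parse_truncated("2009-02-23T20:44:07+02:00"))  # full date and time
```

Keeping the precision alongside the value avoids silently turning "2014-08" into a specific day, which would overstate what the document actually says.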
5.4.1. Document transmission date
All documents: the transmission date of the document is provided in the header of the news message.
<newsMessage>
<header>
<sent>2009-02-23T20:44:07+02:00</sent>
</header>
</newsMessage>
The transmission date is carried by the sent element. It is always
present and uses the full date and time format.
This value indicates when the document was transmitted from
AFP to your system.
5.4.2. Document creation date
Text, picture, still graphic, video and multimedia documents: The creation date of the NewsML-G2 document may appear in the item metadata section of the news item (for multimedia documents, in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<firstCreated>2009-02-23T18:22:08+02:00</firstCreated>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: the creation date of the NewsML-G2 document may appear in the item metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<itemMeta>
<firstCreated>2009-02-23T18:22:08+02:00</firstCreated>
</itemMeta>
</packageItem>
</itemSet>
</newsMessage>
If present, the creation date of the NewsML-G2 document is supplied by a
firstCreated element, using the full date and time
format. This creation date indicates when the NewsML-G2 document itself was
created, as distinct from the content creation date, which indicates
when some content was created (e.g., when a given photo was shot).
When a new version of the document is issued, the document creation date does
not change, but the version creation date does.
5.4.3. Document version creation date
Text, picture, still graphic, video and multimedia documents: The creation date of the specific version of the NewsML-G2 document is provided in the item metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<versionCreated>2009-02-23T20:43:00+02:00</versionCreated>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: the version creation date of the NewsML-G2 document is provided in the item metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<itemMeta>
<versionCreated>2009-02-23T20:43:00+02:00</versionCreated>
</itemMeta>
</packageItem>
</itemSet>
</newsMessage>
The creation date of the current version of the NewsML-G2 document is carried
by the versionCreated element. This value is always present and always
expressed in full date and time format.
5.4.4. Content creation date
The content creation date represents the date when the main journalistic content associated with the NewsML-G2 document was produced.
-
For a photo, this is the date of shooting
-
For a photo combo, it is when the combo was assembled
-
For live video footage, it is when the covered event occurred
-
For video reports, graphics and other types of content, it is when the content was produced.
5.4.4.1. Picture, still graphic, animated graphic and video documents
The content creation date may be provided by a contentCreated element
in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<contentCreated>2009-02-23T17:31:00+02:00</contentCreated>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
5.4.4.2. Multimedia documents
The creation date of a specific picture, still graphic or video component may be provided in the content metadata section of the corresponding item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- This is the content creation date for this item -->
<contentCreated>2009-02-23T17:31:00+02:00</contentCreated>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- This is the content creation date for this other item -->
<contentCreated>2009-02-22</contentCreated>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
While content creation dates may be provided for individual components, none is provided for the multimedia document as a whole. The version creation date of the document often provides a good approximation. However, it may not be accurate for all documents, so you should apply this heuristic only if your usage can tolerate a "right most of the time" approach.
5.4.4.3. Text documents
As with multimedia documents as a whole, no content creation date is provided for text documents. The version creation date of the document often serves as a good approximation. However, it may not be accurate for all documents, so you should apply this heuristic only if your usage can tolerate a "right most of the time" approach.
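The fallback heuristic described above could look like the following sketch. The function name is hypothetical; the returned flag makes it explicit whether the date is an actual content creation date or only the version creation date used as an approximation.

```python
import xml.etree.ElementTree as ET


def local_name(element):
    """Return the tag name without any XML namespace part."""
    return element.tag.rsplit("}", 1)[-1]


def content_date(item):
    """Return (value, is_exact) for a news item.

    is_exact is True when the value comes from contentCreated and False
    when versionCreated is used as an approximation."""
    found = {}
    for element in item.iter():
        name = local_name(element)
        if name in ("contentCreated", "versionCreated") and element.text:
            found.setdefault(name, element.text.strip())
    if "contentCreated" in found:
        return found["contentCreated"], True
    if "versionCreated" in found:
        return found["versionCreated"], False
    return None, False


TEXT_DOCUMENT = """
<newsItem>
  <itemMeta><versionCreated>2009-02-23T20:43:00+02:00</versionCreated></itemMeta>
  <contentMeta/>
</newsItem>
"""

print(content_date(ET.fromstring(TEXT_DOCUMENT)))
# ('2009-02-23T20:43:00+02:00', False)
```

Storing the flag with the date lets downstream users decide whether an approximate date is acceptable for their purpose.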
5.4.4.4. Live report indexes
No content creation date is provided for live report indexes.
5.5. Embargo
Text, picture, still graphic, video and multimedia documents: Embargo information is provided in the item metadata section of the news item (for multimedia documents, this applies to the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<embargoed/>
<edNote role="afpnoteRole:embargo">
Embargoed until end of first auction day
</edNote>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
Embargo information is provided using the embargoed element, which
can be complemented by an edNote element with a role attribute
afpnoteRole:embargo resolving to
http://cv.afp.com/ednoteroles/embargo.
AFP documents may have one of the three embargo statuses described in the table below.
Embargo statuses |
||
Embargoed |
Representation |
Example |
No |
No |
N/A |
Until given date and time |
An |
|
Under other provided conditions |
An empty |
|
Refer to the NewsML-G2 specification for more details on the representation and processing model of embargo information.
For multimedia documents, the way embargo information
is conveyed differs from standard NewsML-G2. In NewsML-G2, each G2 item carries
its own embargo information, and a G2 item without an embargoed element is
considered not embargoed. In AFP’s multimedia documents, only the
embargoed element of the main item must be considered. The
embargoed elements of non-main items must be ignored. You must process
multimedia documents so that embargo directives provided in the main news item
apply to the entire content of the document (i.e., to all items within the document).
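The rule for multimedia documents can be sketched as follows. This is an illustration under stated assumptions: the function names and the status labels ("none", "until", "conditional") are hypothetical, the main item is recognized by the crel:isa link to http://cv.afp.com/itemnatures/mmdMainComp used in the multimedia examples of this guide, and a document without such a link is treated as a single-item document whose first news item carries the embargo information.

```python
import xml.etree.ElementTree as ET

MAIN_ITEM_URI = "http://cv.afp.com/itemnatures/mmdMainComp"


def local_name(element):
    """Return the tag name without any XML namespace part."""
    return element.tag.rsplit("}", 1)[-1]


def child(parent, name):
    """Return the first direct child with the given local name, or None."""
    for element in parent:
        if local_name(element) == name:
            return element
    return None


def is_main_item(item):
    item_meta = child(item, "itemMeta")
    if item_meta is None:
        return False
    return any(
        local_name(link) == "link"
        and link.get("rel") == "crel:isa"
        and link.get("href") == MAIN_ITEM_URI
        for link in item_meta
    )


def document_embargo(news_message):
    """Return (status, until) for the whole document, reading the main item only."""
    items = [e for e in news_message.iter() if local_name(e) == "newsItem"]
    main = next((i for i in items if is_main_item(i)), items[0] if items else None)
    item_meta = child(main, "itemMeta") if main is not None else None
    embargoed = child(item_meta, "embargoed") if item_meta is not None else None
    if embargoed is None:
        return "none", None
    until = (embargoed.text or "").strip()
    if until:
        return "until", until
    return "conditional", None  # empty element: see the embargo editorial note


SAMPLE = """
<newsMessage><itemSet>
  <newsItem>
    <itemMeta><embargoed>2009-02-24T08:00:00+02:00</embargoed></itemMeta>
  </newsItem>
  <newsItem>
    <itemMeta>
      <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
      <embargoed/>
    </itemMeta>
  </newsItem>
</itemSet></newsMessage>
"""

print(document_embargo(ET.fromstring(SAMPLE)))  # ('conditional', None)
```

In the sample, the embargoed element of the non-main item is deliberately ignored, and the result applies to every item of the document.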
5.6. Event identifiers
Text, picture, still graphic, video and multimedia
documents: Multiple event identifiers may be provided using
subject elements in the content metadata section of the news
item (for multimedia documents, this applies to the main news item).
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<subject qcode="QCode identifying an event" type="cpnat:event">
<name>
Auction for the Yves Saint Laurent and Pierre Bergé collection
</name>
</subject>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: only one event identifier may be
provided using a subject element in the content metadata section
of the package item.
<newsMessage>
<itemSet>
<packageItem>
<contentMeta>
<subject qcode="QCode identifying an event" type="cpnat:event">
<name>
Auction for the Yves Saint Laurent and Pierre Bergé collection
</name>
</subject>
</contentMeta>
</packageItem>
</itemSet>
</newsMessage>
News coverage of an event often spans multiple NewsML-G2 documents. For example, coverage of the auction for the Yves Saint Laurent and Pierre Bergé collection may include:
-
two news stories (one announcing the event and one reporting on it later)
-
two interview transcripts (one with Pierre Bergé and one with a Christie’s representative)
-
a multimedia document
-
a video report
-
several pictures of the event.
It is often useful to know that these documents relate to the same event, for example to help editorial teams access all related documents or to automatically link related content on a news website.
To support this, AFP assigns a unique event identifier to each event and inserts this identifier into every related NewsML-G2 document. For example, a unique event identifier is assigned to the auction for the Yves Saint Laurent and Pierre Bergé collection, and each related document contains this identifier.
An event identifier is the concept URI of a subject element whose
type attribute carries the QCode cpnat:event, which resolves to
http://cv.iptc.org/newscodes/cpnature/event.
The identifier is conveyed through the qcode attribute.
In addition to event identifiers, AFP also provides event names when
possible. An event name is a short natural-language description of the
event supplied in a name element inside the subject element.
Refer to the section on subjects for more information about
the subject element.
If a document covers multiple events it may contain multiple event identifiers, as shown below:
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<subject qcode="QCode identifying an event" type="cpnat:event">
<name>Name of this event</name>
</subject>
<subject qcode="QCode identifying another event" type="cpnat:event">
<name>Name of this other event</name>
</subject>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
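As an illustration of how event identifiers can be used to link related documents, the sketch below groups documents by event. The function names, the GUIDs and the QCodes (ex:event1, ex:event2) are hypothetical placeholders, and event QCodes are compared as literal strings rather than resolved to concept URIs.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(element):
    """Return the tag name without any XML namespace part."""
    return element.tag.rsplit("}", 1)[-1]


def event_qcodes(item):
    """QCodes of all event identifiers carried by an item."""
    return [
        subject.get("qcode")
        for subject in item.iter()
        if local_name(subject) == "subject"
        and subject.get("type") == "cpnat:event"
        and subject.get("qcode")
    ]


def group_by_event(items):
    """Map each event QCode to the GUIDs of the documents that carry it."""
    groups = defaultdict(list)
    for item in items:
        for qcode in event_qcodes(item):
            groups[qcode].append(item.get("guid"))
    return dict(groups)


# Hypothetical documents: one story covering two events, one picture covering one.
STORY = ET.fromstring("""
<newsItem guid="urn:example:story">
  <contentMeta>
    <subject qcode="ex:event1" type="cpnat:event"><name>Name of this event</name></subject>
    <subject qcode="ex:event2" type="cpnat:event"><name>Name of this other event</name></subject>
  </contentMeta>
</newsItem>
""")
PICTURE = ET.fromstring("""
<newsItem guid="urn:example:picture">
  <contentMeta>
    <subject qcode="ex:event1" type="cpnat:event"/>
  </contentMeta>
</newsItem>
""")

print(group_by_event([STORY, PICTURE]))
```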
Why event identifiers are provided using <subject> elements
Because events covered by a document are also part of its
subject matter, i.e., the things the document is about. Using NewsML-G2
5.7. General editorial note
Text, picture, still graphic, video and multimedia documents: a general editorial note may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<edNote role="afpnoteRole:client">
Original source is unknown and unverified. This photo was posted on twitter.
Following an official ban in San Theodoros on foreign media outlets covering
demonstrations, AFP is using pictures from other sources.
</edNote>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
The general editorial note provides natural language text addressed to the editorial staff who receive and process the item. It may provide instructions or hints on handling the document, information about the nature of a correction (see the example in the section on correction signal), excluded audience/usage, additional details about the content, etc. It is not intended for publication.
There is at most one general editorial note in a document. If present,
it appears in an edNote element whose role attribute, the QCode
afpnoteRole:client, resolves to
http://cv.afp.com/ednoteroles/client. Although NewsML-G2 allows
rich text by using some markup in the content of an editorial note,
AFP’s systems only output simple plain text without markup.
The general editorial note is often used to express usage restrictions, as in the following example:
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<edNote role="afpnoteRole:client">
EDITORIAL USE ONLY
NO MARKETING NO ADVERTISING CAMPAIGNS
NO ARCHIVE
</edNote>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
The following table provides examples of common usage restrictions you might find in picture documents.
Examples of usage restrictions conveyed by the general editorial note
Phrase inside the general editorial note | Comment
RESTRICTED TO EDITORIAL USE | The picture can be used only by media outlets for news purposes (newspapers, magazines, radios, TVs, news websites and mobile news services…)
NO MARKETING NO ADVERTISING CAMPAIGNS | The picture cannot be used for advertising or marketing.
NO INTERNET | The picture cannot be published on Internet websites.
NO MOBILE | The picture cannot be used by mobile services.
NO ARCHIVE | The picture cannot be archived.
MANDATORY USE WITH AFP STORY | The handout picture shall be published with the corresponding AFP story only (this mention is only available for handouts).
TO BE USED WITHIN XX DAYS FROM XX/XX/XXXX | The picture cannot be used outside of the specified timeframe.
NO VIDEO EMULATION | The picture cannot be used in a sequence of pictures to simulate a video.
5.8. Genres
Text, picture, still graphic and video documents: genres of the document may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- A genre represented by a QCode -->
<genre qcode="genre:Interview"/>
<!-- A genre represented by a QCode and a name -->
<genre qcode="afpedtype:VideoWithTitling">
<name>Titling</name>
</genre>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: genres of the document as a whole may be provided in the content metadata section of the main news item. Genres specific to a non-main item may be provided in the content metadata section of that item.
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
<contentMeta>
<!-- This genre is in the main news item:
it applies to the document as a whole -->
<genre qcode="genre:Interview">
<name>Interview</name>
</genre>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- This genre only qualifies this item -->
<genre qcode="genre:Profile">
<name>Profile</name>
</genre>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Genres of a document—and of individual items in the case of multimedia
documents—may be indicated using genre elements. Each genre element
describes the nature or style of the content (for example, an intellectual
or journalistic form). Multiple genre elements may be associated with a single
item, as an item may fall at the intersection of several genres.
In AFP documents, a genre is specified by a QCode, optionally accompanied by a natural‑language name.
Genres are defined using the standard IPTC Genre Newscodes
[IPTCGenres] (scheme alias
genre). In addition, some AFP-specific taxonomies defined under the
schemes http://ref.afp.com/attributes/ (scheme alias afpattribute)
and http://ref.afp.com/editorialtypes/ (scheme alias afpedtype) are
used when no equivalent exists within the IPTC scheme.
When present, the name child element provides a natural‑language name
for the genre.
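QCodes such as afpedtype:VideoWithTitling resolve to concept URIs by replacing the scheme alias with the scheme URI. The sketch below illustrates this with a hard-coded alias table limited to scheme URIs stated in this guide; the function name is hypothetical, and in a real integration the alias-to-URI mapping should be taken from the catalog information of each document rather than hard-coded. The genre alias is left out because its scheme URI is not given here.

```python
# Scheme aliases and URIs as stated in this guide (assumption: a real
# integration reads them from the document's catalog information).
SCHEMES = {
    "sig": "http://cv.iptc.org/newscodes/signal/",
    "cpnat": "http://cv.iptc.org/newscodes/cpnature/",
    "afpnoteRole": "http://cv.afp.com/ednoteroles/",
    "afpattribute": "http://ref.afp.com/attributes/",
    "afpedtype": "http://ref.afp.com/editorialtypes/",
}


def resolve_qcode(qcode, schemes=SCHEMES):
    """Return the concept URI for a QCode of the form 'alias:code'."""
    alias, separator, code = qcode.partition(":")
    if not separator or not code:
        raise ValueError(f"not a QCode: {qcode!r}")
    if alias not in schemes:
        raise KeyError(f"unknown scheme alias: {alias!r}")
    return schemes[alias] + code


print(resolve_qcode("afpedtype:VideoWithTitling"))
# http://ref.afp.com/editorialtypes/VideoWithTitling
```

Comparing concept URIs rather than QCode strings keeps your processing correct even if a scheme alias differs from one document to another.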
5.9. Identifier and version number
Text, picture, still graphic, video and multimedia documents: the document identifier is provided in the news item (for multimedia documents: in the main news item). A version number may be present too.
<newsMessage>
<itemSet>
<newsItem guid="http://d.afp.com/MM48X" version="5">
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: the document identifier is provided in the package item. A version number may be present too.
<newsMessage>
<itemSet>
<packageItem guid="http://d.afp.com/MM48X" version="5">
</packageItem>
</itemSet>
</newsMessage>
A document is a set of information containing journalistic content and associated metadata. As news stories evolve or corrections are made, new versions of the document are published.
Each NewsML‑G2 document has a globally unique identifier (GUID),
provided by the guid attribute of a newsItem or packageItem
element. A GUID is a character string designed to be globally unique
among all NewsML‑G2 documents—past and future. This GUID allows a document
to be reliably identified as it moves through the news workflow and is
transferred or duplicated across systems. It also forms the basis of the
updating mechanism: an update is performed by sending you a new version
of a document that carries the same GUID as the original.
In AFP’s NewsML-G2 documents, GUIDs can take multiple forms. Examples include URIs in the http scheme, URNs in the namespace "newsml" [RFC3085bis] or AFP UNOs (a format more or less equivalent to IIM UNO).
Note: most AFP GUIDs look like plain URLs, for example
http://doc.afp.com/11N38S. However, they are non-dereferenceable
URIs whose sole purpose is to serve as identifiers.
From a technical standpoint, when comparing two representations of journalistic content in NewsML-G2, the GUID determines whether the two representations are those of the same document (in possibly different versions):
-
Same GUIDs means same document
-
Different GUIDs means different documents.
When integrating AFP’s NewsML‑G2 content into your information system,
you will often need to compare GUIDs. For example, when receiving a
document from AFP, you may need to check whether you have already received
a prior version of that document. This can be done by searching for an existing
document in your system with the same GUID.
A version number may be provided using the version attribute, expressed
as an XML Schema positive integer. When a document (i.e., a document identified
by a particular GUID) is received for the first time, it is not necessarily
in its first version: its version number may already be greater than 1. The version
number is incremented by 1 or more each time the document is updated.
If no version attribute is present, you must assume that the document
version is 1 (the first version).
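The GUID and version rules above can be sketched as follows. The function names and the in-memory dictionary standing in for your document store are hypothetical; GUIDs are compared as exact strings, and a missing version attribute is read as version 1.

```python
import xml.etree.ElementTree as ET


def identify(item):
    """Return (guid, version) for a newsItem or packageItem element."""
    guid = item.get("guid")
    version = int(item.get("version", "1"))  # no attribute: first version
    return guid, version


def classify(item, known_versions):
    """Compare an incoming item with the highest version already stored.

    known_versions maps each GUID to the highest version number received."""
    guid, version = identify(item)
    stored = known_versions.get(guid)
    if stored is None:
        return "new document"
    if version > stored:
        return "new version"
    return "not newer"


known = {"http://d.afp.com/MM48X": 3}
incoming = ET.fromstring('<newsItem guid="http://d.afp.com/MM48X" version="5"/>')

print(classify(incoming, known))  # new version
```

Note that the first version received for a GUID may be greater than 1 and that version numbers may skip values, so the sketch only relies on ordering, never on consecutive numbers.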
How a new version of a document should be dealt with.
The answer is given by the NewsML-G2 documentation:
New versions are often issued to enrich earlier ones with additional
information, especially as stories develop in real time. Sometimes,
however, a new version is issued to correct errors in a previous version.
In such cases, additional actions may be required, as erroneous material
may already have been published. Correction‑related updates are explicitly
flagged using a correction signal.
5.10. Information sources
Text, picture, still graphic and video documents: information sources may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- An information source represented by a name and a role -->
<infoSource role="isrol:origcont">
<name>AP</name>
</infoSource>
<!-- An information source represented by a QCode, a name and a role -->
<infoSource qcode="afpsource:2648" role="isrol:origcont">
<name>CHRISTIE'S</name>
</infoSource>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: information sources may be provided in the content metadata section of any news item. When an information source appears in a news item that is not the main one, it describes an information source for the content of that item. When an information source appears in the main news item, it should be considered as an information source of the "document", with no indication of the specific part of the content it is associated with (if any).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
<contentMeta>
<!-- This information source is in the main news item: it is an information source of the document -->
<infoSource role="isrol:origcont">
<name>AP</name>
</infoSource>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- This information source is specific to this item -->
<infoSource qcode="afpsource:2648" role="isrol:origcont">
<name>Business Wire</name>
</infoSource>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Information sources of a document, and of individual items in multimedia
documents, may be provided by infoSource elements.
In AFP NewsML-G2 documents, an information source is a party (person or
organization) that originated, distributed, aggregated or supplied the
content. For example, in a document created/published by AFP but incorporating
content provided by Business Wire, the source (Business Wire) will appear in
an infoSource element. In AFP documents, an information source is specified
in one of two ways:
-
A URI expressed as a QCode, optionally complemented by a natural language name.
-
A natural language name.
The URI space used to specify information sources through QCodes is open and may evolve over time.
When present, the name child element provides a natural language name
for the information source.
The role attribute carries a QCode that specifies the role of the
information source. AFP documents use the role "Content originator"
whose QCode is isrol:origcont and whose concept URI is
http://cv.iptc.org/newscodes/infosourcerole/origcont.
5.11. Keywords
Text, picture, still graphic and video documents: keywords may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<keyword>culture</keyword>
<keyword>arts</keyword>
<keyword>fashion</keyword>
<keyword>auction</keyword>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: keywords applying to the document as a whole may be provided in the content metadata section of the main news item. Keywords specific to an individual item may be provided in the content metadata section of that item.
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
<contentMeta>
<!-- These keywords are in the main news item:
they are associated with the document as a whole -->
<keyword>culture</keyword>
<keyword>arts</keyword>
<keyword>fashion</keyword>
<keyword>auction</keyword>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- These keywords are specifically associated with this news item -->
<keyword>people</keyword>
<keyword>money</keyword>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: keywords may be provided in the content metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<contentMeta>
<keyword>culture</keyword>
<keyword>arts</keyword>
<keyword>fashion</keyword>
<keyword>auction</keyword>
</contentMeta>
</packageItem>
</itemSet>
</newsMessage>
Keywords are defined in NewsML-G2 as "free-text terms to be used for indexing or finding the content by text-based search engines".
When present, keywords are provided using keyword elements.
Some keywords may have a refined role, expressed by a role attribute.
The value of this attribute is a QCode. Currently AFP may issue the QCode
afpkrole:tagWeb, which resolves to http://cv.afp.com/keywordroles/tagWeb.
For example:
<keyword role="afpkrole:tagWeb">culture</keyword>
Keywords carrying the role http://cv.afp.com/keywordroles/tagWeb
are intended for use in computing tag clouds
[TagClouds].
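A tag cloud is typically driven by how often each tag occurs across a set of documents. The sketch below counts keywords carrying the tagWeb role; the function name is hypothetical, the role is compared as the literal QCode afpkrole:tagWeb, and lower-casing the keyword text before counting is an assumption of this example rather than an AFP rule.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TAG_WEB = "afpkrole:tagWeb"


def local_name(element):
    """Return the tag name without any XML namespace part."""
    return element.tag.rsplit("}", 1)[-1]


def tag_counts(items):
    """Count tagWeb keywords across several newsItem/packageItem elements."""
    counts = Counter()
    for item in items:
        for element in item.iter():
            if local_name(element) != "keyword" or not element.text:
                continue
            if TAG_WEB in element.get("role", "").split():
                counts[element.text.strip().lower()] += 1
    return counts


FIRST = ET.fromstring("""
<newsItem><contentMeta>
  <keyword role="afpkrole:tagWeb">culture</keyword>
  <keyword role="afpkrole:tagWeb">arts</keyword>
  <keyword>auction</keyword>
</contentMeta></newsItem>
""")
SECOND = ET.fromstring("""
<newsItem><contentMeta>
  <keyword role="afpkrole:tagWeb">Culture</keyword>
</contentMeta></newsItem>
""")

print(tag_counts([FIRST, SECOND]).most_common())
```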
5.12. Language of the content
Text, picture, still graphic and video documents: the language of the content may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<language tag="en"/>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: the language of the content may be provided in the content metadata section of each news item.
<newsMessage>
<itemSet>
<!-- An item whose content is in English -->
<newsItem>
<contentMeta>
<language tag="en"/>
</contentMeta>
</newsItem>
<!-- An item whose content is in French -->
<newsItem>
<contentMeta>
<language tag="fr"/>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
The tag attribute of the language element carries a BCP 47 language
tag [RFC5646] that specifies
the main language of the content. The content refers to what is provided
inline or what is linked to through the content set (i.e., the contentSet
element). For example, in a text document this attribute identifies the main
language in which the textual content is written, whereas in a video document
it typically indicates the main language used in the soundtrack.
The main languages used by AFP, along with their BCP 47 tags, are shown in the table below.
Main content languages in AFP production
Language | BCP 47 tag
Arabic | ar
English | en
French | fr
German | de
Portuguese | pt
Spanish | es
5.13. Language of metadata
Text, picture, still graphic and video documents: the language of metadata is specified by the news item.
<newsMessage>
<itemSet>
<newsItem xml:lang="en">
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: the language of metadata is specified by each news item.
<newsMessage>
<itemSet>
<newsItem xml:lang="en">
</newsItem>
<newsItem xml:lang="en">
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: the language of metadata is specified by the package item.
<newsMessage>
<itemSet>
<packageItem xml:lang="en">
</packageItem>
</itemSet>
</newsMessage>
The xml:lang attribute carries a BCP 47 language tag
[RFC5646] that specifies the
main language of the metadata (e.g., titles, subject names, captions,
etc.) provided by the item.
In a multimedia document, this attribute has the same value in every news item of the document (i.e., within a given document, all items use the same language for metadata).
Important design principle: In an AFP NewsML-G2 document, metadata is provided in a single language, with only a few exceptions. When the news content is of global interest, AFP often provides metadata in multiple languages: this is achieved by issuing multiple NewsML-G2 documents (e.g., one with metadata in French, another with metadata in English, etc.). These are considered separate documents: each one has its own GUID and lifecycle (see the section on document identifiers).
The main languages used by AFP, along with their BCP 47 tags, are shown in the table below.
Main metadata languages in AFP production
Language | BCP 47 tag
Arabic | ar
English | en
French | fr
German | de
Portuguese | pt
Spanish | es
While most metadata in a NewsML-G2 document uses the language specified
by the xml:lang attribute of the item element as shown in the examples
above, there may be exceptions for certain elements. For example, in a
video document the original transcription of some speech is typically
provided in the actual language used by the speaker(s), which may differ from
the main language of metadata.
Whenever possible, the language for such metadata is provided by an
xml:lang attribute on the XML element conveying the metadata in
question.
The example below shows a document whose main language of metadata is English but whose "transcription" metadata is provided in French.
<newsMessage>
<itemSet>
<newsItem xml:lang="en">
<partMeta>
<description role="afpdescRole:contentDescription">
Pierre Bergé speaks about the auction.
</description>
<description xml:lang="fr" role="afpdescRole:transcription">
C’est le jour ou le dernier objet sera passé sous le marteau d'un commissaire priseur
que à mon sens – a mon sens - cette collection pourra écrire le mot fin.
</description>
</partMeta>
</newsItem>
</itemSet>
</newsMessage>
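As an illustration of the rule above, the following Python sketch reads each description of an item and reports the language that applies to it: the element's own xml:lang when present, otherwise the xml:lang of the item. It is a simplified sketch, not an AFP or IPTC reference implementation. The function names are hypothetical, elements are matched by local name because the examples in this guide omit namespace declarations, and only the item and the element itself are inspected for xml:lang (intermediate ancestors are not).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def local(tag):
    """Return a tag name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def descriptions_with_language(item):
    """Yield (role, language, text) for each description found in an item.

    The language is the description's own xml:lang when present,
    otherwise the xml:lang declared on the item element.
    """
    item_lang = item.get(XML_LANG)
    for elem in item.iter():
        if local(elem.tag) == "description":
            lang = elem.get(XML_LANG, item_lang)
            text = " ".join((elem.text or "").split())
            yield elem.get("role"), lang, text


# Sample adapted from the example above (transcription shortened).
SAMPLE = """
<newsItem xml:lang="en">
  <partMeta>
    <description role="afpdescRole:contentDescription">
      Pierre Bergé speaks about the auction.
    </description>
    <description xml:lang="fr" role="afpdescRole:transcription">
      cette collection pourra écrire le mot fin.
    </description>
  </partMeta>
</newsItem>
"""

results = list(descriptions_with_language(ET.fromstring(SAMPLE)))
for role, lang, text in results:
    print(role, lang, text)
```

With this approach, the content description is reported in the item's language and the transcription in its own language.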
5.14. Locations
AFP’s NewsML-G2 documents can convey information about locations. We distinguish between locations from which the content originates (e.g., the place where a news story was written) and locations that constitute the subject matter of the content. These two kinds of locations are conveyed using different mechanisms, described in the following sections.
Locations may also be typed, using a type attribute. The following types
are used in AFP documents:
Types of locations

| Type | Description | QCode | Concept URI |
|---|---|---|---|
| Geopolitical area | In AFP documents, this is a generic type that may be used for any kind of location. It simply indicates that the associated element represents a location. | | |
| Point of interest | In AFP documents, this type is used for locations that cannot be classified as cities, country areas or countries. For instance, the Eiffel Tower, the White House, Sherwood Forest or a specific building would be typed as points of interest. Note that this diverges slightly from NewsML-G2 standard usage, in which forests, ponds, hills, streets or arbitrary places are not usually classified as points of interest. | | |
| City | Indicates that the associated element represents a city. | | |
| Country area | In AFP documents, this type is typically used for provinces, states or other areas that may contain multiple cities but that are themselves components of countries. | | |
| Country | Indicates that the associated element represents a country. | | |
5.14.1. Locations from which the content originates (aka datelines)
Text, picture, still graphic and video documents: the locations from which the content originates are provided in the content metadata section of the news item (in the following example only one location is supplied).
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<located qcode="afplocation:281108" type="cpnat:poi">
<name>White House</name>
<related qcode="afplocation:6666" rel="skos:broader" type="loctyp:City">
<name>Washington</name>
<related qcode="afplocation:1149" rel="skos:broader" type="loctyp:CountryArea"/>
</related>
<related qcode="afplocation:1149" type="loctyp:CountryArea">
<name>District of Columbia</name>
<related qcode="afplocation:206" rel="skos:broader" type="loctyp:Country"/>
</related>
<related qcode="afplocation:206" type="loctyp:Country">
<name>United States</name>
<related qcode="iso3166-1a3:USA" rel="skos:exactMatch"/>
</related>
<POIDetails>
<position latitude="38.89761" longitude="-77.03637"/>
</POIDetails>
</located>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: for news items within the document, the locations from which the content originates may be provided in the content metadata section of the item (in the following example only one location per item is supplied).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
<contentMeta>
<!-- Location from which the content of the main news item originates -->
<located qcode="afplocation:2500" type="loctyp:City">
<name>Paris</name>
<related qcode="afplocation:67" rel="skos:broader" type="loctyp:Country">
<name>France</name>
<related qcode="iso3166-1a3:FRA" rel="skos:exactMatch" />
</related>
<geoAreaDetails>
<position latitude="48.85341" longitude="2.34121" />
</geoAreaDetails>
</located>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- Location from which the content of this news item originates -->
<located qcode="afplocation:2613" type="loctyp:City">
<name>Marseille</name>
<related qcode="afplocation:719" rel="skos:broader" type="loctyp:CountryArea">
<name>Bouches-du-Rhône</name>
<related qcode="afplocation:67" rel="skos:broader" type="loctyp:Country"/>
</related>
<related qcode="afplocation:67" type="loctyp:Country">
<name>France</name>
<related qcode="iso3166-1a3:FRA" rel="skos:exactMatch"/>
</related>
<geoAreaDetails>
<position latitude="43.29695" longitude="5.38107"/>
</geoAreaDetails>
</located>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: the location from which the content originates is provided in the content metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<contentMeta>
<located qcode="afplocation:6666" type="loctyp:City">
<name>Washington</name>
<related qcode="afplocation:1149" rel="skos:broader" type="loctyp:CountryArea">
<name>District of Columbia</name>
<related qcode="afplocation:206" rel="skos:broader" type="loctyp:Country"/>
</related>
<related qcode="afplocation:206" type="loctyp:Country">
<name>United States</name>
<related qcode="iso3166-1a3:USA" rel="skos:exactMatch"/>
</related>
<geoAreaDetails>
<position latitude="38.89511" longitude="-77.03637"/>
</geoAreaDetails>
</located>
</contentMeta>
</packageItem>
</itemSet>
</newsMessage>
In AFP NewsML-G2 documents, located elements specify the geographical
origin of the editorial content conveyed by the <contentSet>
of a news item. Examples of editorial content are the text of a news
story and the JPEG renditions of a picture document. For live reports,
the located element specifies the geographical origin of the live report.
There is always at least one location provided per item.
Locations from which the content originates are not necessarily the same as the locations the content is about. For example a news story about an event taking place in Paris may be written in London; in such a case, the city of London would be indicated as the location from which the content originates. The locations the content is about are conveyed elsewhere in the document, as described in section "Locations that are subject matter of the document".
There are some subtleties in how "locations from which the content originates" should be interpreted, depending on the nature of the content. These are discussed in the table below. Note that the policy described here is specific to AFP. Other news providers may use other conventions.
Policy used to specify the locations from which the content originates |
|
Nature of content |
Policy |
Text |
A location from which the content originates is usually the
place (e.g., a city) where the text was written or from which it was
dictated. Alternatively it may be the location of the event if an AFP
reporter was present nearby. Several locations ( in the form of |
Picture |
The location from which the content originates is the location of the camera when the picture was taken. This may differ from the location of what the picture shows. Knowing the location of the camera helps determine "what the subject of the picture looks like when viewed from that location". Only one location is provided. |
Video |
The location from which the content originates is the location of the camera when the video was recorded. It may differ from the location of what is shown in the video. Knowing the location of the camera is useful because it indicates "what the subject of the video looks like when viewed from that location". Only one location is provided. If the video was recorded in multiple places, only one of these (usually the most significant) is provided. |
Still or animated graphic |
When a graphic is produced to accompany or illustrate a separate production (typically of textual nature), the location from which the content originates is the same as that of the associated production. Otherwise, it is the location of the event depicted by the graphic. |
Multimedia |
Each news item in a multimedia document specifies the location(s) from which the content originates. The exact meaning for each news item follows the rules in this table, depending on the nature of its content. |
Live report |
The location from which the content originates is the location of the event being reported. This value may change as the live report progresses. For example, a live report about the Bergé/Saint-Laurent auction may be tagged with the auction venue while reporting on the auction, and later updated to the location of Pierre Bergé's press conference while we report on that press conference. |
Locations from which the content originates are provided by
located elements in the content metadata section of news items. A
given located element may convey several types of information about a
location:
- A QCode identifying the location, via the qcode attribute.
- A QCode identifying the location type, via the type attribute. AFP documents typically use the IPTC location types [IPTCLocTypes] to indicate whether the location is a city, a country area or a country. Locations classified as a "point of interest" use the QCode cpnat:poi (concept URI: http://cv.iptc.org/newscodes/cpnature/poi) from [IPTCCPNatures]. For a description of the different location types, see the table in section "Locations".
- The location name, provided by a name element.
- The location's geographical coordinates, as latitude and longitude in decimal degrees, provided by a position element inside a geoAreaDetails element, or inside a POIDetails element if the location is classified as a "point of interest". We use the WGS84 geodetic system.
- Broader geographical entities the location is part of, along with the hierarchical inclusion relationships between them. Each broader entity is provided using a related element whose rel attribute is the QCode skos:broader, resolving to http://www.w3.org/2004/02/skos/core#broader. Combined with the base location described above, this forms a geographical hierarchy. Typically, three levels are provided in such a hierarchy: a city, a country area and a country. Sometimes we may provide four levels (as in the example above where the location is the White House) or only one or two levels, and we may provide more in the future. Each of these broader geographical entities may be described with:
  - A QCode identifying the broader entity, provided by a qcode attribute.
  - A QCode identifying the broader entity type, provided by a type attribute. As described above for the base location, we use the IPTC location types [IPTCLocTypes] to specify whether it is a city, a country area or a country.
  - The broader entity name, provided by a name element.
  - The broader entity ISO 3166-1 alpha 3 code [ISO3166], provided by a related element whose rel attribute, the QCode skos:exactMatch, resolves to http://www.w3.org/2004/02/skos/core#exactMatch and whose qcode attribute uses the scheme alias iso3166-1a3 (scheme URI: http://cvx.iptc.org/iso3166-1a3/). The ISO 3166-1 alpha 3 code is the code part of this qcode attribute.
  - A reference to a broader geographical entity, provided by a related element whose rel attribute, the QCode skos:broader, resolves to http://www.w3.org/2004/02/skos/core#broader.
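The structure described in the list above can be walked with a few lines of code. The Python sketch below turns a located element into a list that starts with the base location and continues with its broader entities, narrowest first. It is only one possible approach under stated assumptions: the function names are hypothetical, elements are matched by local name because the examples in this guide omit namespace declarations, and broader entities are followed through the skos:broader references as they appear in the White House example.

```python
import xml.etree.ElementTree as ET

BROADER = "skos:broader"
EXACT_MATCH = "skos:exactMatch"
ISO_PREFIX = "iso3166-1a3:"


def local(tag):
    """Return a tag name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def children(elem, name):
    """Direct children of elem with the given local name."""
    return [c for c in elem if local(c.tag) == name]


def describe(elem):
    """Collect the QCode, type, name and ISO code carried by one element."""
    names = children(elem, "name")
    info = {"qcode": elem.get("qcode"), "type": elem.get("type"),
            "name": names[0].text if names else None}
    for rel in children(elem, "related"):
        qcode = rel.get("qcode") or ""
        if rel.get("rel") == EXACT_MATCH and qcode.startswith(ISO_PREFIX):
            info["iso3"] = qcode[len(ISO_PREFIX):]
    return info


def parse_located(located):
    """Return the base location followed by its broader entities."""
    base = describe(located)
    for details in located:
        if local(details.tag) in ("geoAreaDetails", "POIDetails"):
            for pos in children(details, "position"):
                base["position"] = (float(pos.get("latitude")),
                                    float(pos.get("longitude")))
    related = children(located, "related")
    by_qcode = {r.get("qcode"): r for r in related}
    chain, seen = [base], set()
    current = next((r for r in related if r.get("rel") == BROADER), None)
    while current is not None and current.get("qcode") not in seen:
        seen.add(current.get("qcode"))  # guard against reference loops
        # Prefer the sibling element that fully describes this entity.
        full = by_qcode.get(current.get("qcode"), current)
        chain.append(describe(full))
        current = next((r for r in children(full, "related")
                        if r.get("rel") == BROADER), None)
    return chain


SAMPLE = """
<located qcode="afplocation:281108" type="cpnat:poi">
  <name>White House</name>
  <related qcode="afplocation:6666" rel="skos:broader" type="loctyp:City">
    <name>Washington</name>
    <related qcode="afplocation:1149" rel="skos:broader" type="loctyp:CountryArea"/>
  </related>
  <related qcode="afplocation:1149" type="loctyp:CountryArea">
    <name>District of Columbia</name>
    <related qcode="afplocation:206" rel="skos:broader" type="loctyp:Country"/>
  </related>
  <related qcode="afplocation:206" type="loctyp:Country">
    <name>United States</name>
    <related qcode="iso3166-1a3:USA" rel="skos:exactMatch"/>
  </related>
  <POIDetails><position latitude="38.89761" longitude="-77.03637"/></POIDetails>
</located>
"""

chain = parse_located(ET.fromstring(SAMPLE))
print([entry["name"] for entry in chain])
```

Because the number of levels may vary, the sketch does not assume a fixed depth: it simply follows references until none remains.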
In text documents or text components of multimedia documents we may provide multiple locations from which the content originates. Current practice limits this to at most two. Below is an example:
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- A location from which the content originates -->
<located qcode="afplocation:2500" type="loctyp:City">
<name>Paris</name>
<related qcode="afplocation:67" rel="skos:broader" type="loctyp:Country">
<name>France</name>
<related qcode="iso3166-1a3:FRA" rel="skos:exactMatch" />
</related>
<geoAreaDetails>
<position latitude="48.85341" longitude="2.34121" />
</geoAreaDetails>
</located>
<!-- Another location from which the content originates -->
<located qcode="afplocation:6666" type="loctyp:City">
<name>Washington</name>
<related qcode="afplocation:1149" rel="skos:broader" type="loctyp:CountryArea">
<name>District of Columbia</name>
<related qcode="afplocation:206" rel="skos:broader" type="loctyp:Country"/>
</related>
<related qcode="afplocation:206" type="loctyp:Country">
<name>United States</name>
<related qcode="iso3166-1a3:USA" rel="skos:exactMatch"/>
</related>
<geoAreaDetails>
<position latitude="38.89511" longitude="-77.03637"/>
</geoAreaDetails>
</located>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
|
When are multiple locations provided?
Multiple locations may be provided when the content originates from
more than one place. For example, consider a story about
the Bergé/Saint-Laurent auction. To write this story, AFP may rely on
information supplied by one reporter present at the auction in Paris
and another AFP reporter attending a simultaneous press conference given by
Pierre Bergé in Washington. In this case, both Paris and Washington may be
provided in separated |
5.14.2. Locations that are subject matter of the document
Text, picture, still graphic, video and multimedia documents: locations that are subject matter of the document may be provided in the news item (and, in the case of multimedia documents, in the main news item) within the content metadata section. In text and multimedia documents only, additional information may also be supplied through assertions. Locations that are subject matter of the document are not provided in live report indexes.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- The city of Beijing is a subject of the content -->
<subject qcode="afplocation:2618" type="cpnat:geoArea">
<name>Beijing</name>
</subject>
<!-- The city of Paris is a subject of the content and is a location of the event the content is about -->
<subject qcode="afplocation:2500" type="cpnat:geoArea" afp:role="http://cv.afp.com/subjectroles/locationOfEvent">
<name>Paris</name>
</subject>
<!-- Some locations are identified not by a qcode attribute but by a uri attribute (typically providing a geo URI [RFC5870]) -->
<subject uri="geo:43.82883,5.78688" type="cpnat:geoArea">
<name>Manosque</name>
</subject>
</contentMeta>
<!-- This assertion provides additional information about Beijing -->
<assert qcode="afplocation:2618">
<type qcode="loctyp:City"/>
<geoAreaDetails>
<position latitude="39.9075" longitude="116.39723"/>
</geoAreaDetails>
</assert>
<!-- This assertion provides additional information about Paris -->
<assert qcode="afplocation:2500">
<type qcode="loctyp:City"/>
<broader qcode="afplocation:67" type="loctyp:Country">
<name>France</name>
<related qcode="iso3166-1a3:FRA" rel="skos:exactMatch"/>
</broader>
<geoAreaDetails>
<position latitude="48.85341" longitude="2.3488"/>
</geoAreaDetails>
</assert>
<!-- This assertion provides additional information about Manosque -->
<assert uri="geo:43.82883,5.78688">
<type qcode="loctyp:City"/>
<geoAreaDetails>
<position latitude="43.82883" longitude="5.78688"/>
</geoAreaDetails>
</assert>
</newsItem>
</itemSet>
</newsMessage>
Locations that are the subject matter of the document may be provided
using subject elements. Note that other entities such as persons, media
topics, organizations and others may also be conveyed using subject
elements. To distinguish locations from these other entities, a type
attribute is used. Its value is a QCode, either cpnat:geoArea (resolving to
http://cv.iptc.org/newscodes/cpnature/geoArea)
or cpnat:poi (resolving to
http://cv.iptc.org/newscodes/cpnature/poi).
All subjects share some common properties, including optional type and
afp:role attributes that are described in the section on subjects.
Additional information about these locations may be provided using assertions,
represented by assert elements. Assertions can be associated with specific
locations through their concept URIs: the information provided by an assertion
applies to the location whose concept URI appears in the assertion's qcode or
uri attribute.
In the example above, there is a subject element with a QCode that resolves
to http://ref.afp.com/locations/2618 (in AFP documents,
afplocation is a scheme alias for http://ref.afp.com/locations/). We
also have an assert element whose qcode resolves to
http://ref.afp.com/locations/2618. It indicates that both the subject and
the assertion describe the same location.
If you do not perform QCode resolution (cf. section on
controlled vocabularies
and qualified codes), you may correlate QCode-based assertions with
locations by comparing their QCodes directly.
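The correlation described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, elements are matched by local name because the examples in this guide omit namespace declarations, and subjects are matched to assertions by comparing the raw qcode (or uri) values directly, without QCode resolution.

```python
import xml.etree.ElementTree as ET

LOCATION_TYPES = {"cpnat:geoArea", "cpnat:poi"}


def local(tag):
    """Return a tag name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def concept_key(elem):
    """Key used for correlation: the QCode when present, otherwise the URI."""
    return elem.get("qcode") or elem.get("uri")


def position_of(assertion):
    """Return (latitude, longitude) from an assertion, or None if absent."""
    for pos in assertion.iter():
        if local(pos.tag) == "position":
            return float(pos.get("latitude")), float(pos.get("longitude"))
    return None


def subject_locations(news_item):
    """List location subjects, enriched with data from matching assertions."""
    assertions = {concept_key(e): e for e in news_item.iter()
                  if local(e.tag) == "assert"}
    locations = []
    for subj in news_item.iter():
        if local(subj.tag) != "subject" or subj.get("type") not in LOCATION_TYPES:
            continue  # not a location subject
        key = concept_key(subj)
        name = next((c.text for c in subj if local(c.tag) == "name"), None)
        assertion = assertions.get(key)
        locations.append({
            "key": key,
            "name": name,
            "position": position_of(assertion) if assertion is not None else None,
        })
    return locations


SAMPLE = """
<newsItem>
  <contentMeta>
    <subject qcode="afplocation:2618" type="cpnat:geoArea"><name>Beijing</name></subject>
    <subject uri="geo:43.82883,5.78688" type="cpnat:geoArea"><name>Manosque</name></subject>
    <subject qcode="medtop:01000000"><name>arts, culture and entertainment</name></subject>
  </contentMeta>
  <assert qcode="afplocation:2618">
    <type qcode="loctyp:City"/>
    <geoAreaDetails><position latitude="39.9075" longitude="116.39723"/></geoAreaDetails>
  </assert>
</newsItem>
"""

locations = subject_locations(ET.fromstring(SAMPLE))
print(locations)
```

Direct comparison only works when the subject and the assertion use the same form of identifier (the same scheme alias, or the same URI). Resolving QCodes to concept URIs before comparing is the more robust option.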
A given assertion may provide several types of information about a location:
- An identification provided either by a qcode attribute or a uri attribute. As discussed above, this is used to correlate the assertion with a location.
- A QCode specifying the location type, via a type element. AFP documents typically use IPTC location types [IPTCLocTypes] to identify whether the location is a city, a country area or a country. Locations classified as "point of interest" use the QCode cpnat:poi (concept URI: http://cv.iptc.org/newscodes/cpnature/poi) from [IPTCCPNatures].
- The location latitude and longitude in decimal degrees, provided by a position element inside a geoAreaDetails element. Unless specified otherwise by a gpsdatum attribute, we use the WGS84 geodetic system.
- A broader geographical entity the location is part of, provided using a broader element. Typically the broader entity we provide is a country. This broader geographical entity may be described with:
  - A QCode identifying it, provided by a qcode attribute.
  - A QCode identifying its type, provided by a type attribute. We make use of the IPTC location types [IPTCLocTypes] to specify whether it is a city, a country area or a country.
  - Its name, provided by a name element.
  - Its ISO 3166-1 alpha 3 code [ISO3166], provided by a related element whose rel attribute, the QCode skos:exactMatch, resolves to http://www.w3.org/2004/02/skos/core#exactMatch and whose qcode attribute uses the scheme alias iso3166-1a3 (scheme URI: http://cvx.iptc.org/iso3166-1a3/). The ISO 3166-1 alpha 3 code is the code part of this qcode attribute.
  - A reference to a broader geographical entity, provided by a related element whose rel attribute, the QCode skos:broader, resolves to http://www.w3.org/2004/02/skos/core#broader.
Locations of the event(s)
Some locations that are subject matter of the document may also be
locations of event(s). A location of event is a place where an event
the document is about occurs or is expected to occur. Locations of
event(s) are expressed using subject elements whose role attribute (in
the namespace http://www.afp.com/format/internal/) is set to
http://cv.afp.com/subjectroles/locationOfEvent.
For example, in a document about the auction of the Pierre Bergé and Yves Saint-Laurent collection, the city of Paris may appear as a subject because the auction takes place in Paris. The city of Beijing may also appear as a subject because the news story mentions China’s claims that some objects in the auction were stolen in Beijing during the opium wars and should therefore be returned. In this case, both cities would appear in dedicated subject elements. Paris may be tagged as a location of event using a role attribute because the auction, which is the event the story is about, occurs in Paris. Beijing, however, would not be tagged as a location of event because, although it is relevant to the story (it is a subject of the story), it is not a place where the event the story is about takes place.
There is no default value for the role attribute. If a subject
element conveying a location does not include a role attribute with the
value http://cv.afp.com/subjectroles/locationOfEvent, this does not
indicate that the location is not a location of the event. It only means
that the document does not provide that information.
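Since the absence of the role attribute means "unknown" rather than "no", a processor may want to avoid a plain boolean. The Python sketch below is one way to express this, assuming a hypothetical helper that returns True when a subject is tagged as a location of event and None when the document does not say. The sample declares the afp prefix explicitly so that the attribute can be read in its namespace.

```python
import xml.etree.ElementTree as ET

AFP_NS = "http://www.afp.com/format/internal/"
ROLE_ATTR = "{%s}role" % AFP_NS
LOCATION_OF_EVENT = "http://cv.afp.com/subjectroles/locationOfEvent"


def local(tag):
    """Return a tag name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def is_location_of_event(subject):
    """Return True if the subject is tagged as a location of event.

    Return None (unknown) otherwise: a missing role attribute does not
    mean the location is not a location of the event.
    """
    return True if subject.get(ROLE_ATTR) == LOCATION_OF_EVENT else None


SAMPLE = """
<contentMeta xmlns:afp="http://www.afp.com/format/internal/">
  <subject qcode="afplocation:2618" type="cpnat:geoArea">
    <name>Beijing</name>
  </subject>
  <subject qcode="afplocation:2500" type="cpnat:geoArea"
           afp:role="http://cv.afp.com/subjectroles/locationOfEvent">
    <name>Paris</name>
  </subject>
</contentMeta>
"""

content_meta = ET.fromstring(SAMPLE)
flags = {
    subj.get("qcode"): is_location_of_event(subj)
    for subj in content_meta
    if local(subj.tag) == "subject"
}
print(flags)
```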
5.15. Products the document belongs to
All documents: products the document belongs to may be provided in the header of the news message.
<newsMessage>
<header>
<afp:headerExtension xmlns:afp="http://www.afp.com/format/internal/">
<!-- The document belongs to this product -->
<afp:product name="EAA" uri="http://products.afp.com/wires/EAA"></afp:product>
<!-- The document also belongs to this other product -->
<afp:product name="MAX" uri="http://products.afp.com/wires/MAX"></afp:product>
</afp:headerExtension>
</header>
</newsMessage>
The commercial relationship between AFP and its clients is often organized around the notion of a product. A product is a subset of AFP’s overall production to which a client may subscribe. Each product is defined by several characteristics such as subject matter, media types, languages, and so on.
If present, product elements are provided within the
headerExtension element inside the header of the newsMessage. The
headerExtension element is an AFP specific extension defined in the
namespace http://www.afp.com/format/internal/.
Each product element identifies a product the document belongs to. This
may include products you personally subscribed to, but not necessarily:
typically, all products the document belongs to are listed irrespective
of your specific subscriptions.
In your information system, product elements may be used to automatically
route documents to specific teams or workflows. For example you might choose
to automatically route documents of the "Economic & Business News" product
directly to your economics specialists.
Each product is uniquely identified by a URI, provided by the uri
attribute. You may request the URIs corresponding to your subscribed
products from your AFP representative.
The name attribute provides a human‑readable name for the product,
intended for display purposes.
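As an illustration of product-based routing, the Python sketch below extracts product URIs from the header and maps them to destinations. The routing table, its destination names and the function names are hypothetical and only serve the example. The product URIs are those of the sample above.

```python
import xml.etree.ElementTree as ET

AFP_NS = "http://www.afp.com/format/internal/"
PRODUCT_TAG = "{%s}product" % AFP_NS

# Hypothetical routing table: product URI -> internal destination.
ROUTES = {
    "http://products.afp.com/wires/EAA": "world-desk",
}


def product_uris(news_message):
    """Return the URIs of all products listed in the news message."""
    return [p.get("uri") for p in news_message.iter(PRODUCT_TAG)]


def destinations(news_message, routes):
    """Return the sorted destinations matching the document's products.

    Products without an entry in the routing table are ignored, since a
    document may list products you have not subscribed to.
    """
    return sorted({routes[uri] for uri in product_uris(news_message)
                   if uri in routes})


SAMPLE = """
<newsMessage>
  <header>
    <afp:headerExtension xmlns:afp="http://www.afp.com/format/internal/">
      <afp:product name="EAA" uri="http://products.afp.com/wires/EAA"></afp:product>
      <afp:product name="MAX" uri="http://products.afp.com/wires/MAX"></afp:product>
    </afp:headerExtension>
  </header>
</newsMessage>
"""

message = ET.fromstring(SAMPLE)
uris = product_uris(message)
targets = destinations(message, ROUTES)
print(uris, targets)
```

Keying the routing table on the uri attribute rather than on the name attribute follows from the text above: the URI is the unique identifier, while the name is intended for display.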
The following table provides examples of products.
Examples of products |
||
Name |
Unique identifier |
Description |
EAA |
The World News (EAA) wire offers up-to-the-minute, complete English-language global news, sports and business coverage tailored to the needs of clients in Europe, Africa and the Middle East. EAA also provides in-depth coverage of Europe for European audiences. |
|
MAX |
The world news wire, MAX, carries AFP’s entire English-language news production and is designed for clients requiring fully comprehensive global coverage. |
|
FRS |
The FRS wire is the AFP news feed primarily intended for French customers. This French-language feed offers French and foreign sources of information on varied topics (general news, politics, economy, culture, social, sport and equestrian), with emphasis on in-depth coverage of France. |
|
DAB |
The French-language DAB wire is designed primarily for African customers. It is produced in Paris by a specialized desk, which processes and translates the information gathered by the largest network of all international agencies active in Africa. It is also fed by AFP's four other regional centers (Hong Kong, Nicosia, Washington and Montevideo) to provide comprehensive coverage of world news around the clock, seven days a week. |
|
5.16. Provider
Text, picture, still graphic, video and multimedia documents: the provider of the document is given in the item metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<provider qcode="afpprovider:AFP-TV">
<name>AFP-TV</name>
<broader qcode="nprov:AFP">
<name>AFP</name>
</broader>
</provider>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: the provider of the document is given in the item metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<itemMeta>
<provider qcode="afpprovider:AFP-TV">
<name>AFP-TV</name>
<broader qcode="nprov:AFP">
<name>AFP</name>
</broader>
</provider>
</itemMeta>
</packageItem>
</itemSet>
</newsMessage>
The provider of a document is the party responsible for managing and
releasing the document (i.e., the publisher of the document).
It is indicated by the qcode attribute of the provider element. This
element is always present. The QCode belongs to one of the following
schemes:
- The IPTC news provider scheme [IPTCNProviders], whose scheme URI is http://cv.iptc.org/newscodes/newsprovider/ and whose scheme alias is nprov.
- An AFP-defined scheme, whose scheme URI is http://ref.afp.com/providers/ and whose scheme alias is afpprovider.
If present, the name child element provides a natural language name
for the provider.
If present, the broader child element specifies a larger entity the
provider is part of. This entity is identified by a qcode attribute,
optionally complemented by a natural language name in a name element.
In the example above, the document is provided by AFP-TV, a service
within AFP. The fact that this provider is part of AFP is indicated
via the broader element.
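A provider and its optional broader entity can be read with a short helper. The Python sketch below is illustrative only: the function names are hypothetical, and elements are matched by local name because the examples in this guide omit namespace declarations.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Return a tag name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def child(elem, name):
    """Return the first direct child with the given local name, or None."""
    return next((c for c in elem if local(c.tag) == name), None)


def name_of(elem):
    """Return the text of the name child element, or None if absent."""
    name = child(elem, "name")
    return name.text if name is not None else None


def parse_provider(item):
    """Return the provider of an item and, if given, its broader entity."""
    provider = next((e for e in item.iter() if local(e.tag) == "provider"), None)
    if provider is None:
        return None
    broader = child(provider, "broader")
    return {
        "qcode": provider.get("qcode"),
        "name": name_of(provider),
        "broader": None if broader is None else {
            "qcode": broader.get("qcode"),
            "name": name_of(broader),
        },
    }


SAMPLE = """
<newsItem>
  <itemMeta>
    <provider qcode="afpprovider:AFP-TV">
      <name>AFP-TV</name>
      <broader qcode="nprov:AFP">
        <name>AFP</name>
      </broader>
    </provider>
  </itemMeta>
</newsItem>
"""

provider = parse_provider(ET.fromstring(SAMPLE))
print(provider)
```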
5.17. Publishing Status
Text, picture, still graphic, video and multimedia documents: the publishing status is provided by the item metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<pubStatus qcode="QCode of scheme http://cv.iptc.org/newscodes/pubstatusg2/ specifying the publishing status"/>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: the publishing status is provided by the item metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<itemMeta>
<pubStatus qcode="QCode of scheme http://cv.iptc.org/newscodes/pubstatusg2/ specifying the publishing status"/>
</itemMeta>
</packageItem>
</itemSet>
</newsMessage>
A document can be usable, withheld or canceled. The table below describes how this is specified in documents and what it means.
Publishing statuses |
||
Status |
Representation |
Meaning |
Usable |
No |
The document is usable. Note that usable does not necessarily mean publishable; for example, an embargo may prevent publication of an otherwise usable document. |
Withheld |
A |
The document and all its previous versions must not be used until
further notice (except for a small subset of metadata listed below). This
status is typically used when a serious issue with the document is
suspected and is under investigation - for example when important information
in the document is believed to be false. |
Canceled |
A |
The document and all its previous versions must not be used, ever
(except for a few metadata elements, as described below). This status is
typically used when a serious problem with a document is detected (e.g.,
important information in the document has been found to be false) and
the scope of the problem is wide enough to warrant a complete kill of
the document instead of issuing a correction. |
When a document is withheld or canceled, a general editorial note is often included to provide additional information and/or instructions.
The NewsML-G2 specification provides detailed guidance on how this publishing status must be handled when processing documents.
For multimedia documents, the way publishing status is conveyed
differs from standard NewsML-G2. In NewsML-G2 each G2 item carries its
own publishing status, and any G2 item without a pubStatus element is
considered usable. In AFP’s multimedia documents however, only the
pubStatus element of the main item is relevant. The pubStatus elements
of non-main items must be ignored. You must therefore process multimedia
documents in a way that applies the publishing status of the main news
item to the entire content of the document (i.e., to all items in the
document).
|
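The rule for multimedia documents can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation. The function names are hypothetical. The main item is recognized by its crel:isa link to http://cv.afp.com/itemnatures/mmdMainComp, as in the multimedia examples of this guide. The scheme alias stat used in the sample QCodes is an assumption, which is why the sketch compares only the code part of the QCode. An item without a pubStatus element is treated as usable.

```python
import xml.etree.ElementTree as ET

MAIN_ITEM_NATURE = "http://cv.afp.com/itemnatures/mmdMainComp"


def local(tag):
    """Return a tag name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def is_main_item(item):
    """True if the item links to the 'main component' item nature."""
    return any(
        local(e.tag) == "link"
        and e.get("rel") == "crel:isa"
        and e.get("href") == MAIN_ITEM_NATURE
        for e in item.iter()
    )


def item_status(item):
    """Return the code part of the item's pubStatus QCode.

    An item without a pubStatus element is considered usable.
    """
    for elem in item.iter():
        if local(elem.tag) == "pubStatus":
            return (elem.get("qcode") or "").split(":", 1)[-1]
    return "usable"


def document_status(news_message):
    """Return the status that applies to every item of the document."""
    items = [e for e in news_message.iter() if local(e.tag) == "newsItem"]
    if not items:
        raise ValueError("no news item found")
    # Multimedia documents: only the main item's status is relevant.
    main = next((i for i in items if is_main_item(i)), items[0])
    return item_status(main)


SAMPLE = """
<newsMessage><itemSet>
  <newsItem>
    <itemMeta>
      <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
      <pubStatus qcode="stat:canceled"/>
    </itemMeta>
  </newsItem>
  <newsItem>
    <itemMeta><pubStatus qcode="stat:usable"/></itemMeta>
  </newsItem>
</itemSet></newsMessage>
"""

status = document_status(ET.fromstring(SAMPLE))
print(status)
```

In this sample the second item declares itself usable, but the document as a whole is canceled because the main item is.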
5.18. Subjects
Text, picture, still graphic and video documents: subjects of the document may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- A subject represented by a natural language name -->
<subject>
<name>auction</name>
</subject>
<!-- A subject represented by a QCode -->
<subject qcode="medtop:20000273"/>
<!-- A subject represented by a QCode and a natural language name -->
<subject qcode="medtop:01000000">
<name>arts, culture and entertainment</name>
</subject>
<!-- A subject represented by a QCode, a natural language name, a type and a role -->
<subject qcode="afplocation:2500" type="cpnat:geoArea" afp:role="http://cv.afp.com/subjectroles/locationOfEvent">
<name>Paris</name>
</subject>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: subjects of the document as a whole may be provided in the content metadata section of the main news item. Subjects specific to an item may be provided in the content metadata section of that item.
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
<contentMeta>
<!-- This subject is in the main news item:
it applies to the document as a whole -->
<subject qcode="medtop:20000031" type="cpnat:abstract">
<name>visual art</name>
</subject>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- This subject only applies to this news item -->
<subject qcode="medtop:20000011" type="cpnat:abstract">
<name>fashion</name>
</subject>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: subjects of the document may be provided in the content metadata section of the package item. In live report documents, subjects expressed using a controlled vocabulary are limited to media topics and event identifiers.
<newsMessage>
<itemSet>
<packageItem>
<contentMeta>
<!-- A subject represented by a natural language name -->
<subject>
<name>auction</name>
</subject>
<!-- A subject represented by a QCode -->
<subject qcode="medtop:20000273"/>
<!-- A subject represented by a QCode and a natural language name -->
<subject qcode="medtop:01000000">
<name>arts, culture and entertainment</name>
</subject>
</contentMeta>
</packageItem>
</itemSet>
</newsMessage>
Subjects represent the key themes or topics of the content (what the
document is about). Some subjects of a document (and of individual items
in the case of multimedia documents) may be expressed using subject
elements. Each subject element indicates what the document’s content
(or item’s content) is about.
Certain subjects may instead be described using keyword elements
rather than subject elements. However, keywords may also be used for
other purposes: while a keyword can describe a subject of the document,
not all keywords do. For more details, see the Keywords section.
In AFP documents, a subject expressed through a subject element can be
specified in one of the following ways:
- A natural language name.
- A URI expressed as a QCode, provided in a qcode attribute, and optionally complemented by a natural language name. For example, in AFP documents medtop is a scheme alias for the scheme http://cv.iptc.org/newscodes/mediatopic/; therefore the QCode medtop:20000011 resolves to the URI http://cv.iptc.org/newscodes/mediatopic/20000011, which identifies the media topic "fashion".
- A URI provided in a uri attribute, optionally complemented by a natural language name. This is used for specifying some locations, using 'geo' URIs as defined in [RFC5870]. Geo URIs identify geographic locations and may include information such as latitude, longitude, and more. For example, the URI geo:13.4125,103.8667 identifies the location at latitude 13.4125 and longitude 103.8667 in WGS-84. At the time of writing, AFP documents use simple geo URIs with only latitude and longitude, but additional features (e.g., altitude, uncertainty, etc.) may be used in the future. An example is provided in the section "Locations that are subject matter of the document".
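The two URI forms described in the list above can be handled with small helpers. The Python sketch below resolves a QCode through a table of scheme aliases and extracts latitude and longitude from a simple geo URI. The function names are hypothetical. The alias table is hard-coded here with scheme URIs quoted in this guide for the sake of a self-contained example; this is an assumption of the sketch, and a real processor would take the aliases from the catalog information declared by the document.

```python
# Scheme aliases and scheme URIs quoted in this guide.
SCHEME_ALIASES = {
    "medtop": "http://cv.iptc.org/newscodes/mediatopic/",
    "afplocation": "http://ref.afp.com/locations/",
    "cpnat": "http://cv.iptc.org/newscodes/cpnature/",
    "iso3166-1a3": "http://cvx.iptc.org/iso3166-1a3/",
}


def resolve_qcode(qcode, aliases=SCHEME_ALIASES):
    """Resolve a QCode (alias:code) to a concept URI.

    Raises ValueError if the value has no colon and KeyError if the
    scheme alias is unknown.
    """
    alias, sep, code = qcode.partition(":")
    if not sep:
        raise ValueError("not a QCode: %r" % qcode)
    return aliases[alias] + code


def parse_geo_uri(uri):
    """Return (latitude, longitude) from a geo URI.

    Optional parameters after ';' (e.g. uncertainty) and an optional
    third coordinate (altitude) are ignored.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "geo":
        raise ValueError("not a geo URI: %r" % uri)
    coords = rest.split(";", 1)[0].split(",")
    return float(coords[0]), float(coords[1])


fashion_uri = resolve_qcode("medtop:20000011")
beijing_uri = resolve_qcode("afplocation:2618")
position = parse_geo_uri("geo:13.4125,103.8667")
print(fashion_uri, beijing_uri, position)
```

Ignoring unknown geo URI parameters rather than rejecting them keeps the parser tolerant of the additional features that may appear in the future.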
The URI space used to specify subjects through qcode and uri
attributes is open and may evolve over time. AFP documents frequently
use QCodes corresponding to IPTC media topics
[IPTCMediaTopics], a
standard taxonomy for news content classification. QCodes identifying events
are also commonly used to associate a document with the events it covers.
The table below lists common schemes used in AFP
documents to identify subjects. Note that this list is not exhaustive.
Common types of subjects used in AFP documents |
|||
Type |
Scheme URI |
Scheme alias |
Comment |
Media topics |
|
Media topics is a standard IPTC taxonomy for
news content classification. For example the concept URI
|
|
Events |
|
An AFP specific scheme for events identification. It is used to relate a document with the event it covers. For more on this topic see the section on event identifiers. |
|
Persons |
|
AFP specific
scheme for persons identification. For example the concept URI
|
|
Organizations |
|
AFP specific scheme for organizations identification. For example the
concept URI |
|
Locations |
|
AFP specific
scheme for locations identification. For example the concept URI
|
|
A subject element may contain a name child element. When present, this
child element provides a natural language name for the subject.
Within a given item, the order of appearance of subject elements offers
a clue about their relative importance (i.e., editorial significance) in
the context of that item: a subject should be considered to have the same
or lower importance than any subject that precedes it. Although AFP documents
do not currently use the rank attribute to explicitly rank subjects, this
may change in the future. To ensure forward compatibility, if your NewsML-G2
processor interprets such ranks, the importance indicated by the rank attribute
must take precedence over the importance implied by the order of appearance of
subject elements. The rank attribute is described in the NewsML-G2
specification.
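The ordering rule above could be implemented along these lines. This is a sketch under stated assumptions: it uses Python's standard xml.etree.ElementTree, assumes the NewsML-G2 namespace http://iptc.org/std/nar/2006-10-01/, and assumes that a lower rank value means higher importance and that unranked subjects come after ranked ones (mirroring the rule this guide gives for subtitles). The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"n": "http://iptc.org/std/nar/2006-10-01/"}


def subjects_by_importance(content_meta):
    """Return the subject elements of a contentMeta element, most important first."""
    subjects = content_meta.findall("n:subject", NS)

    def sort_key(indexed):
        position, subject = indexed
        rank = subject.get("rank")
        if rank is not None:
            # Explicit ranks take precedence over document order.
            return (0, int(rank), position)
        # Assumption: unranked subjects follow ranked ones, in document order.
        return (1, 0, position)

    return [subject for _, subject in sorted(enumerate(subjects), key=sort_key)]
```

When no subject carries a rank attribute, which is the current situation in AFP documents, the function simply returns the subjects in document order.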
Optional attributes (these attributes may or may not be
present in a given subject element):
-
type: this attribute carries a QCode that specifies the type of the subject (i.e., person, organization, event, abstract concept, etc.). The value space for this attribute is open, but AFP documents typically use types defined in the standard IPTC "Nature of a concept" controlled vocabulary [IPTCCPNatures].
-
role (in namespace http://www.afp.com/format/internal/): some subjects have a specific role, conveyed through this attribute in the form of a URI. This attribute is not part of the NewsML-G2 standard: it is an AFP specific extension and is therefore defined in a dedicated namespace. Currently, the only permitted value for this attribute - when present - is
http://cv.afp.com/subjectroles/locationOfEvent. When a subject is tagged with this role, this subject represents a location of the event(s) the editorial content is about. This usage is explained in detail in the section "Locations that are subject matter of the document".
5.19. Titles & subtitles
Documents may contain various types of titles and multiple levels of subtitles.
Note that although NewsML-G2 allows the use of rich text by including markup within the content of titles and subtitles, AFP’s systems only produce plain textual content, without any embedded markup.
5.19.1. Titles
Text, picture, still graphic, video and multimedia documents: titles may be provided in the content metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- The main title of the document -->
<headline>
YSL-Bergé collection sets new world record at auction
for a private collection
</headline>
<!-- The short title of the document -->
<headline role="afpheadlinerole:shorttitle">
YSL-Bergé collection: a new record at auction
</headline>
<!-- The long title of the document -->
<headline role="afpheadlinerole:longtitle">
Yves Saint Laurent/Pierre Bergé collection sets new world record at
auction for a private collection with more than 206 million euros
</headline>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: A title may be provided in the content metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<contentMeta>
<!-- The title of the live report -->
<headline>YSL-Bergé auction live report</headline>
</contentMeta>
</packageItem>
</itemSet>
</newsMessage>
All documents may contain a title. In addition, text, picture, still
graphic, animated graphics, video and multimedia documents may include a
short title and/or a long title. When present, these titles are provided
through headline elements located in the content metadata section of the
first item. There can be at most one title, one short title and one long
title.
You can determine the type of any given title by examining the presence
and value of its role attribute, as described in the following table.
Title types |
||
Type |
Function |
Identification |
Title |
The main title of the document: a short summary of the journalistic content. |
No |
Short title |
A shorter version of the title, suitable for display on space constrained surfaces (e.g., mobile handsets). |
A |
Long title |
A longer version of the title. This is a short catch line, useful, for example, for display on a banner. |
A |
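As an illustration of the identification rule, the sketch below classifies headline elements by their role attribute. It assumes the NewsML-G2 namespace http://iptc.org/std/nar/2006-10-01/, assumes that the main title is the headline without a role attribute, and takes the two role QCodes from the example at the start of this section. The function name and the whitespace normalisation are choices made for this example only.

```python
import xml.etree.ElementTree as ET

NS = {"n": "http://iptc.org/std/nar/2006-10-01/"}

# Role QCodes as they appear in the example of this section.
# None stands for "no role attribute" (assumed to denote the main title).
TITLE_ROLES = {
    None: "title",
    "afpheadlinerole:shorttitle": "short title",
    "afpheadlinerole:longtitle": "long title",
}


def extract_titles(content_meta):
    """Map title type to title text for the headlines of a contentMeta element."""
    titles = {}
    for headline in content_meta.findall("n:headline", NS):
        kind = TITLE_ROLES.get(headline.get("role"))
        if kind is not None:
            # Collapse line breaks and indentation inside the element.
            titles[kind] = " ".join("".join(headline.itertext()).split())
    return titles
```

Headlines with any other role (subtitles, for instance) are ignored by this function.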
5.19.2. Subtitles
Text and multimedia documents: subtitles may be provided in the content metadata section of the news item (for multimedia documents: in the main news item). Subtitles are only provided for text and multimedia documents.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<headline role="afpheadlinerole:subtitle" rank="0">
Auction to continue Tuesday and Wednesday
</headline>
<headline role="afpheadlinerole:subtitle" rank="1">
Prestigious attendance noted on first day
</headline>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
In addition to titles, text and multimedia documents may contain
subtitles. Subtitles complement titles by providing additional
information about the news content of the document. In current production,
there are at most two subtitles. As with titles, subtitles are provided
through headline elements in the content metadata section of the main
news item. Their subtitle nature is denoted by a role attribute whose
value, the QCode afpheadlinerole:subtitle, resolves to
http://cv.afp.com/headlineroles/subtitle. A rank attribute may also be
present to indicate the relative importance of subtitles. Ranks are
non-negative integers: a subtitle with a lower rank value has higher
importance than a subtitle with a higher rank value. Subtitles without a
rank attribute are considered less important than subtitles that have one.
See the NewsML-G2 specification for additional information about ranks and
their processing model.
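The ranking rule for subtitles can be sketched as follows, assuming the NewsML-G2 namespace http://iptc.org/std/nar/2006-10-01/. The function name is hypothetical, and keeping document order among subtitles with equal or missing ranks is an assumption of this example.

```python
import xml.etree.ElementTree as ET

NS = {"n": "http://iptc.org/std/nar/2006-10-01/"}
SUBTITLE_ROLE = "afpheadlinerole:subtitle"


def extract_subtitles(content_meta):
    """Return subtitle texts of a contentMeta element, most important first."""
    found = [
        headline
        for headline in content_meta.findall("n:headline", NS)
        if headline.get("role") == SUBTITLE_ROLE
    ]

    def sort_key(indexed):
        position, headline = indexed
        rank = headline.get("rank")
        if rank is not None:
            return (0, int(rank), position)  # lower rank value: higher importance
        return (1, 0, position)  # subtitles without a rank come last

    ordered = sorted(enumerate(found), key=sort_key)
    return [" ".join("".join(h.itertext()).split()) for _, h in ordered]
```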
5.20. Type of document
An AFP NewsML-G2 document can be of one of the following types:
-
Text
-
Picture
-
Video
-
Still graphic
-
Animated graphic
-
Interactive graphic
-
Multimedia
-
Live report index
The type of a NewsML-G2 document determines key characteristics of the document, including the nature of its content, its XML structure, the metadata it provides and certain aspects of its processing model.
The overview section provides a detailed description of each of these types.
To determine the type of a document, you must first identify whether it
is a multimedia or non-multimedia document. A document is considered
multimedia if the item set of the news message contains a news item
whose item metadata section contains a link element with both of the
following:
-
a rel attribute whose value, the QCode crel:isa, resolves to http://cv.iptc.org/newscodes/conceptrelation/isA
-
an href attribute whose value is the URI http://cv.afp.com/itemnatures/mmdMainComp
In other words, a multimedia document contains the following:
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
In a non-multimedia document, the document type corresponds to the item class of the item present in the item set of the news message.
For text, picture, still graphic, video and multimedia documents the
item class is specified by the value of the qcode attribute of the
itemClass element within the item metadata section of a news item,
as illustrated here:
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<itemClass qcode="QCode specifying the type"/>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
For live reports the item class is specified by the value of the qcode attribute
of the itemClass element in the item metadata section of a package item, as
shown here:
<newsMessage>
<itemSet>
<packageItem>
<itemMeta>
<itemClass qcode="QCode specifying the type"/>
</itemMeta>
</packageItem>
</itemSet>
</newsMessage>
The itemClass element is always present. For non-multimedia documents,
its qcode attribute resolves to a concept URI that specifies the
type of the document, as shown in the table below.
Item classes used in AFP documents |
||
Type |
QCode |
Concept URI |
Text |
|
|
Picture |
|
|
Video |
|
|
Still graphic |
|
|
Animated graphic |
|
|
Interactive graphic |
|
|
Live report index |
|
|
The NewsML-G2 standard requires the use of one of the
IPTC News Item Nature NewsCodes schemes for item classes. AFP NewsML-G2
departs from this rule by using an AFP-specific scheme (whose URI is
http://cv.afp.com/itemnatures/) in addition to the mandatory IPTC
schemes.
|
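The type-detection procedure of this section could be sketched as below. The code assumes the NewsML-G2 namespace http://iptc.org/std/nar/2006-10-01/ and compares the rel attribute with the literal QCode crel:isa instead of resolving it through the catalog, which is a simplification. Both function names are hypothetical. The second function returns the item class QCode as found in the document; mapping it to a document type is left to the table of item classes.

```python
import xml.etree.ElementTree as ET

NS = {"n": "http://iptc.org/std/nar/2006-10-01/"}
MULTIMEDIA_MAIN_ITEM = "http://cv.afp.com/itemnatures/mmdMainComp"


def is_multimedia(news_message):
    """True if a news item carries the link that marks a multimedia main item."""
    links = news_message.findall("n:itemSet/n:newsItem/n:itemMeta/n:link", NS)
    return any(
        # Simplification: the QCode is compared literally, not resolved.
        link.get("rel") == "crel:isa" and link.get("href") == MULTIMEDIA_MAIN_ITEM
        for link in links
    )


def item_class_qcode(news_message):
    """Return the item class QCode of a non-multimedia document, else None."""
    if is_multimedia(news_message):
        return None
    for item_tag in ("newsItem", "packageItem"):
        item_class = news_message.find(
            f"n:itemSet/n:{item_tag}/n:itemMeta/n:itemClass", NS
        )
        if item_class is not None:
            return item_class.get("qcode")
    return None
```

The multimedia test is performed first because, as described above, the item class only identifies the document type for non-multimedia documents.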
5.21. Urgency
Text, picture, still graphic, animated graphic, video and multimedia documents: the urgency of the document may be provided in the content metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<urgency>1</urgency>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: the urgency of the document may be provided in the content metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<contentMeta>
<urgency>1</urgency>
</contentMeta>
</packageItem>
</itemSet>
</newsMessage>
A document may include an indication of the editorial urgency of its
content using an urgency element. The value of this element is an
integer from 1 (highest urgency) to 9 (lowest urgency). In practice,
AFP documents are usually assigned urgencies between 1 and 4.
There is often a correlation between this property and the role in workflow of the document. In our documents, flashes are typically issued with the highest urgency (i.e., a value of 1), alerts with an urgency of 2, and urgents with an urgency of 3.
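Reading the urgency is straightforward. The sketch below assumes the NewsML-G2 namespace http://iptc.org/std/nar/2006-10-01/ and returns None when no urgency is present; the function name and the decision to raise an error for out-of-range values are assumptions of this example.

```python
import xml.etree.ElementTree as ET

NS = {"n": "http://iptc.org/std/nar/2006-10-01/"}


def read_urgency(content_meta):
    """Return the urgency (1 = highest, 9 = lowest) of a contentMeta element, or None."""
    element = content_meta.find("n:urgency", NS)
    if element is None or not (element.text or "").strip():
        return None
    value = int(element.text.strip())
    if not 1 <= value <= 9:
        raise ValueError(f"urgency out of range: {value}")
    return value
```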
6. Data specific to text and multimedia documents
Some data appear only in text and multimedia documents. This section details these data elements.
6.1. Catchline
Text documents: a catchline may be provided in the content metadata section of the news item, through a headline with role "introduction".
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<headline role="afpheadlinerole:introduction">
The Yves Saint Laurent and Pierre Bergé collection sets new world record at
auction for a private collection on Monday, the first of three auction
days, with more than 206 million euros. Participants describe first day
as "surprising, moving, electric!".
</headline>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Text documents: a catchline may be provided in the content metadata section of the news item, through a headline with role "catchline".
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<headline role="afpheadlinerole:catchline">
The Yves Saint Laurent and Pierre Bergé collection sets new world record at
auction for a private collection on Monday, the first of three auction
days, with more than 206 million euros. Participants describe first day
as "surprising, moving, electric!".
</headline>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: a catchline may be provided in the content metadata section of the main news item.
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
<contentMeta>
<headline role="afpheadlinerole:catchline">
The Yves Saint Laurent and Pierre Bergé collection sets new world record at
auction for a private collection on Monday, the first of three auction
days, with more than 206 million euros. Participants describe first day
as "surprising, moving, electric!".
</headline>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
A catchline, when present, provides a clear and concise summary of the story, telling the reader what has happened in straightforward language. It is designed to draw the reader’s attention and gives an overview of the main elements of the news. A catchline may appear at most once per document.
In text documents, the catchline is supplied through a headline element
whose role attribute is the QCode afpheadlinerole:introduction, which
resolves to http://cv.afp.com/headlineroles/introduction. At the time of
writing, a catchline may be provided only in text documents produced by
SID (Sport‑Informations‑Dienst), an AFP subsidiary. If you need to determine
whether the type of text documents you work with may include a catchline,
you are advised to discuss this with your AFP representative.
In multimedia documents the catchline is provided through a headline
element whose role attribute is either afpheadlinerole:catchline
(resolving to http://cv.afp.com/headlineroles/catchline) or
afpheadlinerole:introduction (resolving to http://cv.afp.com/headlineroles/introduction).
While NewsML-G2 allows rich text content with markup within a catchline, AFP systems only output plain textual content without embedded markup.
In some documents you may notice that the catchline content is identical to the first paragraph of the document’s main textual content. However, this is not always the case: some documents provide an original, distinct catchline.
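Because a catchline may be carried by either of the two roles described in this section, a processor can accept both. The sketch below assumes the NewsML-G2 namespace http://iptc.org/std/nar/2006-10-01/ and compares role QCodes literally; the function name and the whitespace normalisation are assumptions of this example.

```python
import xml.etree.ElementTree as ET

NS = {"n": "http://iptc.org/std/nar/2006-10-01/"}
CATCHLINE_ROLES = {
    "afpheadlinerole:catchline",
    "afpheadlinerole:introduction",
}


def extract_catchline(content_meta):
    """Return the catchline text of a contentMeta element, or None if absent."""
    for headline in content_meta.findall("n:headline", NS):
        if headline.get("role") in CATCHLINE_ROLES:
            # A catchline appears at most once, so the first match is returned.
            return " ".join("".join(headline.itertext()).split())
    return None
```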
6.2. Number of hypertext links to external resources in textual or multimedia content
Text documents and multimedia documents: the number of hypertext links to external resources present in textual or multimedia content may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" xmlns:afp="http://www.afp.com/format/internal/">
<itemSet>
<newsItem>
<itemMeta>
<afp:extension>
<afp:stats>
<afp:totalLinks>
3
</afp:totalLinks>
</afp:stats>
</afp:extension>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
The HTML (in XML syntax) rendition of the textual or multimedia content
may include hypertext links to external resources, typically represented
by <a> elements. External resources are resources that are not
intrinsically part of the document; for example, in a multimedia
document a link to one of the document’s own items is not considered an
external resource whereas a link to a Wikipedia page is.
As shown in the example above, the number of such links may be provided as
an integer in a totalLinks element, located within a stats element,
itself contained in an extension element in the item metadata section
of the (main) news item.
Note that the totalLinks, stats and extension elements are not
part of the standard NewsML-G2 vocabulary; they belong to an AFP-specific
extension. These elements are defined in the XML namespace
http://www.afp.com/format/internal/.
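A minimal sketch of reading this value, using the two namespaces shown in the example of this section. The function name is hypothetical, and returning None when the element is absent is a choice made for this example.

```python
import xml.etree.ElementTree as ET

NS = {
    "n": "http://iptc.org/std/nar/2006-10-01/",
    "afp": "http://www.afp.com/format/internal/",
}


def read_total_links(item_meta):
    """Return the number of external hypertext links, or None if not provided."""
    element = item_meta.find("afp:extension/afp:stats/afp:totalLinks", NS)
    if element is None or not (element.text or "").strip():
        return None
    # The value may be surrounded by line breaks, as in the example.
    return int(element.text.strip())
```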
6.3. Related production
Text and multimedia documents: mentions of the existence of related production may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<!-- The following signals that AFP is publishing/will publish related photo and video production -->
<signal qcode="afpmedtype:Photo"/>
<signal qcode="afpmedtype:Video"/>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
Text and multimedia documents may include indications of the existence of
related production, that is, additional output covering the same event(s)
as the document itself. For example, if AFP has released or plans to
release photo(s) and video(s) of the Yves Saint Laurent auction, this may
be mentioned in the metadata of a text or multimedia news story
covering this auction, as shown in the example above. To express this, we
use signal elements that indicate which types of related production exist
or are planned, using a controlled vocabulary defined by the scheme
http://ref.afp.com/mediatypes/ (scheme alias: afpmedtype).
We provide only one signal per type of related production. For example,
if several related photos exist, there will be only a single
<signal qcode="afpmedtype:Photo"/> element.
Note that signal elements are also used for other purposes (e.g.,
correction signal). Only signal elements that use
the scheme http://ref.afp.com/mediatypes/ refer to related
production.
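The filtering described above could look like the following sketch. It assumes the NewsML-G2 namespace http://iptc.org/std/nar/2006-10-01/ and identifies the scheme by its alias afpmedtype rather than by resolving QCodes through the catalog, which is a simplification. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"n": "http://iptc.org/std/nar/2006-10-01/"}
MEDIA_TYPE_ALIAS = "afpmedtype"  # alias of http://ref.afp.com/mediatypes/


def related_production_types(item_meta):
    """Return the codes of the related-production signals of an itemMeta element."""
    types = []
    for signal in item_meta.findall("n:signal", NS):
        alias, _, code = (signal.get("qcode") or "").partition(":")
        # Signals from other schemes serve other purposes and are skipped.
        if alias == MEDIA_TYPE_ALIAS and code:
            types.append(code)
    return types
```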
The table below lists the QCodes and concept URIs that are used in these
signal elements. See the overview section for a
description of the various types of news content this table refers to.
Types of related production |
||
Concept URI |
QCode |
Description |
|
|
Related picture(s). For example, a picture of the Yves Saint Laurent auction. |
|
|
Related picture(s) from archive material. It is typically an archive picture of someone or something that plays an important role in the event(s). For example an archive picture of Yves Saint Laurent, or an archive picture of Christie’s salerooms. When this mention is used, the related archive pictures are republished by AFP. |
|
|
Related video(s). For example, a video report about the Yves Saint Laurent auction. |
|
|
Related video(s) providing live coverage. For example a video of the Yves Saint Laurent auction broadcast live. |
|
|
Related video(s) from archive material. It is typically an archive video
of someone or something that plays an important role in the event(s).
For example an archive video of Yves Saint Laurent, or an archive video
of Christie’s salerooms. When this mention is used, the related archive
videos are republished by AFP. |
|
|
Related courtroom sketch(es). A courtroom sketch is an artistic depiction
of the proceedings in a court of law. In many jurisdictions, cameras are
not allowed in courtrooms in order to prevent distractions and preserve
privacy. Consequently we rely on sketch artists for illustrations of the
proceedings. |
|
|
Related still graphic(s). |
|
|
Related interactive graphic(s). |
|
|
Related videographic(s). |
|
|
Related multimedia document(s). |
|
|
Related live report(s). |
|
|
Related interactive graphic(s). |
The mechanism described in this section is not the only way to handle related production. As explained in the section on event identifiers, we also provide correlation keys that allow you to identify documents covering the same events.
6.4. Role in workflow
Text and multimedia documents: a role in workflow may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<role qcode="QCode specifying the role in workflow"/>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
Some text and multimedia documents include an indication of their role in
workflow (aka an editorial role). This allows them to be handled in
specific ways. When present, this role is specified by the qcode
attribute of the role element. The possible values are taken from
a controlled vocabulary provided by the IPTC (although we do not use the
entire set). They are described in the table below, where the Concept URI
column gives the URI the QCode resolves to.
Roles in workflow |
|||
Role |
Description |
QCode |
Concept URI |
Flash |
A very short text – typically four or five words – about an event
of exceptional importance. |
|
|
Alert |
A very short, high-priority text. An alert is usually followed within five minutes by an urgent providing more information. It fits on a single line. |
|
|
Urgent |
A short text on a major development of a top story. An urgent is typically two paragraphs long, or longer if it provides follow-up to multiple alerts. For a freshly breaking story, an urgent is typically followed within 10 minutes by a 200-250 word lead. |
|
|
Lead |
A sum-up or a complete version of a developing story. |
|
|
When a document is updated, its role in workflow may be updated too. For example, breaking news worthy of immediate attention may start as an alert, then become an urgent, and finally a lead, as it is refreshed or enriched with more content. All versions of the document share the same guid (see the section on identifiers).
Once a document has reached the lead stage, subsequent versions may be described
as "second lead", "third lead" and so on up to a "ninth lead". However,
this qualification is not expressed through the role in workflow property:
the role of every lead version, from the first to the ninth, continues to use the
same concept URI, http://cv.iptc.org/newscodes/edrole/lead
(QCode erol:lead).
To indicate which type of lead the document represents, we use a <genre> element
(see the section on genres). For example, we typically express that a document
is a first lead by assigning it:
-
a workflow role with the concept URI http://cv.iptc.org/newscodes/edrole/lead (QCode erol:lead)
-
a genre with the concept URI http://ref.afp.com/editorialtypes/Lead (QCode afpedtype:Lead)
as illustrated in the following example:
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<role qcode="erol:lead"/>
</itemMeta>
<contentMeta>
<genre qcode="afpedtype:Lead" />
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
For a second lead, the role in workflow remains
http://cv.iptc.org/newscodes/edrole/lead
and a genre with a concept URI http://ref.afp.com/editorialtypes/2ndlead (QCode afpedtype:2ndlead)
is provided:
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<role qcode="erol:lead"/>
</itemMeta>
<contentMeta>
<genre qcode="afpedtype:2ndlead" />
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
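The combination of role and genre shown in the two examples above can be checked as in the following sketch. It assumes the NewsML-G2 namespace http://iptc.org/std/nar/2006-10-01/ and compares QCodes literally. The LEAD_GENRES table is deliberately limited to the two QCodes quoted in this section; the function name and the table are assumptions of this example, and the other lead genres would have to be added from the table of genres.

```python
import xml.etree.ElementTree as ET

NS = {"n": "http://iptc.org/std/nar/2006-10-01/"}

# Only the two genre QCodes quoted in this section; extend as needed.
LEAD_GENRES = {
    "afpedtype:Lead": 1,
    "afpedtype:2ndlead": 2,
}


def lead_ordinal(news_item):
    """Return 1 for a first lead, 2 for a second lead, None otherwise."""
    role = news_item.find("n:itemMeta/n:role", NS)
    if role is None or role.get("qcode") != "erol:lead":
        return None
    for genre in news_item.findall("n:contentMeta/n:genre", NS):
        ordinal = LEAD_GENRES.get(genre.get("qcode"))
        if ordinal is not None:
            return ordinal
    return None
```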
A document with a workflow role of lead can also be qualified by the genre general lead, whose meaning is described at the end of the table below. Typically, a general lead has a different guid from the various documents it consolidates. A document cannot simultaneously be a general lead and a first lead, second lead, etc.
The following table describes the various genres used to qualify a lead.
Genres used to qualify a lead |
|||
Genre |
Description |
QCode |
Concept URI |
Lead (typically used to mean "first lead") |
A sum-up or a complete version of a developing story |
|
|
Second lead |
A sum-up or a complete version of a developing story. For a given story, a second lead is commonly published only if a lead is already out. The second lead provides a refreshed and/or enriched version of that story. |
|
|
Third lead |
A sum-up or a complete version of a developing story. For a given story, a third lead is commonly published only if a second lead already exists. The third lead provides a refreshed and/or enriched version of that story. |
|
|
Fourth lead |
A sum-up or a complete version of a story. For a given story, a fourth lead is commonly published only if a third lead already exists. The fourth lead provides a refreshed and/or enriched version of that story. |
|
|
Fifth lead |
A sum-up or a complete version of a story. For a given story, a fifth lead is commonly published only after a fourth lead. The fifth lead provides a refreshed and/or enriched version of that story. |
|
|
Sixth lead |
A sum-up or a complete version of a developing story. For a given story, a sixth lead is commonly published only after a fifth lead. It provides a refreshed and/or enriched version of that story. |
|
|
Seventh lead |
A sum-up or a complete version of a developing story. For a given story, a seventh lead is commonly published only if a sixth lead already exists. The seventh lead provides a refreshed and/or enriched version of that story. |
|
|
Eighth lead |
A sum-up or a complete version of a developing story. For a given story, an eighth lead is commonly published only if a seventh lead is already out. The eighth lead provides a refreshed and/or enriched version of that story. |
|
|
Ninth lead |
A sum-up or a complete version of a developing story. For a given story, common usage is that a ninth lead is published only if an eighth already exists. The ninth lead provides a refreshed and/or enriched version of that story. |
|
|
General lead |
A large sum-up or a complete version of a story. A general lead regroups, prioritizes and develops all available elements of a developing story, including elements previously published in multiple documents, each focusing on specific facets of the broader story. |
|
|
6.5. Word count
Text and multimedia documents: the word count is provided in the inline XML rendition of the news item’s content (for multimedia documents, it is provided in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<contentSet>
<inlineXML wordcount="450"/>
</contentSet>
</newsItem>
</itemSet>
</newsMessage>
The word count provides an approximation of the size of the document’s textual content (not including textual content present in metadata). This size is expressed as an approximate count of words: when it is computed, an individual word may not count as exactly one, since short words contribute less than one and long words contribute more than one.
The word count is supplied by the wordcount attribute of the
inlineXML element of the news item. It is a non-negative integer and
is present in all text and multimedia documents.
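A minimal sketch of reading the word count, assuming the NewsML-G2 namespace http://iptc.org/std/nar/2006-10-01/. The function name is hypothetical, and None is returned for items that carry no inline XML rendition or no wordcount attribute.

```python
import xml.etree.ElementTree as ET

NS = {"n": "http://iptc.org/std/nar/2006-10-01/"}


def read_word_count(news_item):
    """Return the approximate word count of a news item, or None if not provided."""
    inline_xml = news_item.find("n:contentSet/n:inlineXML", NS)
    if inline_xml is None:
        return None
    value = inline_xml.get("wordcount")
    return int(value) if value is not None else None
```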
7. Data specific to text documents
Some data is specific to text documents. This section describes these data elements in detail.
7.1. Textual content
Text documents: the textual content is provided in the content set of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentSet>
<inlineXML contenttype="application/xhtml+xml">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>
YSL-Bergé collection sets new world record at auction
for a private collection
</title>
</head>
<body>
<p>The Yves Saint Laurent and Pierre Bergé collection sets
new world record at auction for a private collection.
Hundreds of art treasures amassed by late fashion designer
Yves Saint Laurent and his companion Pierre Berge over half
a century are being auctioned.</p>
<p>Bids hit 206 million euros (261 million dollars) on February
23, 2009 making it the biggest private collection ever
auctioned with two days of sales still left to run.</p>
...
...
<!-- A hypertext link -->
The <a class="ignorableTextFalse" href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">
wikipedia page about Yves Saint-Laurent</a> claims that ...
...
</body>
</html>
</inlineXML>
</contentSet>
</newsItem>
</itemSet>
</newsMessage>
The textual content of the document is its main journalistic text.
The textual content is provided by an inlineXML element. It is
expressed using the XML syntax of HTML. This is explicitly indicated
by a contenttype attribute with a value of application/xhtml+xml.
The textual content may also include links to entities that are not logically part of the document, such as other NewsML-G2 documents, Web pages (as shown in the example above), etc. The sections below describe how these links are represented.
Note that text items of multimedia documents may also contain similar data, but with additional information such as links to visual content. This is described in section "Data specific to multimedia documents".
7.1.1. Hypertext links to other resources
The HTML can contain hypertext links to other resources such as Web
pages. These links are provided using <a> elements. For example,
here is a link to a Wikipedia page:
<a class="ignorableTextFalse"
href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)" >wikipedia page about Yves Saint-Laurent</a>
The class attribute, if present, may be used to specify either the
class name “ignorableTextFalse” or “ignorableTextTrue”. These class
names are intended to assist consumers who want to remove hypertext links
from the HTML content (a common requirement for some of our clients).
7.1.1.1. ignorableTextFalse
ignorableTextFalse means that when processing the HTML to remove
links, keeping the text associated with that link will produce a better
result.
For example, suppose that the HTML contains the following fragment
before removing the hypertext links:
Pierre Bergé quoted the
<a class="ignorableTextFalse"
href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">wikipedia page about Yves Saint-Laurent</a>
to illustrate...
After removing hypertext links the fragment should be:
Pierre Bergé quoted the wikipedia page about Yves Saint-Laurent to illustrate...
7.1.1.2. ignorableTextTrue
ignorableTextTrue means that when processing the HTML to remove links, also
removing the text associated with that link will produce a better result.
For example, suppose that the HTML contains the following fragment
before removing the hypertext links:
Some text before.
<a class="ignorableTextTrue"
href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">
This Web page provides additional information.
</a>
Some text after.
After removing hypertext links the fragment should be:
Some text before. Some text after.
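The link-removal behaviour described in sections 7.1.1.1 and 7.1.1.2 could be sketched as follows, using Python's standard xml.etree.ElementTree on the parsed HTML. The function names are hypothetical. Two assumptions are made: a link without either class name keeps its text, and any inline markup inside a link is flattened to plain text.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def _insert_text(parent, index, text):
    """Attach text at the position of the child that sits at `index`."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + text


def remove_hypertext_links(root):
    """Remove <a> elements in place, honouring the ignorableText class names."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != XHTML + "a":
                continue
            classes = (child.get("class") or "").split()
            # Assumption: keep the link text unless ignorableTextTrue is set.
            keep_text = "ignorableTextTrue" not in classes
            text = "".join(child.itertext()) if keep_text else ""
            index = list(parent).index(child)
            _insert_text(parent, index, text + (child.tail or ""))
            parent.remove(child)
    return root
```

Removing a link can leave extra whitespace or line breaks where the element used to be; collapsing runs of whitespace afterwards gives results like the ones shown in the examples above.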
7.1.2. Links to other NewsML-G2 documents
The HTML may contain links to other NewsML-G2 documents managed by AFP.
Such links are associated with a portion of the textual content. We
represent these links using the g2document microformat. This consists
of a span element whose class attribute includes “g2document”.
In addition, we provide another class name indicating the type of the
referenced document: “g2picture”, “g2video”, etc. Finally, we may
include a class name that provides a hint on how a link could be removed
gracefully. For example:
<span class="g2document g2text ignorableTextFalse">
<a style="display: none" href="http://doc.afp.com/7W37U"></a>
<a style="display: none" href="otherDocument.xml"></a>
some text
</span>
The content of the span element is organized as follows:
-
The first child element is an a tag whose href attribute provides the GUID of the NewsML-G2 document. Although it may look like a dereferenceable URI, it is not. This element is marked as non-displayable, as it is not intended to be shown directly.
-
After the first child element, another non-displayable a tag may provide a dereferenceable URI for the NewsML-G2 document. This element is typically present when the AFP delivery system determines that it has delivered the corresponding document to you and knows where it is located within your delivery space.
-
Finally, the span contains the portion of the textual content the other NewsML-G2 document is associated with.
The following table lists the class names used to specify the type of a referenced NewsML-G2 document. See the overview section for a presentation of the various document types.
Types of referenced NewsML-G2 document |
|
Class name |
Type |
g2text |
Text |
g2multimedia |
Multimedia |
g2picture |
Picture |
g2graphic |
Still graphic |
g2animated |
Animated graphic |
g2video |
Video |
g2liveReport |
Live report index |
g2interactive |
Interactive graphic |
The class attribute may also be used to specify “ignorableTextFalse”
or “ignorableTextTrue”. These class names are intended to assist consumers
who need to remove links from the HTML content (a common requirement for some
of our clients).
7.1.2.1. ignorableTextFalse
ignorableTextFalse means that when processing the HTML to remove
links, keeping the text associated with this link will produce a
better result.
For example, suppose that the HTML contains the following fragment before removing the links:
Pierre Bergé quoted
<span class="g2document g2text ignorableTextFalse">
<a style="display: none" href="http://doc.afp.com/7W37U"></a>
<a style="display: none" href="otherDocument.xml"></a>
a recent AFP news story
</span>
to illustrate...
After removing links the fragment should be:
Pierre Bergé quoted a recent AFP news story to illustrate...
7.1.2.2. ignorableTextTrue
ignorableTextTrue means that when processing the HTML to remove
links, removing the text associated with that link will produce a better
result.
For example, suppose that the HTML contains the following fragment before removing the links:
Some text before.
<span class="g2document g2text ignorableTextTrue">
<a style="display: none" href="http://doc.afp.com/7W37U"></a>
<a style="display: none" href="otherDocument.xml"></a>
This AFP news story provides additional information.
</span>
Some text after.
After removing links the fragment should be:
Some text before. Some text after.
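The link-removal rule described in this section can be sketched as follows. This is a minimal illustration, not AFP's own implementation: the function names are hypothetical, the fragment is assumed to be well-formed XML (as it is in the XML syntax of HTML), and the result is reduced to plain text with collapsed whitespace rather than to cleaned-up HTML.

```python
import xml.etree.ElementTree as ET


def _local_name(tag: str) -> str:
    # Drop a "{namespace}" prefix so that namespaced XHTML elements also match.
    return tag.rsplit("}", 1)[-1]


def text_without_links(elem: ET.Element) -> str:
    """Return the text of elem, honoring the ignorableText* class hints."""
    parts = [elem.text or ""]
    for child in elem:
        classes = (child.get("class") or "").split()
        is_doc_link = _local_name(child.tag) == "span" and "g2document" in classes
        if not (is_doc_link and "ignorableTextTrue" in classes):
            # Keep the text; the hidden <a> elements are empty and add nothing.
            parts.append(text_without_links(child))
        # Text that follows the child belongs to the parent and is always kept.
        parts.append(child.tail or "")
    return "".join(parts)


def strip_links(fragment: str) -> str:
    # Wrap the fragment so that it parses as a single XML element.
    root = ET.fromstring(f"<wrapper>{fragment}</wrapper>")
    # Collapse the whitespace left behind by the removed markup.
    return " ".join(text_without_links(root).split())
```

A span without either class name is treated here like ignorableTextFalse (its text is kept); that default is a choice made for this sketch, not something the format mandates.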
8. Data specific to visual content
Some data is associated with visual content. It may be present in picture, video, still graphic and animated graphic documents. It may also be present in picture, video, still graphic and animated graphic items of multimedia documents. This section details these data elements.
8.1. Caption
Picture, video, still graphic, animated graphic documents: a caption may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<description role="afpdescRole:contentDescription">
French businessman and head of Sidaction organisation Pierre Berge
attends at Marigny theater in Paris.
</description>
<description role="afpdescRole:contextDescription">
This is the first of the four auction days led by Christie's of
Yves Saint-Laurent and Pierre Berge collection, which profit will
fund campaigns against HIV-AIDS.
</description>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: a caption may be provided in the content metadata section of each news item conveying picture, video, still-graphic or animated-graphic content.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- Caption for the content of this item -->
<description role="afpdescRole:contentDescription">
French businessman and head of Sidaction organisation Pierre Berge
attends at Marigny theater in Paris. This is the first of the four auction days led by Christie's of
Yves Saint-Laurent and Pierre Berge collection, which profit will
fund campaigns against HIV-AIDS.
</description>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- Caption for the content of this other item -->
<description role="afpdescRole:contentDescription">
Christie's auctioneer François de Ricqles proceeds with the auction
of a rabbit head, a Chinese imperial bronze on February 25, 2009
at the Grand Palais in Paris. This object is part of a prized art collection assembled by
Yves Saint Laurent and his partner Pierre Berge over half a
century. One of the world's great private collections, it takes
in masterpieces by Picasso, Mondrian and Matisse, old masters, Art
Deco gems, bronzes, enamels and antiques. Two looted Chinese bronzes
sold for 15.7 million euros (20.3 million dollars) each to anonymous
telephone bidders at the Yves Saint Laurent art sale on Wednesday,
despite protests from Beijing.
</description>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
In picture, video, still graphic or animated graphic documents, the caption, when present, is provided in two parts. The content description is a concise textual description of what is shown in the visual content. The context description provides background information (e.g., context, meaning, etc.) about what is depicted.
The content description may be provided in the associated news item by
adding a description element whose role attribute uses the QCode
afpdescRole:contentDescription, which resolves to
http://cv.afp.com/descriptionRoles/contentDescription.
The context description may be provided by a description element whose
role attribute uses the QCode afpdescRole:contextDescription, which resolves
to http://cv.afp.com/descriptionRoles/contextDescription.
In multimedia documents, the captions of visual components are provided as a single part, as shown in the example above.
There is no caption for text content. In picture, video, still graphic and animated graphic documents, there is a single news item, which is therefore the one that may provide a caption. In multimedia documents, the caption of each picture, video, still graphic and animated graphic may appear in each corresponding news item. There is at most one caption per news item.
Note that although NewsML-G2 allows the use of markup to provide rich text within the content of a caption, AFP’s systems only output simple textual content not interspersed with markup.
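As an illustration, the caption parts of each news item could be collected as sketched below. The function name is hypothetical, and the sketch compares the role attribute with the literal QCodes used in this guide's examples; a complete implementation would instead resolve QCodes to concept URIs through the document's catalogs. The `{*}` wildcard (Python 3.8 or later) lets the lookup work whether or not the elements carry the NewsML-G2 namespace.

```python
import xml.etree.ElementTree as ET

# Assumption: the scheme alias "afpdescRole" is the one used by the document.
CONTENT_ROLE = "afpdescRole:contentDescription"
CONTEXT_ROLE = "afpdescRole:contextDescription"


def captions(news_message_xml: str) -> list:
    """Return one dict per news item holding the caption parts that are present."""
    root = ET.fromstring(news_message_xml)
    result = []
    for item in root.findall(".//{*}newsItem"):
        parts = {}
        for desc in item.findall("./{*}contentMeta/{*}description"):
            # Normalise the line breaks and indentation of the XML layout.
            text = " ".join("".join(desc.itertext()).split())
            role = desc.get("role")
            if role == CONTENT_ROLE:
                parts["content"] = text
            elif role == CONTEXT_ROLE:
                parts["context"] = text
        result.append(parts)
    return result
```

An item without a caption yields an empty dict, and an item of a multimedia document yields only the "content" key, since its caption is provided as a single part.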
|
From time to time the AFP NewsML-G2 format evolves, but you may still want to correctly process older documents that make use of previous versions of the format.
In older documents, captions are represented differently. In some
documents the content description may be provided in the associated
news item by a In even older documents, the content description and context
description may not be provided as separate elements but instead in a
single |
8.2. Copyright Notice
Picture, video, still graphic, animated graphic documents: a copyright notice may be provided in the rights information of the news item.
<newsMessage>
<itemSet>
<newsItem>
<rightsInfo>
<copyrightNotice>Copyright AFP or licensors</copyrightNotice>
</rightsInfo>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: a copyright notice may be provided in the rights information of each news item conveying picture, video, still graphic or animated graphic content.
<newsMessage>
<itemSet>
<newsItem>
<!-- A copyright notice for this item -->
<rightsInfo>
<copyrightNotice>Copyright AFP or licensors</copyrightNotice>
</rightsInfo>
</newsItem>
<newsItem>
<contentMeta/>
<!-- A copyright notice for this item -->
<rightsInfo>
<copyrightNotice>Copyright AFP or licensors</copyrightNotice>
</rightsInfo>
</newsItem>
</itemSet>
</newsMessage>
Note that although NewsML-G2 allows the use of markup to provide rich text within the content of a copyright notice, AFP’s systems only output simple textual content not interspersed with markup.
8.3. Visual content
8.3.1. Basic format
Picture, video, still graphic, animated graphic documents: one or multiple links to visual content may be provided in the content set of the news item.
<newsMessage>
<itemSet>
<!-- A visual item with three different renditions of the same visual content -->
<newsItem>
<contentSet>
<remoteContent href="pictureItem/image1.jpg"/>
<remoteContent href="pictureItem/image2.jpg"/>
<remoteContent href="ftp://example.com/image3.gif"/>
</contentSet>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: one or multiple links to visual content may be provided in the content set of each news item conveying picture, video, still graphic or animated graphic content.
<newsMessage>
<itemSet>
<!-- A visual item with three different renditions of the same visual content -->
<newsItem>
<contentSet>
<remoteContent href="pictureItem/image1.jpg"/>
<remoteContent href="pictureItem/image2.jpg"/>
<remoteContent href="ftp://example.com/image3.gif"/>
</contentSet>
</newsItem>
<!-- Another visual item with two renditions of some other visual content -->
<newsItem>
<contentSet>
<remoteContent href="videoItem/video1.mp4"/>
<remoteContent href="http://example.com/video2.mp4"/>
</contentSet>
</newsItem>
</itemSet>
</newsMessage>
Links to the actual visual content (e.g., bitmaps, vector graphics,
video frames) are provided by href attributes of remoteContent
elements. The value of each href attribute is a URI reference (while
NewsML-G2 allows for IRI references, AFP NewsML-G2 documents use only
URI references). See the section "Accessing visual content through URI
references" for additional directions on how to use these links.
Each picture, video, still graphic and animated graphic news item
carries information on visual content (i.e., one picture, video or
graphic). However, this content may be available in multiple renditions
(e.g., low resolution, high resolution, JPEG format, TIFF format, etc.).
Each rendition is described by a remoteContent element in the content
set of the item.
| In standard NewsML-G2 "Each rendition [in the content set of a given news item] MUST represent the same visual content, differentiated only by physical properties such as content type and format. [Renditions in the content set of a given news item are] different technical representations of the same logical content". AFP renditions for picture and graphic content do not always conform to this rule. In addition to providing different technical representations of the same logical content, our renditions may also include crops or other alterations of the content provided by other renditions of the same news item. |
8.3.2. Additional properties of renditions
For each rendition, some information may be provided by attributes on
remoteContent elements. These attributes are described below.
8.3.2.1. Rendition type
To aid in selecting renditions, the type of a rendition may be provided by
a rendition attribute in the remoteContent element that describes the
rendition, as in this example:
<!-- Three descriptions of renditions of different types -->
<remoteContent rendition="rnd:lowRes" href="pictureItem/image1.jpg"/>
<remoteContent rendition="rnd:highRes" href="pictureItem/image2.jpg"/>
<remoteContent rendition="rnd:thumbnail" href="pictureItem/image3.gif"/>
At the time of writing, some remoteContent elements may be delivered
with no rendition attribute. For instance, this is the case for
renditions in PostScript or PDF format for still graphic content. However,
these remoteContent elements will include a contenttype attribute
identifying the format (as detailed in the section about rendition formats).
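One possible way to select a rendition is sketched below: try a list of preferred rendition QCodes in order, then fall back to the media type for remoteContent elements that carry no rendition attribute. The function name and its parameters are hypothetical, and QCodes are compared literally, which assumes that you know the scheme aliases used by the document (a complete implementation would resolve them through the catalogs).

```python
import xml.etree.ElementTree as ET


def pick_rendition(news_item, preferred, fallback_types=()):
    """Return the href of the best matching remoteContent element, or None."""
    remotes = news_item.findall("./{*}contentSet/{*}remoteContent")
    # First pass: honor the caller's order of preference for rendition types.
    for qcode in preferred:
        for remote in remotes:
            if remote.get("rendition") == qcode:
                return remote.get("href")
    # Second pass: renditions without a rendition attribute, matched by media type.
    for media_type in fallback_types:
        for remote in remotes:
            if remote.get("rendition") is None and remote.get("contenttype") == media_type:
                return remote.get("href")
    return None
```

For example, `pick_rendition(item, ["rnd:highRes", "rnd:lowRes"], ["application/pdf"])` prefers a high resolution rendition, then a low resolution one, and only then an untyped PDF rendition.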
The rendition attribute provides a QCode whose possible values are
taken from an IPTC-controlled vocabulary and from AFP-controlled
vocabularies. The following tables provide examples of such values.
Examples of rendition types for picture documents |
||
Concept URI |
QCode |
Description |
|
High resolution image |
|
|
Preview resolution image |
|
|
A very small rendition of an image, providing only a general indication of its content. |
|
Examples of rendition types for still graphic documents |
||
Concept URI |
QCode |
Description |
|
|
Rendition in Adobe Creative Suite 11 format |
|
High resolution image |
|
|
|
A JPEG image in retina resolution. Typically, it contains four times more pixels than the jpeg_standard rendition. |
|
|
A JPEG image in standard resolution |
|
|
A PNG image in retina resolution. Typically, it contains four times more pixels than the png_standard rendition. |
|
|
A PNG image in standard resolution |
|
Preview resolution image |
|
|
A very small rendition of an image, providing only a general indication of its content. |
|
Examples of rendition types for visual components in multimedia documents |
||
Concept URI |
QCode |
Description |
|
Documentation forthcoming |
|
|
|
Rendition of the highest definition of a visual component in a multimedia document |
|
|
Content intended to appear on iPad |
|
Content intended to appear on a mobile or handheld device |
|
|
|
A small squared rendition of an image |
|
A very small rendition of an image, providing only a general indication of its content. |
|
|
Content intended to appear on a web page |
|
Examples of rendition types for interactive documents |
||
Concept URI |
QCode |
Description |
|
|
The interactive rendition |
8.3.2.2. Media type and format
The media type of a rendition may be provided by a contenttype
attribute on the remoteContent element describing the rendition, as in
this example:
<!-- Three descriptions of renditions, each one with a media type -->
<remoteContent contenttype="image/jpeg" href="pictureItem/image1.jpg"/>
<remoteContent contenttype="image/jpeg" href="pictureItem/image2.jpg"/>
<remoteContent contenttype="image/gif" href="pictureItem/image3.gif"/>
The value of the contenttype attribute is an IANA media type name
[MediaTypes].
The contenttype attribute may be complemented by a format attribute
to refine information about the data format of the rendition. For
example:
<!-- Three descriptions of renditions, each one with a media type complemented by a format -->
<remoteContent contenttype="image/jpeg" format="example:JPEG_Baseline"
href="pictureItem/image1.jpg"/>
<remoteContent contenttype="image/jpeg" format="example:JPEG_Progressive"
href="pictureItem/image2.jpg"/>
<remoteContent contenttype="image/gif" format="example:GIF87a"
href="pictureItem/image3.gif"/>
8.3.2.3. Visual dimensions
The width and height of a rendition may be provided by width and
height attributes (whose values are non-negative integers) on the
remoteContent element that describes the rendition. The units in which
these dimensions are expressed may be specified by widthunit and
heightunit attributes. These attributes provide QCodes whose possible
values are take from the IPTC-defined controlled vocabulary for dimension
units (cf. [IPTCDimUnits]).
For example:
<remoteContent width ="640" widthunit ="dimensionunit:pixels"
height="400" heightunit="dimensionunit:pixels" href="pictureItem/image1.jpg"/>
This fragment states that the visual content at pictureItem/image1.jpg is
640 pixels wide and 400 pixels high (in this example, we assume that
dimensionunit is a scheme alias for the IPTC-defined controlled vocabulary
for dimension units).
The possible dimension units constitute a subset of the IPTC dimension units
controlled vocabulary. They are listed in the table below, where the
"Concept URI" column provides the URI to which the heightunit and/or
widthunit attributes resolve.
Dimension units
Unit | QCode | Concept URI
Pixel | |
Typographic Point | |
Millimeter | |
If a width and/or a height attribute is present but the corresponding
dimension unit attribute is missing, then the width and/or height must be
assumed to be expressed in the default unit for that dimension. The default
dimension units, as specified by NewsML-G2, are given in the table below.
Default dimension units
Type of visual content | Default height unit | Default width unit
Picture | pixels | pixels
Graphic (still or animated) | points | points
Digital video | pixels | pixels
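The default-unit rule can be sketched as follows. The function name and the "picture", "graphic" and "video" labels are hypothetical, and the sketch assumes that the code part of the unit QCode (the text after the colon, as in dimensionunit:pixels) uses the same words as the table of defaults.

```python
import xml.etree.ElementTree as ET

# Defaults from the table above, keyed by a hypothetical content-kind label.
DEFAULT_UNIT = {"picture": "pixels", "graphic": "points", "video": "pixels"}


def dimensions(remote_content, kind):
    """Return ((width, unit), (height, unit)); a missing dimension gives None."""

    def one(value_attr, unit_attr):
        value = remote_content.get(value_attr)
        if value is None:
            return None
        unit_qcode = remote_content.get(unit_attr)
        if unit_qcode:
            # Keep only the code part of the QCode ("dimensionunit:pixels" -> "pixels").
            unit = unit_qcode.split(":", 1)[-1]
        else:
            # No unit attribute: fall back to the default for this kind of content.
            unit = DEFAULT_UNIT[kind]
        return int(value), unit

    return one("width", "widthunit"), one("height", "heightunit")
```

Width and height are handled independently, so a rendition that states a unit for only one of the two dimensions still gets the default for the other.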
8.3.2.4. Size
The size in bytes of a rendition may be provided by a size attribute
on the remoteContent element describing the rendition, as in the
following example:
<remoteContent size="253476" href="pictureItem/image1.jpg"/>
In this example, the size attribute specifies that the representation of
the resource identified by pictureItem/image1.jpg weighs 253476 bytes.
The value of the size attribute is a non-negative integer.
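One possible use of the size attribute is a sanity check after retrieving a rendition. The sketch below is only an illustration with a hypothetical function name; it assumes that the retrieved bytes are available in memory, and a mismatch is probably best treated as a hint to retry or investigate rather than as a definitive error.

```python
import xml.etree.ElementTree as ET


def size_matches(remote_content, payload: bytes) -> bool:
    """Return True when no size is declared or when it equals len(payload)."""
    declared = remote_content.get("size")
    if declared is None:
        # The attribute is optional: there is nothing to check against.
        return True
    return int(declared) == len(payload)
```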
9. Data specific to picture and still graphic content
Some data is present only in picture and still graphic documents, and in picture and still graphic items of multimedia documents. This section describes these data elements.
Note that picture and still graphic documents/items also contain data common to visual content (see the section "Data specific to visual content") and, of course, data common to all kinds of content (see the section "Common data").
9.1. Additional data about visual content
As described in the section "Visual content", a
given visual may have multiple renditions, each one described by a
remoteContent element. This section describes additional data that may
be used to describe a picture or still graphic rendition.
9.1.1. Orientation
The "orientation" of a rendition indicates how the rendition differs in
orientation from the original digital image. It may be provided by an
orientation attribute on the remoteContent element describing the
rendition. The value of this attribute is an integer in the range 1 to 8
(inclusive). For example:
<remoteContent orientation="5" href="pictureItem/image1.jpg"/>
This fragment states that the image at pictureItem/image1.jpg has been
flipped about the vertical axis and rotated 90 degrees counterclockwise
relative to the original image. See the NewsML-G2 specification for a
comprehensive description of the meaning of each value.
If no orientation attribute is present, you should assume a value of
1, which means "upright, no flip, no rotation" (i.e., the visual top of
the original image is at the top, the visual left side of the original
image is on the left, etc.).
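Reading the attribute with its default could look like the sketch below. The function name is hypothetical; the sketch only validates the value and does not apply the corresponding flip or rotation, for which the NewsML-G2 specification remains the reference.

```python
import xml.etree.ElementTree as ET


def orientation(remote_content) -> int:
    """Return the orientation of a rendition, defaulting to 1 when absent."""
    raw = remote_content.get("orientation")
    if raw is None:
        return 1  # upright, no flip, no rotation
    value = int(raw)
    if not 1 <= value <= 8:
        raise ValueError(f"orientation out of range 1..8: {value}")
    return value
```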
9.2. Illustration images (aka previews or thumbnails)
Small illustration images may be provided as part of the content set
through remoteContent elements, just like other renditions. They are
distinguished by the value of their rendition attribute, which resolves to
a concept URI such as http://cv.iptc.org/newscodes/rendition/thumbnail or
http://cv.afp.com/renditions/squaredThumbnail. See the
section on visual content for detailed information.
Note that illustration images for video or animated graphics are provided in a different way, as described in the section on icons.
10. Data specific to video and animated graphic content
Some data is only present in video and animated graphic documents, and in video and animated graphic items of multimedia documents. This section describes these data elements.
Note that video and animated graphic documents/items also contain data common to visual content (see the section "Data specific to visual content") and, of course, data common to all kinds of content (see the section "Common data").
10.1. Additional data about visual content
As described in the section "Visual content", a
given visual may have multiple renditions, each one described by a
remoteContent element. This section describes additional data that may
be used to describe a video or animated graphic rendition.
10.1.1. Duration
The duration of a rendition may be provided by a duration attribute (a
non-negative integer) on the remoteContent element describing the
rendition. The unit in which the duration is expressed may be provided
by a durationunit attribute. This attribute provides a QCode whose
possible values are in a subset of the IPTC-defined controlled vocabulary
for time units [IPTCTimeUnits]. For example:
<remoteContent duration="120" durationunit="timeunit:seconds"
href="http://example.com/video2.mp4"/>
This fragment states that the content at http://example.com/video2.mp4
lasts 120 seconds (in this example, we assume that timeunit is a
scheme alias for the IPTC-defined controlled vocabulary for time units).
Possible time units are listed in the table below, where the "Concept
URI" column gives the concept URI to which the QCode provided by
durationunit resolves.
Time units for video or animated graphic duration
Unit | QCode | Concept URI
Edit Unit | |
Second | |
Millisecond | |
If a duration attribute is present without a durationunit attribute,
then you must assume that the duration is expressed in seconds.
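The duration rule, including the default unit, can be sketched as follows. The function name is hypothetical. The code part "seconds" is taken from the example above, while "milliseconds" is an assumption about how the millisecond unit is named in the vocabulary. Durations in edit units are not converted, because doing so requires the edit rate of the content.

```python
import xml.etree.ElementTree as ET


def duration_in_seconds(remote_content):
    """Return the duration in seconds, or None if absent or not convertible."""
    raw = remote_content.get("duration")
    if raw is None:
        return None
    unit_qcode = remote_content.get("durationunit")
    # No durationunit attribute: the duration is expressed in seconds.
    code = unit_qcode.split(":", 1)[-1] if unit_qcode else "seconds"
    if code == "seconds":
        return float(raw)
    if code == "milliseconds":  # assumed code name, see the note above
        return int(raw) / 1000
    return None  # e.g. edit units, which need the edit rate to be converted
```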
10.2. Icon (aka illustration or preview image)
10.2.1. Basic format
Video and animated graphic documents: icon renditions may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<!-- A visual item with two icons -->
<newsItem>
<contentMeta>
<icon href="http://example.com/img1.jpg"/>
<icon href="icons/img2.tiff"/>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: icon renditions may be provided in the content metadata section of each news item conveying video or animated graphic content.
<newsMessage>
<itemSet>
<!-- A video or animated graphic item with two icon renditions -->
<newsItem>
<contentMeta>
<icon href="http://example.com/img1.jpg"/>
<icon href="icons/img2.tiff"/>
</contentMeta>
</newsItem>
<!-- A video or animated graphic item with one icon rendition -->
<newsItem>
<contentMeta>
<icon href="ftp://example.com/img3.jpg"/>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
An icon is an image illustrating a video or an animated graphic (in NewsML-G2, an icon may also be associated with pictures or still graphics, but AFP documents do not use this feature). An icon is typically a keyframe of the visual content, but it can also be a logo or any other illustration.
Each video or animated graphic document, and each video or animated
graphic item of a multimedia document, may have at most one logical
visual content as its icon. However, this visual content may be available
in multiple renditions (e.g., low resolution, high resolution, JPEG format,
TIFF format). Each rendition is described by an icon element in
the contentMeta section of the news item.
Links to the actual icon renditions are provided by href attributes of
icon elements. The value of each href attribute is a URI reference
(while NewsML-G2 allows for IRI references, AFP systems only output URI
references). See the section "Accessing visual content through URI
references" for additional information on how to use these links.
| In standard NewsML-G2 "Each [icon] rendition [in the content metadata section of a given news item] MUST represent the same visual content, differentiated only by physical properties such as content type and format". AFP icon renditions do not always follow this rule: in addition to providing different technical representations of the same visual content, AFP icon renditions may also include crops or other alterations of the content provided by other icon renditions. |
For each icon rendition, additional information may be provided via
attributes on icon elements. These attributes are described below.
10.2.2. Icon rendition type
To aid in selecting icon renditions, the type of a rendition may be
provided by a rendition attribute in the icon element describing the
rendition, as in this example:
<!-- Two icon renditions of different types -->
<icon rendition="rnd:thumbnail" href="icons/img1.jpg"/>
<icon rendition="afprnd:squaredThumbnail" href="icons/img2.tiff"/>
The rendition attribute provides a QCode whose possible values
come from an IPTC-controlled vocabulary and AFP-controlled
vocabularies. Typical values are shown below.
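Icon rendition types
QCode | Concept URI | Description
rnd:thumbnail | http://cv.iptc.org/newscodes/rendition/thumbnail | A very small rendition of an image, giving only a general idea of its content
afprnd:squaredThumbnail | http://cv.afp.com/renditions/squaredThumbnail | A small squared rendition of an image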
10.2.3. Media type and format
The media type of an icon rendition may be provided by a contenttype
attribute on the icon element describing the rendition, as in this
example:
<!-- Two descriptions of icon renditions, each one with a media type -->
<icon contenttype="image/jpeg" href="icons/img1.jpg"/>
<icon contenttype="image/tiff" href="icons/img2.tiff"/>
The value of the contenttype attribute is an IANA media type name
[MediaTypes].
The contenttype attribute may be complemented by a format attribute
to refine information about the data format of the icon rendition. For
example:
<!-- Two descriptions of icon renditions,
each one with a media type complemented by a format -->
<icon contenttype="image/jpeg" format="example:JPEG_Baseline" href="icons/img1.jpg"/>
<icon contenttype="image/tiff" format="example:NSK-TIFF" href="icons/img2.tiff"/>
10.2.4. Visual dimensions
The width and height of an icon rendition may be provided by width and
height attributes (non-negative integers) on the icon element describing
the rendition. The dimension units may be provided by widthunit and
heightunit attributes. These attributes provide QCodes whose possible
values are taken from a subset of the IPTC controlled vocabulary for dimension
units [IPTCDimUnits]. For example:
<icon width ="640" widthunit ="dimensionunit:pixels"
height="400" heightunit="dimensionunit:pixels" href="icons/img1.jpeg"/>
This fragment states that the visual content at icons/img1.jpeg is
640 pixels wide and 400 pixels high (in this example, we assume that
dimensionunit is a scheme alias for the IPTC-defined controlled vocabulary
for dimension units).
The possible dimension units used by AFP are a subset of the IPTC controlled
vocabulary for dimension units. They are listed in the table below, where the
"Concept URI" column gives the URI to which the heightunit and/or
widthunit attributes resolve(s). Currently, AFP always expresses icon
dimensions in pixels.
Dimension units
Unit | QCode | Concept URI
Pixels | |
If a width and/or a height attribute is present but the
corresponding dimension unit attribute is missing, then you may assume
that the width and/or height is expressed in pixels.
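As an illustration of how these attributes might be used, the sketch below picks the widest icon rendition that fits within a given width. The function name and the max_width parameter are hypothetical, and the sketch assumes that the code part of the pixel unit QCode is "pixels", as in the example above. Icons with no width attribute, or with a width in another unit, are skipped.

```python
import xml.etree.ElementTree as ET


def best_icon(news_item, max_width):
    """Return the href of the widest icon not wider than max_width pixels."""
    best = None  # (width, href) of the best candidate found so far
    for icon in news_item.findall("./{*}contentMeta/{*}icon"):
        raw_width = icon.get("width")
        if raw_width is None:
            continue
        unit = icon.get("widthunit")
        # A missing unit is taken to mean pixels, as stated above.
        if unit is not None and unit.split(":", 1)[-1] != "pixels":
            continue
        width = int(raw_width)
        if width <= max_width and (best is None or width > best[0]):
            best = (width, icon.get("href"))
    return best[1] if best else None
```

Because AFP icon renditions may be crops of one another, a consumer may also want to take the rendition attribute into account rather than relying on dimensions alone.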
10.2.5. Size
The size in bytes of an icon rendition may be provided by a size
attribute on the icon element describing the rendition, as in this
example:
<icon size="253476" href="icons/img1.jpeg"/>
In this example, the size attribute specifies that the representation of
the resource identified by icons/img1.jpeg weighs 253476 bytes.
The value of the size attribute is a non-negative integer.
10.3. Script (aka verbatim or transcript)
Video and animated graphic documents: a script may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<description role="afpdescRole:script">
A rare glimpse of the art behind the label.
What Yves Saint Laurent earned in the fashion industry he spent on
masterpieces. At Christie’s auction house in London, a treasure trove of
paintings, sculpture, furniture and jewellery amassed by the fashion
icon and his lover and business partner Pierre Bergé -- over a 50 year
partnership.
SOUNDBITE 1: Thomas Seydoux, International Co-Head of Department,
Christie’s Europe [English, 13 sec]:
"It's unprecedented - I mean we've never sold a collection in recent
memory of that sort of outstanding quality throughout and I think it's
going to be most welcome by collectors who don't have that often a
chance to acquire pieces of such quality"
Following the death of Yves Saint Laurent last year, Bergé chose to sell
the couple’s entire collection, which adorned their apartments in Paris.
For him, the sale is about finding some degree of closure:
SOUNDBITE 2: Pierre Bergé, co-founder Yves Saint Laurent Couture house
[French, 16 sec]: "C’est le jour ou le dernier objet sera passé sous le
marteau d'un commissaire priseur que à mon sens – a mon sens - cette
collection pourra écrire le mot fin."
"Only on the day that the last piece goes under the hammer of an
auctioneer – in my view – will the last word of this collection be
written"
In spite of the global economic slowdown, Christie’s hopes the
collection will fetch around 400 million dollars when it goes up for
sale in Paris at the end of February.
A cubist-era Picasso – valued at 40 million dollars – and a rare
selection of Mondrians are among the highlights. But for Yves Saint
Laurent and Pierre Bergé, it was not about the price tags – more the
enjoyment of living amongst beautiful art.
SOUNDBITE 3: Jonathan Rendell, Deputy Chairman, Christie’s Americas
[English, 19 sec]: "There was a great sense of everything being in the
right place - nothing dominating -and no trophies. I think it is a
collection that's formed by two incredibly intelligent people working
completely in concert with eachother - that's very unusual."
But it’s an unusual bond that is soon to be broken up amongst
collectors, dealers and museums – the end of a long reign for
the king of fashion.
</description>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: a script may be provided in the content metadata section of each news item conveying video or animated graphic content.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- A script for the content of this item -->
<description role="afpdescRole:script">
A rare glimpse of the art behind the label.
What Yves Saint Laurent earned in the fashion industry he spent on
masterpieces. At Christie’s auction house in London, a treasure trove of
paintings, sculpture, furniture and jewellery amassed by the fashion
icon and his lover and business partner Pierre Bergé -- over a 50 year
partnership.
SOUNDBITE 1: Thomas Seydoux, International Co-Head of Department,
Christie’s Europe [English, 13 sec]:
"It's unprecedented - I mean we've never sold a collection in recent
memory of that sort of outstanding quality throughout and I think it's
going to be most welcome by collectors who don't have that often a
chance to acquire pieces of such quality"
...
...
</description>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- A script for the content of this item -->
<description role="afpdescRole:script">
Hundreds of art buyers and lovers from around the world came for the
biggest private collection ever up for auction.
SOUNDBITE 1: Vox pop (woman) (english, 3 sec)
"I arrived two days ago to attend the sale."
SOUNDBITE 2: Vox pop (man) (English, 4 sec)
"I came especially for the exhibition. Going back to New York very
shortly."
...
...
</description>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
A script, if present, provides the transcript of voices that can be heard in the video. This may include voices recorded when the video was shot as well as audio commentary written and voiced by a journalist which is added to the images and recounts the events of the story. It may also contain indications of significant sounds (e.g., "the sound of an explosion"). These elements are provided in their order of occurrence in the video or animated graphic.
A script is provided by a description element whose role attribute,
the QCode afpdescRole:script, resolves to
http://cv.afp.com/descriptionRoles/script. It may appear at most once
per item.
Note that in some documents, the content of a description element
whose role attribute resolves to
http://cv.afp.com/descriptionRoles/script isn’t a voice/sound
transcript or isn’t only a voice/sound transcript:
-
It may contain only a "suggested script". One may have to listen to the video to determine whether the text is an actual transcript. Alternatively, this may be signaled in the text by a mention such as "Suggested script:".
-
It may contain a transcript of actual voices/sounds intermingled with "suggested script" elements. One may have to listen to the video to distinguish actual transcript elements from suggested elements.
-
It may contain a shot list, either standalone or in addition to the elements described above.
Shot lists have their own dedicated slots in this XML format (see the
section "Shot list"), but in some documents they appear
in the slots intended for scripts. For example, here is a description
element that contains both a script and a shot list (we show only partial
content):
<description role="afpdescRole:script">
Script:
Hundreds of art buyers and lovers from around the world came for the biggest
private collection ever up for auction.
SOUNDBITE 1: Vox pop (woman) (english, 3 sec)
"I arrived two days ago to attend the sale."
...
...
Shotlist: (shot Feb 23, 2009)
-wide of auctioneer
-painting on screen
-Berge arriving at auction
-SOUNDBITE 1: Vox pop (woman) (english, 3 sec)
-SOUNDBITE 2: Vox pop (man) (English, 4 sec)
-close up of Matisse
...
...
</description>
Note that while NewsML-G2 allows for rich text by using some markup in the content of a script, AFP’s systems only output simple textual content not interspersed with markup.
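When a shot list is concatenated to a script, as in the example above, a heuristic such as the following sketch may help to separate the two. It is only a heuristic with a hypothetical function name: it relies on the "Shotlist:" marker seen in that single example, and other documents may use a different marker or none at all.

```python
import re

# Assumption: the shot list starts on a line beginning with "Shotlist:".
_SHOTLIST_MARKER = re.compile(r"^[ \t]*Shotlist:", re.IGNORECASE | re.MULTILINE)


def split_script(text: str):
    """Return (script, shot_list); shot_list is None when no marker is found."""
    match = _SHOTLIST_MARKER.search(text)
    if match is None:
        return text.strip(), None
    # Everything before the marker line is kept as the script part.
    return text[:match.start()].strip(), text[match.start():].strip()
```

The marker line itself is kept at the head of the shot list part, since it may carry information such as the shooting date.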
10.4. Shot list
Video and animated graphic documents: a shot list may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<description role="afpdescRole:shotList">
-Member of Christie's staff walking in front of paintings
-Photographers
-Tilt of YSL poster
-VAR Christie's member of staff with metal art works
-VAR Theodore Gericault painting
-Thomas Seydoux, International Co-Head of Department, Christie’s Europe
-PAN of photo of YSL's flat in Paris
-SOUNDBITE 2: Pierre Bergé, co-founder Yves Saint Laurent Couture house
-Paintings on wall
-VAR Ferdinand Leger painting
-Picasso painting
-Woman looking at painting
-VAR Frans Hals portrait
-SOUNDBITE 3: Jonathan Rendell, Deputy Chairman, Christie’s Americas
-People walking through gallery
-Tilt to poster of YSL
</description>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: a shot list may be provided in the content metadata section of each news item conveying video or animated graphic content.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- A shot list for the content of this item -->
<description role="afpdescRole:shotList">
-Member of Christie's staff walking in front of paintings
-Photographers
-Tilt of YSL poster
...
...
</description>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- A shot list for the content of this item -->
<description role="afpdescRole:shotList">
-wide of auctioneer
-painting on screen
-Berge arriving at auction
-SOUNDBITE 1: Vox pop (woman) (english, 3 sec)
-SOUNDBITE 2: Vox pop (man) (English, 4 sec)
-close up of Matisse
...
...
</description>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
A shot list, if present, provides a concise description of each sequence. These elements are provided in their order of occurrence in the video or animated graphic.
A shot list is provided by a description element whose role
attribute, the QCode afpdescRole:shotList, resolves to
http://cv.afp.com/descriptionRoles/shotList. It may appear there at
most once per item.
In some documents, the shot list is not provided in this way but appears concatenated to the script (see the section "Script" for an example).
The exact format of a shot list may differ across document types and may also vary according to local journalistic practices.
Note that although NewsML-G2 allows rich text through the use of markup in the content of a shot list, AFP’s systems only output simple textual content without embedded markup.
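As an illustration, the following minimal Python sketch collects the shot list of each news item in a document. The function name shot_lists is hypothetical, and the sketch assumes one shot per non-empty line with an optional leading "-", as in the examples above; since the exact format may vary, treat the line splitting as a starting point only.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
SHOT_LIST_ROLE = "afpdescRole:shotList"

def shot_lists(news_message_xml):
    """Return one list of shots per news item that carries a shot list."""
    root = ET.fromstring(news_message_xml)
    result = []
    for item in root.iter(NAR + "newsItem"):
        for desc in item.iterfind(f"{NAR}contentMeta/{NAR}description"):
            if desc.get("role") == SHOT_LIST_ROLE:
                lines = (line.strip() for line in (desc.text or "").splitlines())
                # Assumption: one shot per non-empty line, optional leading "-".
                result.append([line.lstrip("-").strip() for line in lines if line])
    return result
```

Because shots are listed in their order of occurrence, the order of each returned list can be used as is.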
10.5. Speakers heard during audio or film recording (aka synthe)
Video and animated graphic documents: Speakers heard during audio or film recording may be described in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<description role="afpdescRole:synthe">
-Thomas Seydoux (man), International Co-Head of Department,
Christie’s Europe
-Pierre Bergé (man), co-founder Yves Saint Laurent Couture house
-Jonathan Rendell (man), Deputy Chairman, Christie’s Americas
</description>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: Speakers heard during audio or film recording may be described in the content metadata section of each news item conveying video or animated graphic content.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- Speakers heard during recording the content of this item -->
<description role="afpdescRole:synthe">
-Thomas Seydoux (man), International Co-Head of Department,
Christie’s Europe
-Pierre Bergé (man), co-founder Yves Saint Laurent Couture house
-Jonathan Rendell (man), Deputy Chairman, Christie’s Americas
</description>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- Speakers heard during recording the content of this item -->
<description role="afpdescRole:synthe">
-Vox pop woman
-Vox pop man
-Pierre Berge (man), Yves Saint Laurent's partner
</description>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Specific information may be provided about speakers heard during audio or film recording where an important part of the clip value consists of what is said. In most clips these speakers appear in the images, but that may not always be the case.
This information may be provided by a description element whose role
attribute, the QCode afpdescRole:synthe, resolves to
http://cv.afp.com/descriptionRoles/synthe. It may appear at most once
per item. This information is provided in the order of occurrence of
speakers in the video or animated graphic.
This information typically includes the speakers' names and functions. It can be used, for example, to add captions accompanying speakers' appearances in the video.
Note that although NewsML-G2 allows rich text containing some markup in
description elements, AFP’s systems only output simple textual content
without embedded markup.
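The examples above show an entry wrapped over two lines. The following Python sketch, with a hypothetical function name speakers, reads the entries of a contentMeta element and joins such wrapped lines. It assumes that every entry starts with "-" and that a line without this prefix continues the previous entry; this convention is inferred from the examples and is not guaranteed.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"

def speakers(content_meta):
    """Return the speaker entries of a contentMeta element, in order of occurrence."""
    entries = []
    for desc in content_meta.iterfind(NAR + "description"):
        if desc.get("role") != "afpdescRole:synthe":
            continue
        for line in (desc.text or "").splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith("-") or not entries:
                entries.append(line.lstrip("-").strip())
            else:
                # Assumption: a line without a leading "-" continues the previous entry.
                entries[-1] += " " + line
    return entries
```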
11. Data specific to multimedia documents
Some data is specific to multimedia documents. This section details these data elements.
11.1. Number of non-main items by nature
Multimedia documents: the number of non-main items, broken down by item natures, may be provided in the item metadata section of the main news item.
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" xmlns:afp="http://www.afp.com/format/internal/">
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
<afp:extension>
<afp:stats>
<afp:totalComponentsOfType qcode="ninat:graphic" total="1" />
<afp:totalComponentsOfType qcode="ninat:picture" total="3" />
</afp:stats>
</afp:extension>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
As shown above, each totalComponentsOfType element provides the number
of non-main items of a given nature present in the document. The qcode
attribute specifies the nature as described in the following table:
Natures of multimedia non-main items
Type | QCode | Concept URI
Picture | ninat:picture |
Video | |
Still graphic | ninat:graphic |
Animated graphic | |
The total attribute provides the number of items of the given nature,
as a strictly positive integer. If the stats element is present, the
absence of a totalComponentsOfType element for a given nature means
that no non-main item of that nature is present in the document.
The totalComponentsOfType elements appear inside a stats element
inside an extension element in the item metadata section of the main
news item. Note that the totalComponentsOfType, stats and
extension elements are not part of the standard NewsML-G2 vocabulary,
but are part of an AFP-specific extension. They are defined in the XML
namespace http://www.afp.com/format/internal/.
Therefore, the example given earlier in this section can be interpreted as follows:
- The presence of <afp:totalComponentsOfType qcode="ninat:graphic" total="1" /> means that there is one still graphic item in the document.
- The presence of <afp:totalComponentsOfType qcode="ninat:picture" total="3" /> means that there are three picture items in the document.
- The absence of a totalComponentsOfType element for other item natures means that no animated graphic or video items are present in the document.
The extension and stats elements are optional (i.e., they may or may
not be present). When present, they appear at most once per document.
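The rules above can be sketched in Python as follows. The function name component_counts is hypothetical. The sketch returns None when no stats element is present, so that "no information" stays distinct from "zero items".

```python
import xml.etree.ElementTree as ET

AFP = "{http://www.afp.com/format/internal/}"

def component_counts(item_meta):
    """Map nature QCodes to counts, or return None when no stats are provided."""
    stats = item_meta.find(f"{AFP}extension/{AFP}stats")
    if stats is None:
        return None  # no information: the counts are unknown, not zero
    return {
        e.get("qcode"): int(e.get("total"))
        for e in stats.iterfind(AFP + "totalComponentsOfType")
    }
```

When the result is a dictionary, a lookup such as counts.get(qcode, 0) reflects the rule that a missing entry means no item of that nature.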
11.2. Multimedia content expressed using the XML syntax of HTML
Multimedia documents: the multimedia content is provided using the XML syntax of HTML in the content set of the main news item.
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
<contentSet>
<inlineXML contenttype="application/xhtml+xml">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>
YSL-Bergé collection sets new world record at auction
for a private collection
</title>
</head>
<body>
<p>
The Yves Saint Laurent and Pierre Bergé collection sets
new world record at auction for a private collection.
Hundreds of art treasures amassed by late fashion designer
Yves Saint Laurent and his companion Pierre Berge over half
a century are being auctioned.
</p>
<p>
<!-- Embedded content from a picture item -->
<span class="g2item g2picture">
<a style="display: none" href="urn:newsml:afp.com:20100101:7a0846c9-e341-45dc-a3a2"></a>
<img src="image1.jpeg" style="float: left;"
generator-unable-to-provide-required-alt="" height="163" width="245" />
</span>
</p>
<p>
Bids hit 206 million euros (261 million dollars) on February
23, 2009 making it the biggest private collection ever
auctioned with two days of sales still left to run.
</p>
<p>
<!-- Embedded content from a video item -->
<span class="g2item g2video">
<a style="display: none" href="urn:newsml:afp.com:20100101:7633a15b-a990-4db6-9052"></a>
<video style="float: right;" controls="controls" height="138" width="245"
poster="keyframe1.jpeg">
<source src="video1.mp4" type="video/mp4" />
</video>
</span>
</p>
<p>
<!-- An hypertext link to an external resource -->
The <a class="ignorableTextFalse" href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">
wikipedia page about Yves Saint-Laurent</a> claims that ...
</p>
</body>
</html>
</inlineXML>
</contentSet>
</newsItem>
<newsItem guid="urn:newsml:afp.com:20100101:7a0846c9-e341-45dc-a3a2">
...
</newsItem>
<newsItem guid="urn:newsml:afp.com:20100101:7633a15b-a990-4db6-9052">
...
</newsItem>
</itemSet>
</newsMessage>
The multimedia content expressed using the XML syntax of HTML is the
main journalistic content of the document. It is provided by an
inlineXML element. A contenttype attribute with a value of
application/xhtml+xml explicitly indicates the usage of the
XML syntax of HTML.
The multimedia content contains the main textual content intermingled with links and audiovisual content. As shown in the example above, some parts of this content (e.g., pictures, videos, etc.) may be described by their own news items. These parts are referred to as "components". The news items describing them are themselves part of the NewsML-G2 document.
The example above uses a microformat [Microformat] to
denote a component and the reference to the news item that describes it.
This allows providing displayable information (e.g., an img tag)
along with semantic markup (e.g., the reference to the news item)
which can be machine-processed by your system.
This microformat uses a span element whose class attribute
contains “g2item”. In addition, another class name indicates the
type of the referenced item (e.g., “g2picture”, “g2video”, etc.).
The first child element of such a span is always the reference to the
news item that describes the component. It is represented as an a tag
whose href attribute provides the GUID of the news item. This element
is marked as non-displayable, as it is not meant to be directly
displayed.
After this element, additional HTML markup defines embedded content for
the display of a default rendition of this component. For example, a
document may contain an img element displaying a picture.
This microformat is called the g2item microformat. Another microformat called the g2document microformat is used to represent links to other NewsML-G2 documents. It is described in a dedicated section below.
The following sections detail how various types of components and links are represented.
11.2.1. Picture
The class name “g2item” indicates that we use the g2item microformat: the span represents a component and a reference to the associated news item. The class name “g2picture” indicates that the referenced news item provides picture content. Inside the span:
- The first element provides the GUID of that news item.
- The second element defines embedded content for the display of the default rendition of the picture via a standard HTML img tag.
Example:
<span class="g2item g2picture">
<a style="display: none" href="urn:newsml:afp.com:20100101:7a0846c9-e341-45dc-a3a2"></a>
<img src="image1.jpeg" style="float: left;"
generator-unable-to-provide-required-alt="" height="163" width="245" />
</span>
11.2.2. Still graphic
An embedded still graphic is defined like an embedded picture, except that
the span element uses the class name g2graphic instead of g2picture.
Example:
<span class="g2item g2graphic">
<a style="display: none" href="urn:newsml:afp.com:20100101:7a123456-a542-76fg-ab6a"></a>
<img src="image1.jpeg" style="float: left;"
generator-unable-to-provide-required-alt="" height="163" width="245"/>
</span>
11.2.3. Video
For embedded video we also use the g2item microformat. The class
name g2video indicates that the referenced news item provides video
content. Inside the span, the first element provides the news item GUID.
The embedded video is then defined using a standard HTML
video tag. An illustration image may be provided by the poster
attribute; additional attributes such as autoplay, loop, etc. may be
used as well. For example:
<span class="g2item g2video">
<a style="display: none" href="urn:newsml:afp.com:20100101:7633a15b-a990-4db6-9052"></a>
<video style="float: right;" controls="controls" height="138" width="245"
poster="keyframe1.jpeg">
<source src="video1.mp4" type="video/mp4" />
</video>
</span>
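To illustrate how the g2item microformat can be machine-processed, here is a minimal Python sketch. The function name g2_components is hypothetical. It relies only on what is described above: a span whose class contains g2item, and a first child a element whose href carries the GUID.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

def g2_components(html_root):
    """Yield (type class names, news item GUID) for each g2item span, in document order."""
    for span in html_root.iter(XHTML + "span"):
        classes = (span.get("class") or "").split()
        if "g2item" not in classes:
            continue
        first = span[0] if len(span) else None
        if first is None or first.tag != XHTML + "a":
            continue  # not shaped like the microformat described here
        types = [c for c in classes if c != "g2item"]
        yield types, first.get("href")
```

Each GUID obtained this way can then be matched against the guid attribute of the newsItem elements of the same document.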
11.2.4. Hypertext links to other resources
The HTML may contain hypertext links to other resources such as Web
pages. Links may be provided by standard HTML a elements.
Example: a link to a Wikipedia page:
<a class="ignorableTextFalse"
href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)" >wikipedia page about Yves Saint-Laurent</a>
The class attribute, if present, may be used to specify either the
class name “ignorableTextFalse” or “ignorableTextTrue”. These class
names are meant to assist you if you need to remove hypertext links from
the HTML content (a common need for some of our clients).
11.2.4.1. ignorableTextFalse
ignorableTextFalse means that if you process the HTML in order to
remove links, then keeping the text associated with this link will
produce a better result.
For example, imagine HTML containing the following fragment before hypertext link removal:
Pierre Bergé quoted the
<a class="ignorableTextFalse"
href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">wikipedia page about Yves Saint-Laurent</a>
to illustrate...
After removing hypertext links the fragment should be:
Pierre Bergé quoted the wikipedia page about Yves Saint-Laurent to illustrate...
11.2.4.2. ignorableTextTrue
ignorableTextTrue means that if you process the HTML in order to
remove links, then also removing the text associated with this link will
produce a better result.
For example, imagine HTML containing the following fragment before hypertext link removal:
Some text before.
<a class="ignorableTextTrue"
href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">
This Web page provides additional information.
</a>
Some text after.
After removing hypertext links the fragment should be:
Some text before. Some text after.
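The link removal logic described in this section can be sketched in Python as shown below. The function name strip_links is hypothetical. As a deliberate simplification, the sketch only touches a elements that carry one of the two class names and leaves any other a element in place.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

def strip_links(root):
    """Remove hinted hypertext links in place, keeping or dropping their text."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != XHTML + "a":
                continue
            classes = (child.get("class") or "").split()
            if "ignorableTextTrue" in classes:
                kept = ""  # drop the link together with its text
            elif "ignorableTextFalse" in classes:
                kept = "".join(child.itertext())  # keep the link text
            else:
                continue  # no hint: left untouched in this sketch
            replacement = kept + (child.tail or "")
            index = list(parent).index(child)
            if index == 0:
                parent.text = (parent.text or "") + replacement
            else:
                previous = parent[index - 1]
                previous.tail = (previous.tail or "") + replacement
            parent.remove(child)
```

The text that follows a removed link is reattached to the preceding node, so the surrounding sentence stays intact. You may still want to normalize whitespace afterwards.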
11.2.5. Links to other NewsML-G2 documents
The HTML can contain links to other NewsML-G2 documents managed by AFP.
Such links are associated with a part of the textual content. We
represent these links using the g2document microformat. It consists of
a span element whose class attribute contains “g2document”.
In addition, we provide another class name indicating the type of the
referenced document: “g2picture”, “g2video”, etc. Finally, we may
provide a class name that provides a hint on how a link could be removed
gracefully. For example:
<span class="g2document g2text ignorableTextFalse">
<a style="display: none" href="http://doc.afp.com/7W37U"></a>
<a style="display: none" href="otherDocument.xml"></a>
some text
</span>
The content of the span element is organized as follows:
- The first child element of such a span is an a tag whose href attribute provides the GUID of the NewsML-G2 document. Note that though it may look like a dereferenceable URI, it actually isn’t. This element is marked as non-displayable as it is not meant to be directly displayed.
- Following this element, another non-displayable a tag may provide the dereferenceable URI reference of the NewsML-G2 document. Typically, this element will be present if the AFP delivery system determines that it has delivered the corresponding document to you and knows where to locate it in your delivery space.
- Finally, we provide the part of the textual content the other NewsML-G2 document is associated with.
The following table lists the class names used to specify the type of a referenced NewsML-G2 document. See the overview section for a presentation of the various document types.
Types of referenced NewsML-G2 document
Class name | Type
g2text | Text
g2multimedia | Multimedia
g2picture | Picture
g2graphic | Still graphic
g2animated | Animated graphic
g2video | Video
g2liveReport | Live report index
g2interactive | Interactive graphic
The class attribute may also be used to indicate “ignorableTextFalse”
or “ignorableTextTrue”. These class names help when implementing link
removal for HTML content (link removal is a common need for some of our clients).
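As a minimal sketch, the g2document microformat described above could be read in Python as follows. The function name g2_document_links and the dictionary keys are hypothetical choices made for this illustration.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
HINTS = {"ignorableTextFalse", "ignorableTextTrue"}

def g2_document_links(html_root):
    """Return one dict per g2document span, in document order."""
    links = []
    for span in html_root.iter(XHTML + "span"):
        classes = (span.get("class") or "").split()
        if "g2document" not in classes:
            continue
        anchors = [child for child in span if child.tag == XHTML + "a"]
        if not anchors:
            continue
        links.append({
            "guid": anchors[0].get("href"),
            # The second anchor is optional: None means no dereferenceable URI was given.
            "uri": anchors[1].get("href") if len(anchors) > 1 else None,
            "types": [c for c in classes if c != "g2document" and c not in HINTS],
            "hint": next((c for c in classes if c in HINTS), None),
            "text": " ".join("".join(span.itertext()).split()),
        })
    return links
```

The "uri" value, when present, is a URI reference and may be relative to the location of the current document.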
11.2.5.1. ignorableTextFalse
ignorableTextFalse means that if you process the HTML in order to
remove links then keeping the text associated with this link will
produce a better result.
For example, suppose that the HTML contains the following fragment before removing the links:
Pierre Bergé quoted
<span class="g2document g2text ignorableTextFalse">
<a style="display: none" href="http://doc.afp.com/7W37U"></a>
<a style="display: none" href="otherDocument.xml"></a>
a recent AFP news story
</span>
to illustrate...
After removing links the fragment should be:
Pierre Bergé quoted a recent AFP news story to illustrate...
11.2.5.2. ignorableTextTrue
ignorableTextTrue means that if you process the HTML in order to
remove links then also removing the text associated with this link will
produce a better result.
For example, suppose that the HTML contains the following fragment before removing the links:
Some text before.
<span class="g2document g2text ignorableTextTrue">
<a style="display: none" href="http://doc.afp.com/7W37U"></a>
<a style="display: none" href="otherDocument.xml"></a>
This AFP news story provides additional information.
</span>
Some text after.
After removing links the fragment should be:
Some text before. Some text after.
12. Data specific to Live report posts
Live report posts are represented by multimedia documents. They can contain additional dedicated metadata, as described in this section.
12.1. Live report intertitle
Live report posts: the indication that a post is an intertitle is provided in the item metadata section of the main news item.
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<!-- This link element tells that this news item is the main item of the multimedia document -->
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
<!-- This link element tells that this multimedia document represents an intertitle in a live report -->
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/liveReportIntertitle"/>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
While most posts convey a piece of news about the ongoing event being
reported, some differ as they represent intertitles. An intertitle
typically provides text describing a phase of the ongoing event, or
a grouping of a subset of posts.
An intertitle is identified by the presence, in the item metadata section
of its main item, of a link element whose:
- rel attribute conveys the concept URI http://cv.iptc.org/newscodes/conceptrelation/isA (using the QCode crel:isA)
- href attribute is the URI http://cv.afp.com/itemnatures/liveReportIntertitle.
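The check described above can be sketched in Python as follows. The function name is_intertitle is hypothetical. The example in this section writes the QCode as crel:isa while the text writes crel:isA, so the sketch compares the rel value case-insensitively; this is an assumption made for robustness, not a documented rule.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
INTERTITLE = "http://cv.afp.com/itemnatures/liveReportIntertitle"

def is_intertitle(item_meta):
    """Tell whether the itemMeta of a main news item marks the post as an intertitle."""
    return any(
        # Assumption: case-insensitive match, accepting "crel:isa" and "crel:isA".
        (link.get("rel") or "").lower() == "crel:isa"
        and link.get("href") == INTERTITLE
        for link in item_meta.iterfind(NAR + "link")
    )
```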
12.2. Timestamp in Live Report
Live report posts: the timestamp in the live report is provided in the item metadata section of the main news item.
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" xmlns:afp="http://www.afp.com/format/internal/">
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
<afp:extension>
<afp:timestampInLiveReport>
<afp:date>2016-07-09T15:30:33.928Z</afp:date>
<afp:label>15h30</afp:label>
</afp:timestampInLiveReport>
</afp:extension>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
The timestamp in live report is provided for multimedia documents that
represent posts in live reports. Each post is associated with a
timestamp. This timestamp is provided by a timestampInLiveReport
element in an extension element inside the item metadata section.
It consists of:
- a precise date/time, which determines the chronological order of posts in the live report. It is provided as a W3C XML Schema 1.0 date/time by a date element.
- a label, intended to be displayed alongside the content of the post, and tailored to the context. For example, timestamp labels in a live report for a soccer match may be expressed in minutes since the beginning of the match: "Min 45", "Min 46", etc.
The extension, timestampInLiveReport, date and label elements
are all in the XML namespace http://www.afp.com/format/internal/.
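As an illustration, the following Python sketch reads the timestamp of a post. The function name live_report_timestamp is hypothetical. The sketch assumes that the date element is present and has the form shown in the example above (UTC with a trailing "Z", or an explicit offset); other valid XML Schema date/time forms may need a dedicated parser.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

AFP = "{http://www.afp.com/format/internal/}"

def live_report_timestamp(item_meta):
    """Return (datetime, label) for a post, or None when no timestamp is present."""
    ts = item_meta.find(f"{AFP}extension/{AFP}timestampInLiveReport")
    if ts is None:
        return None
    raw = ts.findtext(AFP + "date").strip()
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    when = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    return when, ts.findtext(AFP + "label")
```

The datetime is suited to ordering posts, while the label is the value meant for display.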
13. Data specific to live report indexes
Some data is specific to live report indexes. This section details these data elements.
13.1. Lead
Live report indexes: a lead for the live report may be provided in the content metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<contentMeta>
<description role="afpdescRole:lead">
<html:html xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
<head />
<body>
<p>Live inside Christie's auction of Yves Saint-Laurent/Bergé collection.</p>
<p>Auction sparks huge interest. Follow our report and analysis live.</p>
</body>
</html:html>
</description>
</contentMeta>
</packageItem>
</itemSet>
</newsMessage>
A "lead" for the live report may be provided by a description element
whose role attribute, the QCode afpdescRole:lead, resolves to
http://cv.afp.com/descriptionRoles/lead. Inside this element the lead
is provided using the XML syntax of HTML in an html element in
namespace http://www.w3.org/1999/xhtml.
When present, the lead contains a short description (typically around one hundred words) of what the live report is about.
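A minimal Python sketch for extracting the lead as plain text is given below. The function name lead_paragraphs is hypothetical, and the sketch assumes that only the p elements of the embedded HTML are of interest.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
XHTML = "{http://www.w3.org/1999/xhtml}"

def lead_paragraphs(content_meta):
    """Return the text of each paragraph of the lead, or [] when there is no lead."""
    for desc in content_meta.iterfind(NAR + "description"):
        if desc.get("role") == "afpdescRole:lead":
            html = desc.find(XHTML + "html")
            if html is not None:
                # Collapse whitespace inside each paragraph.
                return [" ".join("".join(p.itertext()).split())
                        for p in html.iter(XHTML + "p")]
    return []
```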
13.2. List of posts
Live report indexes: the list of posts of the live report
is provided in the groupSet section of the package item.
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" xmlns:afp="http://www.afp.com/format/internal/">
<itemSet>
<packageItem>
<groupSet>
<group role="afpgroup:elements">
<!-- An example of a live report index with three posts.
As a story develops, real live reports can include tens or hundreds of posts. -->
<itemRef href="d-oc1ku.xml">
<afp:iteminfo>
<headline>Auction opens</headline>
</afp:iteminfo>
</itemRef>
<itemRef href="d-oc02w.xml">
<afp:iteminfo>
<headline>Christie's shows ten most intriguing pieces</headline>
</afp:iteminfo>
</itemRef>
<itemRef href="d-ob2p7.xml">
<afp:iteminfo>
<headline>Press conference scheduled at 7 PM</headline>
</afp:iteminfo>
</itemRef>
</group>
</groupSet>
</packageItem>
</itemSet>
</newsMessage>
The list of posts is provided as a list of links to the NewsML-G2
documents that represent individual posts. These links are provided
inside the groupSet of the package item, in a group element whose
role attribute, the QCode afpgroup:elements, resolves to
http://cv.afp.com/grouproles/elements.
Each link is expressed by an
itemRef element, through an href attribute (see the NewsML-G2
documentation [G2Doc] for more information about the itemRef construct).
Inside each itemRef, an itemInfo element in the XML namespace
http://www.afp.com/format/internal/ may provide a title for the post
in a headline element.
The list is chronologically ordered:
- the first itemRef links to the most recent post
- the second itemRef links to the second most recent
- and so on
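The structure described in this section can be read with a short Python sketch such as the one below. The function name posts is hypothetical. To stay independent of the exact name of the AFP wrapper element, the sketch looks for a headline element anywhere inside each itemRef.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"

def posts(package_item):
    """Return (href, headline or None) pairs, most recent post first."""
    result = []
    for group in package_item.iterfind(f"{NAR}groupSet/{NAR}group"):
        if group.get("role") != "afpgroup:elements":
            continue
        for ref in group.iterfind(NAR + "itemRef"):
            headline = ref.find(f".//{NAR}headline")
            result.append((ref.get("href"),
                           None if headline is None else headline.text))
    return result
```

Applying reversed() to the result yields the posts from oldest to most recent. Each href is a URI reference and may need to be resolved before the post can be retrieved.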
14. Accessing visual content through URI references
In a document, a number of elements provide links to actual visual content in formats such as JPEG, MPEG-4, etc. Some of these elements are defined by NewsML-G2, while others are defined by HTML, since AFP text and multimedia documents can contain HTML (in XML syntax) embedded directly inside NewsML-G2. For example, such links can be provided by:
- href attributes in remoteContent and icon elements.
- src attributes in img elements, video elements, etc.
- poster attributes in video elements.
- etc.
A link of this type is a URI reference as defined by [RFC3986]. This means it is either a URI or a relative-ref (colloquially referred to as a "relative URI").
At some point when processing a NewsML-G2 document, you’ll typically want to retrieve the actual visual content, in order to process or display it.
If the link is a (non-relative) URI per [RFC3986], you can dereference it directly, using standard software components to retrieve the visual content. Typically, the schemes used for such URIs depend on the delivery architecture established between you and AFP. Commonly used schemes include http, ftp and cid.
If the link is a relative-ref, you must first resolve it to its target URI. You can then dereference the resulting absolute URI to retrieve the visual content.
Note that with most standard libraries that implement URI reference resolution, resolving a (non-relative) URI is the identity operation. As a result, you do not need to determine whether you have been handed a (non-relative) URI or a relative-ref: you can just resolve the URI reference and then dereference it to retrieve the actual visual content.
Section 5 of [RFC3986] defines the process for resolving a URI reference. To perform this process, you need:
- the URI reference itself (as provided in the document, for example in an href attribute, src attribute, etc.)
- a base URI. Typically, the base URI is the URI that allows retrieving the NewsML-G2 document.
For example, suppose AFP delivers a package that contains both an AFP NewsML-G2 document and data files for the associated visual content, and that this content is delivered in your file system under the directory "/deliverySpace/internet-journal/topnews/".
In this context, the base URI is the URI that allows accessing the
NewsML-G2 document after delivery. If your NewsML-G2 processor accesses
the NewsML-G2 document at
file:///deliverySpace/internet-journal/topnews/doc.afp.com-9719Z-2.xml,
then this is the base URI. The URI references linking to the visual
content can be resolved relative to this base URI. For example, the
URI reference 5b9c11cbf6871cb93696bebab8bdbc2c16afc44b-highDef.jpg
would resolve to
file:///deliverySpace/internet-journal/topnews/5b9c11cbf6871cb93696bebab8bdbc2c16afc44b-highDef.jpg,
which can then be dereferenced to access that particular visual content.
Several libraries provide URI reference resolution. For instance, in
Java, one could use the resolve() method of the java.net.URI class.
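In Python, the urljoin function of the standard urllib.parse module plays a similar role. The sketch below wraps it in a hypothetical helper named resolve and reuses the base URI from the example above. It is an illustration only; check your library's documentation for edge cases.

```python
from urllib.parse import urljoin

def resolve(base_uri, reference):
    """Resolve a URI reference found in a document against the document's base URI."""
    return urljoin(base_uri, reference)

BASE = "file:///deliverySpace/internet-journal/topnews/doc.afp.com-9719Z-2.xml"

# A relative-ref is resolved against the location of the NewsML-G2 document.
print(resolve(BASE, "5b9c11cbf6871cb93696bebab8bdbc2c16afc44b-highDef.jpg"))

# A (non-relative) URI is returned unchanged.
print(resolve(BASE, "http://example.org/pictures/highDef.jpg"))
```

Because a non-relative URI comes back unchanged, the same call can be applied to every link without first testing which kind it is.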
15. Release Notes
April 2026
The section Genres has been updated to mention the usage of the standard IPTC Genre Newscodes.
New layout and editorial improvements.
August 2021
The section Role in workflow has been enhanced to show that a flash can be followed by an urgent but not by an alert.
The section on caption has been thoroughly rewritten to explain that captions may be provided in two parts, the content description and the context description.
The new concept of renditions dedicated to cancelled documents has been documented in the section on publishing status.
The section on subjects has been completed to explain
that some subjects are identified by an uri attribute.
The section on locations that are subject matter of the document has been completed to show how a location can be specified using a geo URI.
In the section on locations from which the content originates, the entry about graphics has been corrected.
The section on mandatory processing has been enhanced.
The section on catchlines now states that multimedia
documents may provide a catchline identified by the role
http://cv.afp.com/headlineroles/introduction.
The section on subtitles now states that subtitles are only provided for text and multimedia documents and that usually there are at most two subtitles.
The XML syntax for HTML was formerly referred to as "XHTML". As the latest versions of the HTML living standard no longer use that term, this document no longer uses that term either.
This version also includes a number of editorial improvements.
July 2019
A section about mandatory processing has been added.
The sections about visual content rendition types and icon renditions types have been thoroughly updated.
A section about the copyright notice metadata has been added.
The section on content creation date now states that for photo combos, the content creation date we provide is the date of creation of the combo (instead of a shooting date).
A convergence effort between the metadata models of text and multimedia documents is underway in our production system. As a result the Related production and Role in workflow metadata may now be provided on multimedia documents. The documentation has been updated to reflect this change.
The section on publishing status, including information about cancelling documents, has been thoroughly rewritten to provide additional and more precise information.
Update about content warnings: our editorial system now makes use of the newly standardized content warning for "suffering". This documentation has been updated to reflect it.
The section about Visual Dimensions now states that the "millimeters" dimension unit may be used in AFP NewsML-G2 documents.
"Related interactive graphic" has been added to the section about related production.
This version also includes a number of editorial improvements.
March 2018
Major update for multimedia documents, including initial documentation of our HTML microformats.
The documentation now states that a location of origin of content can be a "point of interest", in addition to already documented types (city, country area, country). See the section Locations From Which The Content Originates.
The documentation provides a more accurate description of the "synthe" metadata, now stating that it concerns speakers heard during audio or film recording where an important value of the clip consists of what is said. In previous versions it was described as applying only to visible speakers. See the section Speakers heard during audio or film recording (aka synthe).
Tables listing the main languages used in AFP production and their corresponding BCP 47 codes are now provided. See the sections Language of the content and Language of metadata.
Various editorial improvements.
August 2016
The documentation has been updated thoroughly to allow processing AFP NewsML-G2 documents without resolving QCodes.
The documentation now states that along with event identifiers, the names of the events may be provided.
The documentation now states that posts in live report indexes are ordered chronologically (therefore it is no longer your responsibility to sort them).
The description of the "Timestamp in live report" metadata has been
improved to include documentation for the label element.
The documentation of live reports now covers the notion of intertitle.
A number of improvements and clarifications have been made.
July 2016
The documentation for live reports has been added.
This document is now entirely self contained in one file, which makes it easier to distribute and use.
An important correction has been made: in previous versions of this
documentation the concept URI for the "forbyline" role (cf.
section on creators and contributors) was
incorrectly specified as http://cv.afp.com/creatorroles/forbyline .
This has been corrected; the correct concept URI is:
http://cv.afp.com/contributorroles/forbyline.
A section on mentions of related production has been added.
An example has been added to the section on textual content of text document showing that the content can contain hypertext links.
A number of improvements and clarifications have been made.
February 2016
This documentation has been updated thoroughly for text documents.
February 2014
Documentation updated thoroughly in preparation of public delivery of NewsML-G2 documents.
January 2012
Initial version.
16. References
[G2Doc] |
"NewsML-G2 Documentation". IPTC. Available from https://iptc.org/standards/newsml-g2/using-newsml-g2/ |
[MediaTypes] |
Media Types. Available at http://www.iana.org/assignments/media-types/index.html |
[IPTCCPNatures] |
The IPTC controlled vocabulary for basic natures of concepts. Available at http://cv.iptc.org/newscodes/cpnature/ |
[IPTCDimUnits] |
The IPTC controlled vocabulary for dimension units. Available at http://cv.iptc.org/newscodes/dimensionunit/ |
[IPTCGenres] |
The IPTC controlled vocabulary for genres. Available at http://cv.iptc.org/newscodes/genre/ |
[IPTCLocTypes] |
The IPTC controlled vocabulary for location types. Available at http://cv.iptc.org/newscodes/location/ |
[IPTCMediaTopics] |
The IPTC controlled vocabulary for media topics. Available at http://cv.iptc.org/newscodes/mediatopic/ |
[IPTCNProviders] |
The IPTC controlled vocabulary for news providers. Available at http://cv.iptc.org/newscodes/newsprovider/ |
[IPTCTimeUnits] |
The IPTC controlled vocabulary for time units. Available at http://cv.iptc.org/newscodes/timeunit/ |
[IPTCCWarn] |
The IPTC controlled vocabulary for content warnings. Available at http://cv.iptc.org/newscodes/contentwarning/ |
[ISO3166] |
ISO 3166 Maintenance Agency. Available at http://www.iso.org/iso/country_codes.htm |
[HTTPURI] |
"RFC 2616, section 3.2: Uniform Resource Identifiers". R. Fielding & al. June 1999. Available at http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.2 |
[RFC3085bis] |
"URN Namespace for news-related resources". M. Steidl and J. Lorenzen. July 2009. Draft available at http://tools.ietf.org/html/draft-steidl-newsml-urn-rfc3085bis-00 |
[RFC3986] |
"Uniform Resource Identifier (URI): Generic Syntax". T. Berners-Lee, R. Fielding and L. Masinter. January 2005. Available at http://tools.ietf.org/html/rfc3986 |
[RFC3987] |
"Internationalized Resource Identifiers (IRIs)". M. Duerst and M. Suignard. January 2005. Available at http://www.ietf.org/rfc/rfc3987 |
[RFC5646] |
"Tags for Identifying Languages". A. Phillips and M. Davis. September 2009. Available at http://tools.ietf.org/html/rfc5646 |
[RFC5870] |
"A Uniform Resource Identifier for Geographic Locations ('geo' URI)". A. Mayrhofer and C. Spanring. June 2010. Available at http://tools.ietf.org/html/rfc5870 |
[TagCloud] |
Wikipedia article on tag cloud. Available at http://en.wikipedia.org/wiki/Tag_Cloud |
[XMLSchemaDataTypes] |
XML Schema Part 2: Datatypes. Available at http://www.w3.org/TR/xmlschema-2/ |
[XMLSpec] |
"Extensible Markup Language (XML) 1.0". Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, François Yergeau. Available at http://www.w3.org/TR/xml/ |
[Microformat] |
Wikipedia article on microformats. Available at http://en.wikipedia.org/wiki/Microformat |
[HTMLSpec] |
HTML Living Standard. Available at https://html.spec.whatwg.org |
Prepared and written by Philippe Mougin
Copyright © 2012-2026 AFP