Updated August 2021
AFP delivers information in a number of ways, tailored to its clients' needs. One delivery vector is NewsML-G2, an industry-driven format and processing model allowing rich machine-readable representation of news content.
This document is your technical guide to AFP NewsML-G2 documents. You'll make use of it when implementing systems that receive and process AFP NewsML-G2 documents. It describes how building blocks defined by NewsML-G2 are combined in AFP documents to convey news content and associated metadata (titles, genres, subjects, embargo, etc.). It should be used alongside the NewsML-G2 documentation provided by IPTC [G2Doc], which it assumes knowledge of.
AFP NewsML-G2 documents build upon the NewsML-G2 format and processing model defined by IPTC (International Press Telecommunications Council) in the context of the NAR (News Architecture). NewsML-G2 is itself an application of XML and makes use of XML Schema. AFP NewsML-G2 documents also make use of the XML syntax of HTML [HTMLSpec] (formerly referred to as "XHTML 5") to represent textual content along with rich structural information, since chunks of HTML in XML syntax can be embedded directly into NewsML-G2 content. In order to deal with AFP NewsML-G2 documents, you will make use of all these technologies.
Technology stack
Further sections provide an overview of the structure of AFP documents. They describe the information a document conveys and how to process it.
NewsML-G2 documents convey a number of metadata. For the most part you can pick some and ignore others as you see fit. For example, you can make use of IPTC media topics, or opt not to rely on them. Some metadata, however, cannot be ignored and must be processed, such as embargo instructions.
It is possible that, in addition to your NewsML-G2 integration, your workflows process AFP's metadata delivered by other means. For example, for video production we also deliver metadata in the form of a human readable dopesheet sent by email. In the end, these metadata must be correctly processed, be they obtained from NewsML-G2 or from another delivery medium.
Correctly processing the following metadata is mandatory:
Should you encounter questions or difficulties when implementing these mandatory processes, please contact your AFP representative to get assistance.
In actual NewsML-G2 documents delivered by AFP you will find several things neither documented here nor in the NewsML-G2 specification, such as undocumented XML elements and attributes. You must not rely on these undocumented features, unless specifically advised to do so by your AFP representative. These undocumented features are prone to change without notice and contain information that you cannot interpret reliably.
An AFP NewsML-G2 document provides metadata about content published by AFP. For example, when AFP publishes a picture it also publishes an associated NewsML-G2 document that provides metadata about this picture such as a caption, the name of the photographer, the location of the event, etc. Depending on the nature of the content, the NewsML-G2 document can be separate from the main content itself (e.g., a picture as a JPEG file along with a NewsML-G2 document) or can contain the main content (e.g., a textual story embedded inside the NewsML-G2 document).
There are eight main types of AFP news content for which NewsML-G2 can play a role:
The type of a NewsML-G2 document defines important characteristics of the document such as the nature of its content, its XML structure, the metadata it provides as well as some elements of its processing model. As you can see, the type of a NewsML-G2 document is named after the type of news content the NewsML-G2 document is associated with.
AFP NewsML-G2 documents of type text, picture, video, still graphic, animated graphic and interactive graphic have the same top-level structure: a NewsML-G2 element called "news message". This news message is an envelope that contains one "news item". This news item represents some news content which can be either a news story in textual form, a photo, a video, a still graphic, an animated graphic or an interactive graphic.
AFP NewsML-G2 documents of type multimedia also have a news message as the top-level structure. This news message is an envelope that contains one or more news item(s): a main item with the multimedia content in the XML syntax of HTML and additional items for photos, videos, etc.
AFP NewsML-G2 documents of type live report index also have a news message as the top-level structure. This news message is an envelope that contains a "package item" providing metadata about the live report as a whole and links to NewsML-G2 documents representing the individual posts of the live report.
Section "Type of document" describes how to determine the type of a document. The following sections provide an overview of the structure of documents.
Text documents have only one news item. This item contains metadata and textual news content. The content is represented by some HTML (in its XML syntax) embedded right into the news item.
Top-level structure of text documents
Picture and still graphic documents have only one news item that conveys only one logical visual content (e.g., one photo). However, this content may be available in different renditions (e.g., different formats, resolutions, etc.). In addition to metadata about the picture or still graphic, the news item contains links to the actual visual content (e.g., JPEG resources) for each rendition. The visual content for each rendition isn't provided in the NewsML-G2 document itself, but by external resources (e.g., accompanying files, Web resources, etc.).
Top level structure of picture and still graphic documents (example)
Video and animated graphic documents have only one news item that conveys only one logical visual content (e.g., one video, one animated graphic). However, this content may be available in different renditions (e.g., different formats, resolutions, etc.). In addition to metadata about the video or animated graphic, the news item contains links to the actual visual content (e.g., MPEG resources) for each rendition. The visual content for each rendition isn't provided in the NewsML-G2 document itself, but by external resources (e.g., accompanying files, Web resources, etc.).
The news item may also contain links to renditions of an icon (aka "illustration" or "preview image"). The renditions of the icon aren't provided in the NewsML-G2 document itself, but in external resources (e.g., accompanying files, Web resources, etc.).
Top-level structure of video and animated graphic documents (example)
Multimedia documents have one or multiple news items. One of these items is the "main news item". It is always present and provides the multimedia content using the XML syntax of HTML. It also provides metadata about the document, much like the news item of a text document. It also contains links to other items of the document. These additional items convey information about visual content: pictures, videos or graphics. They are much like the items found in picture, video or graphic documents.
The figure below provides an example of a multimedia document with one main item, a picture item and a video item.
Top-level structure of multimedia documents (example)
The main news item is identified by the presence of a specific element in its item metadata section: a link element whose rel attribute conveys the concept URI http://cv.iptc.org/newscodes/conceptrelation/isA (using the QCode crel:isA) and whose href attribute, a URI, is equal to http://cv.afp.com/itemnatures/mmdMainComp.
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<!-- This link element tells that this news item is the main item of the multimedia document -->
<link rel="crel:isA" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
</newsItem>
<!-- Additional, non-main items -->
<newsItem></newsItem>
<newsItem></newsItem>
</itemSet>
</newsMessage>
You'll find more information about QCodes in section Controlled vocabularies and qualified codes.
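The identification rule above can be sketched in Python with the standard library's ElementTree. This is a minimal illustration under stated assumptions, not AFP's reference implementation: the function name `split_multimedia_items` and the `urn:example:` identifiers are hypothetical, the rel QCode is matched directly instead of being resolved to its concept URI, and the comparison ignores case as a tolerance.

```python
import xml.etree.ElementTree as ET

# Namespace of NewsML-G2 elements, as declared on newsMessage in AFP documents.
NAR = "{http://iptc.org/std/nar/2006-10-01/}"
MAIN_ITEM_HREF = "http://cv.afp.com/itemnatures/mmdMainComp"


def split_multimedia_items(message):
    """Return (main_item, other_items) for a parsed newsMessage element."""
    main_item, others = None, []
    for item in message.iter(NAR + "newsItem"):
        links = item.findall(f"{NAR}itemMeta/{NAR}link")
        is_main = any(
            link.get("href") == MAIN_ITEM_HREF
            # Assumption of this sketch: the rel QCode is compared
            # case-insensitively and is not resolved to a concept URI.
            and (link.get("rel") or "").lower() == "crel:isa"
            for link in links
        )
        if is_main and main_item is None:
            main_item = item
        else:
            others.append(item)
    return main_item, others


# Hypothetical sample: the guid values are placeholders, not real AFP identifiers.
SAMPLE = """
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemSet>
    <newsItem guid="urn:example:main">
      <itemMeta>
        <link rel="crel:isA" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
      </itemMeta>
    </newsItem>
    <newsItem guid="urn:example:picture"/>
    <newsItem guid="urn:example:video"/>
  </itemSet>
</newsMessage>
""".strip()

main, others = split_multimedia_items(ET.fromstring(SAMPLE))
```

Here `main` is the item carrying the link element, and `others` holds the remaining picture and video items in document order.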
A live report is represented by multiple NewsML-G2 documents:
The figure below shows the top level structure of a live report. You can see the index on the left and the various posts on the right.
Top-level structure of live reports (example)
Below is an example of a simple text document, with just a few metadata and some textual content. Using this example, we will walk through some structural elements that are common to every type of AFP NewsML-G2 document.
Note that while the XML in this example is formatted to ease reading, the actual documents you receive will usually be in a compact form (e.g., all XML on one line).
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" xmlns:afp="http://www.afp.com/format/internal/">
<header>
<sent>2009-02-23T20:44:07+02:00</sent>
</header>
<itemSet>
<newsItem standard="NewsML-G2" standardversion="2.28" conformance="power"
guid="http://doc.afp.com/863OC" version="3" xml:lang="en">
<catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_26.xml"/>
<catalogRef href="http://cv.afp.com/std/catalog/catalog.AFP-IPTC-G2-V2_4.xml"/>
<itemMeta>
<itemClass qcode="ninat:text"/>
<provider qcode="nprov:AFP">
<name>AFP </name>
</provider>
<versionCreated>2009-02-23T20:43:00+02:00</versionCreated>
<pubStatus qcode="stat:usable"/>
</itemMeta>
<contentMeta>
<headline>
YSL-Bergé collection sets new world record at auction
for a private collection
</headline>
<subject qcode="medtop:20000031" type="cpnat:abstract">
<name>visual art</name>
</subject>
<subject qcode="medtop:20000011" type="cpnat:abstract">
<name>fashion</name>
</subject>
</contentMeta>
<contentSet>
<inlineXML contenttype="application/xhtml+xml" wordcount="70">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>
YSL-Bergé collection sets new world record at auction
for a private collection
</title>
</head>
<body>
<p>The Yves Saint Laurent and Pierre Bergé collection sets
new world record at auction for a private collection.
Hundreds of art treasures amassed by late fashion designer
Yves Saint Laurent and his companion Pierre Berge over half
a century are being auctioned.</p>
<p>Bids hit 206 million euros (261 million dollars) on February
23, 2009 making it the biggest private collection ever
auctioned with two days of sales still left to run.</p>
</body>
</html>
</inlineXML>
</contentSet>
</newsItem>
</itemSet>
</newsMessage>
Some notes about this structure:
News message
The newsMessage
element conveys the document. It includes attributes providing namespace declarations and other information such as schema location.
This information is automatically interpreted by the standard XML software components you will likely be using when processing documents (e.g., parsers, validators, etc.).
The newsMessage
element has two children: a header, which provides a transmission date (and possibly some additional information), and an item set which, in this example, contains one news item. In a multimedia document, the item set typically contains multiple items.
News item
The newsItem
element provides the journalistic content along with metadata about this content and other information useful for processing. It has attributes stating the name, version and conformance level of the NAR standard further used in the item. AFP NewsML-G2 documents use NewsML-G2 version 2.10 or higher at the "power" conformance level.
The guid
attribute is a persistent and globally unique identifier for this news item, in the form of an IRI [RFC3987].
The version
attribute, if present, provides the version number of the item. It is incremented (not necessarily by one) when the document is updated.
Catalog information
The news item then carries catalog information using catalogRef
and catalog
elements (only the former is shown in the example above). This information specifies mappings between scheme aliases and scheme URIs. It allows you to resolve qualified codes found, for instance, in qcode
attributes further in the item, to full URIs (i.e., unambiguous identifiers). In the example above, we reference a standard IPTC-provided catalog at http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_26.xml
and an AFP specific catalog at http://cv.afp.com/std/catalog/catalog.AFP-IPTC-G2-V2_4.xml
. In actual documents you may find references to other catalogs. The section Controlled vocabularies and qualified codes provides more information about qualified codes resolution.
Item meta data
The itemMeta
element contains information about the news item itself. It always specifies the class of the item (e.g., text, picture, video, etc.), the provider of the item (that will be AFP or a specific AFP service), and the date of creation of this version of the item. It may also contain additional information, such as an embargo directive, a publication status, editorial notes, etc.
Content meta data
The contentMeta
element contains information about the journalistic content of the item (e.g., title, subjects, genres, language, etc.).
Content set
The contentSet
element contains the principal journalistic content of the item.
In text documents this content is provided inline, in an inlineXML
element as shown in this example. See section "Data specific to text documents" for more information.
In picture, video, still graphic and animated graphic documents, this content is provided by reference: the content set contains links to the actual visual content (e.g., links to JPEG files, links to MPEG files, etc.). See section "Visual content" for more information.
In multimedia documents, the content set of the main item contains the multimedia content of the document expressed using the XML syntax of HTML. Inside, the picture, video, still graphic and animated graphic elements are provided by reference through links to other news items. In addition, default renditions are provided through standard HTML elements such as <img>
or <video>
. See section "Data specific to multimedia documents" for more information.
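As an illustration of the structure walked through above, the following Python sketch reads the main pieces of a text document with the standard library's ElementTree. It is a hedged example, not a complete processing model: `summarize_text_item` is a hypothetical helper, the embedded sample is a shortened copy of the walk-through document, and only the first news item is examined.

```python
import xml.etree.ElementTree as ET

# Prefixes are local to this sketch; only the namespace URIs matter.
NS = {
    "nar": "http://iptc.org/std/nar/2006-10-01/",
    "h": "http://www.w3.org/1999/xhtml",
}


def squeeze(text):
    """Collapse runs of whitespace left over from pretty-printed XML."""
    return " ".join((text or "").split())


def summarize_text_item(xml_text):
    message = ET.fromstring(xml_text)
    item = message.find("nar:itemSet/nar:newsItem", NS)
    paragraphs = item.findall(
        "nar:contentSet/nar:inlineXML/h:html/h:body/h:p", NS)
    return {
        "sent": message.findtext("nar:header/nar:sent", namespaces=NS),
        "guid": item.get("guid"),
        "version": item.get("version"),
        "item_class": item.find("nar:itemMeta/nar:itemClass", NS).get("qcode"),
        "headline": squeeze(
            item.findtext("nar:contentMeta/nar:headline", namespaces=NS)),
        "paragraphs": [squeeze("".join(p.itertext())) for p in paragraphs],
    }


# Shortened version of the walk-through document.
SAMPLE = """
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/">
  <header><sent>2009-02-23T20:44:07+02:00</sent></header>
  <itemSet>
    <newsItem guid="http://doc.afp.com/863OC" version="3">
      <itemMeta><itemClass qcode="ninat:text"/></itemMeta>
      <contentMeta>
        <headline>
          YSL-Bergé collection sets new world record at auction
          for a private collection
        </headline>
      </contentMeta>
      <contentSet>
        <inlineXML contenttype="application/xhtml+xml">
          <html xmlns="http://www.w3.org/1999/xhtml">
            <body><p>Bids hit 206 million euros.</p></body>
          </html>
        </inlineXML>
      </contentSet>
    </newsItem>
  </itemSet>
</newsMessage>
""".strip()

summary = summarize_text_item(SAMPLE)
```

Note that the HTML content lives in its own XML namespace, which is why the paragraph lookup switches prefixes below the inlineXML element.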
Documents make use of a number of controlled vocabularies (aka taxonomies) to convey information. In this section, we focus on a specific set of controlled vocabularies called "NewsML-G2 schemes".
A NewsML-G2 scheme associates unambiguous identifiers to "concepts". These identifiers take the form of URIs (Uniform Resource Identifiers [RFC3986]).
For example, in NewsML-G2 a document is usable, withheld or canceled; this is known as the "publishing status":
http://cv.iptc.org/newscodes/pubstatusg2/usable
http://cv.iptc.org/newscodes/pubstatusg2/withheld
http://cv.iptc.org/newscodes/pubstatusg2/canceled
A document can contain a pubStatus
element that conveys the concept URI identifying its publishing status. Therefore, when you receive a document, you can process this concept URI (e.g., compare it to the three possible values given above) to determine the publishing status of the document.
In NewsML-G2 documents, some concept URIs are not directly expressed using the URI syntax. Instead, they are conveyed as QCodes (short for "Qualified Codes"). A QCode is made of two parts separated by a colon. The leftmost part (before the leftmost colon) is called the scheme alias. The part on the right of the leftmost colon is called the code.
QCode structure
In some ways, a QCode can be seen as a compressed form of concept URI (actually it is a bit more than that, as it also identifies the controlled vocabulary the concept URI is part of, but this is an advanced topic that we won't develop further in this documentation). Determining the concept URI a QCode stands for is called resolving the QCode. We'll describe how this operation is to be performed at the end of this section.
When processing NewsML-G2 documents it is useful to resolve QCodes to concept URIs and then to work in terms of concept URIs because QCodes are not universally unambiguous identifiers whereas concept URIs are.
For example, in a given document the publishing status "usable" may be expressed by the following QCode: stat:usable
(see it in situ in section Document walk-through). However, in another document the same status might be expressed by the QCode pst:usable
. These two QCodes are different but resolve to the same concept URI: http://cv.iptc.org/newscodes/pubstatusg2/usable
.
Furthermore, while it does not happen within AFP production, if you consider NewsML-G2 documents in general it is even possible for the QCode stat:usable
to express the publishing status "usable" in a given document while expressing something completely different in another document. In that case the resolution process will correctly yield http://cv.iptc.org/newscodes/pubstatusg2/usable
in the context of the first document and a different concept URI in the context of the second document.
Important design principle: QCode resolution shields you from QCode-level variations or accidental homonymies and gives you unambiguous identifiers to work with.
Depending on your tool chain, QCode resolution might be difficult to implement. For example, standard XML tools such as XPath processors can't easily integrate QCode resolution. If you are in such a situation, you can bypass the QCode resolution step and work directly in terms of QCodes when dealing with AFP's production, because we ensure that QCodes are unambiguous in our NewsML-G2 documents (e.g., in all AFP documents the QCode stat:usable will represent the publishing status "usable").
In this documentation we specify both concept URIs and QCodes wherever needed. Unless specified otherwise, for IPTC standardized NewsML-G2 schemes we use the IPTC recommended QCodes that you can look up in the corresponding IPTC documentation: for example, if you navigate with your Web browser to the resource identified by the concept URI for the publishing status "usable" (you can do it by clicking on this link: http://cv.iptc.org/newscodes/pubstatusg2/usable
) you'll see that the IPTC recommended QCode for this publishing status is stat:usable
.
When possible, however, it is advisable to resolve QCodes. Doing so has the following benefits:
The resolution process is described precisely in the NewsML-G2 documentation ([G2Doc]). In short, it consists of resolving the scheme alias part of the QCode to a scheme URI, using the catalog information provided in the document at the item level, and then appending the code part of the QCode to that scheme URI. In our example, the QCode stat:usable
has a scheme alias stat
and a code usable
. It is resolved to http://cv.iptc.org/newscodes/pubstatusg2/usable
, because the catalog information of the enclosing news item contains the following element:
<scheme alias="stat" uri="http://cv.iptc.org/newscodes/pubstatusg2/"/>
This catalog information can appear inline in the item inside catalog
elements, or in an external resource referenced by the item through a catalogRef
element, as in the following example borrowed from the section Document walk-through:
<catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_26.xml"/>
Resolving a QCode yields a concept URI that unambiguously identifies a given concept on a global scale. In our example, the concept identified by http://cv.iptc.org/newscodes/pubstatusg2/usable
is: the publishing status "usable". In the context of NewsML-G2 schemes, two logically different concepts are never given the same concept URI, even in different systems managed by different organizations.
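The resolution steps described in this section can be sketched as follows in Python. This is a simplified illustration, not the full procedure specified in [G2Doc]: the helper names `inline_catalog` and `resolve_qcode` are hypothetical, and only catalog information given inline in catalog elements is read. Mappings from remote catalogs referenced by catalogRef would have to be loaded separately and merged into the same dictionary.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"


def inline_catalog(item):
    """Map scheme aliases to scheme URIs using the item's inline catalogs."""
    return {
        scheme.get("alias"): scheme.get("uri")
        for scheme in item.findall(f"{NAR}catalog/{NAR}scheme")
    }


def resolve_qcode(qcode, catalog):
    """Resolve a QCode to a concept URI using an alias -> scheme URI mapping."""
    # The scheme alias is everything before the leftmost colon.
    alias, separator, code = qcode.partition(":")
    if not separator:
        raise ValueError(f"not a QCode (no colon): {qcode!r}")
    if alias not in catalog:
        raise KeyError(f"unknown scheme alias: {alias!r}")
    return catalog[alias] + code


# Hypothetical item declaring two aliases for the same scheme.
SAMPLE = """
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <catalog>
    <scheme alias="stat" uri="http://cv.iptc.org/newscodes/pubstatusg2/"/>
    <scheme alias="pst" uri="http://cv.iptc.org/newscodes/pubstatusg2/"/>
  </catalog>
  <itemMeta>
    <pubStatus qcode="stat:usable"/>
  </itemMeta>
</newsItem>
""".strip()

item = ET.fromstring(SAMPLE)
catalog = inline_catalog(item)
status_qcode = item.find(f"{NAR}itemMeta/{NAR}pubStatus").get("qcode")
status_uri = resolve_qcode(status_qcode, catalog)
```

Comparing `status_uri` against the three publishing status URIs listed earlier then works regardless of which alias a given document happens to use.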
The following sections of this document are dedicated to answering questions of the form "Where is data X in an AFP NewsML-G2 document (and how can I make use of it)?". For example: "Where is the title of the document?", "Where is the textual content?", "Where is the caption?", "Where is the visual content?" etc.
For each piece of data, XML examples are provided. These examples aren't complete documents, though: they are high-level representations of the format, omitting many aspects and focusing on the data in question.
For instance, here is the example we provide for the "word count" metadata in text documents (the word count gives an estimate of the size of the textual content):
<newsMessage>
<itemSet>
<newsItem>
<contentSet>
<inlineXML wordcount="450">
</inlineXML>
</contentSet>
</newsItem>
</itemSet>
</newsMessage>
As you can see, this example omits many elements: contrast it with the example of a complete document provided in section Document walk-through. What you get from it, however, is a sense of where the word count information can be found and what it looks like.
Some examples contain XML comments. For example:
<!-- A subject represented by a QCode -->
<subject qcode="medtop:20000273"/>
These comments won't appear in real documents; they are annotations specific to this documentation.
Some data may be present in most types of documents. For example, a creation date or content warning can appear in any document (text, picture, still graphic, animated graphic, video, multimedia, live report, ...). This section details these common data elements. Further sections detail data associated with specific types of documents.
Text, picture, still graphic and video documents: creators and contributors may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<creator role="afpcrrol:writer afpctrol:forbyline">
<name>
John Doe
</name>
</creator>
<contributor role="afpctrol:editor afpctrol:validator">
<name>
Jeanne Dupont
</name>
</contributor>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: creators and contributors to the multimedia document as a whole may be provided in the content metadata section of the main news item. Creators and contributors specific to an individual item may be provided in the content metadata section of that item.
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isA" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
<contentMeta>
<!-- The creators and contributors to the multimedia document as a whole -->
<creator role="afpcrrol:writer afpctrol:forbyline">
<name>
John Doe
</name>
</creator>
<contributor role="afpctrol:forbyline">
<name>
Jeanne Dupont
</name>
</contributor>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- The creators and contributors specific to this item -->
<creator role="afpcrrol:photographer afpctrol:forbyline">
<name>
Al Dente
</name>
</creator>
<contributor>
<name>
Annie Mall
</name>
</contributor>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: creators may be provided in the content metadata section of the package item. Note that no contributors are provided in live report indexes.
<newsMessage>
<itemSet>
<packageItem>
<contentMeta>
<creator role="afpcrrol:writer afpctrol:forbyline">
<name>
John Doe
</name>
</creator>
<creator role="afpcrrol:writer">
<name>
Walter Melon
</name>
</creator>
</contentMeta>
</packageItem>
</itemSet>
</newsMessage>
Creators and contributors may be provided by creator
and contributor
elements. Creators are persons who created the document or parts of the document. Contributors are persons who modified or enhanced the document or parts of the document. There might be any number of creators and contributors per news item.
For each creator and contributor we provide a name in the name
element and optionally a list of roles, in the form of a QCode list, in the role
attribute. The table below presents some roles often used in AFP documents.
Creator and contributor roles
Role | QCode | Concept URI |
---|---|---|
Writer | afpcrrol:writer | http://cv.afp.com/creatorroles/writer |
Photographer | afpcrrol:photographer | http://cv.afp.com/creatorroles/photographer |
Graphic designer | afpcrrol:graphicDesigner | http://cv.afp.com/creatorroles/graphicDesigner |
For byline | afpctrol:forbyline | http://cv.afp.com/contributorroles/forbyline |
Important: The "for byline" role has a special meaning: the names of creators and contributors without this role must not be published. You may use them for internal purposes such as contacting the journalist for questions, but you must not display them publicly in association with the content of the document.
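The byline rule above can be sketched in Python as a filter over creator and contributor elements. This is a minimal illustration under stated assumptions: `byline_names` is a hypothetical helper, and it matches the QCode afpctrol:forbyline directly instead of resolving it, which this documentation allows for AFP production.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
FOR_BYLINE = "afpctrol:forbyline"


def byline_names(item):
    """Return the names that may be published for a newsItem or packageItem."""
    names = []
    for tag in ("creator", "contributor"):
        for person in item.findall(f"{NAR}contentMeta/{NAR}{tag}"):
            # The role attribute is a whitespace-separated list of QCodes.
            roles = (person.get("role") or "").split()
            if FOR_BYLINE in roles:
                name = person.findtext(f"{NAR}name", default="")
                names.append(" ".join(name.split()))
    return names


SAMPLE = """
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <contentMeta>
    <creator role="afpcrrol:writer afpctrol:forbyline">
      <name>
        John Doe
      </name>
    </creator>
    <contributor role="afpctrol:editor afpctrol:validator">
      <name>
        Jeanne Dupont
      </name>
    </contributor>
  </contentMeta>
</newsItem>
""".strip()

publishable = byline_names(ET.fromstring(SAMPLE))
```

In this sample only John Doe carries the "for byline" role, so Jeanne Dupont's name is kept out of the publishable list.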
Text, picture, still graphic, video and multimedia documents: a content warning may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<signal qcode="sig:cwarn"/>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: a content warning may be provided in the item metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<itemMeta>
<signal qcode="sig:cwarn"/>
</itemMeta>
</packageItem>
</itemSet>
</newsMessage>
A document may include a warning about its content when it might be perceived as offensive. In such a case, you'll typically want to review the content of the document in order to decide how to use it. This warning takes the form of a signal
element with a QCode sig:cwarn
resolving to http://cv.iptc.org/newscodes/signal/cwarn
.
When a content warning is present, we often provide a set of exclAudience
elements that convey the reason(s) for the content warning. For example, in a document whose content contains potentially offensive violence and language:
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<signal qcode="sig:cwarn"/>
</itemMeta>
<contentMeta>
<exclAudience qcode="cwarn:violence"/>
<exclAudience qcode="cwarn:language"/>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
In a live report index, the exclAudience
elements are provided in the package item instead of in a news item:
<newsMessage>
<itemSet>
<packageItem>
<itemMeta>
<signal qcode="sig:cwarn"/>
</itemMeta>
<contentMeta>
<exclAudience qcode="cwarn:violence"/>
<exclAudience qcode="cwarn:language"/>
</contentMeta>
</packageItem>
</itemSet>
</newsMessage>
Used in this way, each exclAudience
element identifies an audience that may be offended or distressed by a given characteristic of the content (e.g. "violence"). In order to specify these, the IPTC's content warnings vocabulary [IPTCCWarn] must be used.
At the time of this writing we make use of the following content warnings, using the standard IPTC scheme: death, language, nudity, sexuality, violence and suffering.
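A hedged Python sketch of how a receiving system might detect a content warning and collect its reasons is given below. The helper name `content_warnings` is hypothetical, and QCodes are matched directly (sig:cwarn, cwarn:...) instead of being resolved to concept URIs.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"


def content_warnings(item):
    """Return None when there is no content warning, else a list of reasons.

    `item` may be a newsItem or a packageItem element. The returned list
    holds the exclAudience QCodes and may be empty, since reasons are
    not always provided alongside the warning.
    """
    signals = {
        signal.get("qcode")
        for signal in item.findall(f"{NAR}itemMeta/{NAR}signal")
    }
    if "sig:cwarn" not in signals:
        return None
    return [
        audience.get("qcode")
        for audience in item.findall(f"{NAR}contentMeta/{NAR}exclAudience")
    ]


SAMPLE = """
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemSet>
    <newsItem>
      <itemMeta>
        <signal qcode="sig:cwarn"/>
      </itemMeta>
      <contentMeta>
        <exclAudience qcode="cwarn:violence"/>
        <exclAudience qcode="cwarn:language"/>
      </contentMeta>
    </newsItem>
  </itemSet>
</newsMessage>
""".strip()

message = ET.fromstring(SAMPLE)
warnings = content_warnings(message.find(f"{NAR}itemSet/{NAR}newsItem"))
```

Returning None for "no warning" and a list for "warning present" keeps the two cases distinct even when no reason is given.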
Text, picture, still graphic, video and multimedia documents: a correction signal may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<signal qcode="sig:correction"/>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
One particular type of update that can occur on a document is a correction. A correction occurs when an error has been found in a document and a corrected version is published. In such a case, you receive a new version of the document (i.e., a document with the same guid and a new version number) that contains a correction signal. This signal takes the form of a signal
element with a qcode
attribute sig:correction
resolving to http://cv.iptc.org/newscodes/signal/correction.
Common practice at AFP is to use this mechanism only for corrections of great significance. For example, the correction of a typo that doesn't change the meaning of the news story shall not be marked as a correction but might be issued as a mere update.
When a serious error is found in a key piece of information in a document, rendering it unusable as such, the document will usually be canceled instead of corrected. A document is canceled by issuing a version with the "canceled" publishing status, as discussed in section Publishing Status.
The correction signal doesn't provide details about the correction (e.g., what or where was the error, how it has been corrected). Such details will usually be provided in the general editorial note, which is given by an edNote
element with a role
attribute afpnoteRole:client
resolving to http://cv.afp.com/ednoteroles/client
(see the section on the general editorial note). For example:
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<edNote role="afpnoteRole:client">
CORRECTS the first sentence of the answer of the auctioneer, which was incorrectly translated.
</edNote>
<signal qcode="sig:correction"/>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
Handling a correction correctly is of paramount importance and can be a complex process (you probably have it in place already). For example, you may want to have someone review the item, along with its previous versions and the editorial note, to understand the error. You may then ensure that this correction is applied to any published material that carries the original error. This may include making sure that recipients of such material are notified and provided with the corrected information.
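As a starting point for such a workflow, the Python sketch below extracts what a reviewer would need from an incoming item: its guid, its version, whether it is flagged as a correction, and the general editorial note. It is an illustration only: `correction_info` is a hypothetical helper, the version number in the sample is made up, and QCodes are matched directly instead of being resolved.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"


def correction_info(item):
    """Summarize the correction-related data of a newsItem element."""
    signals = item.findall(f"{NAR}itemMeta/{NAR}signal")
    notes = [
        " ".join("".join(note.itertext()).split())
        for note in item.findall(f"{NAR}itemMeta/{NAR}edNote")
        if note.get("role") == "afpnoteRole:client"
    ]
    return {
        "guid": item.get("guid"),
        "version": item.get("version"),
        "is_correction": any(
            signal.get("qcode") == "sig:correction" for signal in signals),
        "note": notes[0] if notes else None,
    }


# The version number below is a placeholder chosen for this example.
SAMPLE = """
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"
          guid="http://doc.afp.com/863OC" version="4">
  <itemMeta>
    <edNote role="afpnoteRole:client">
      CORRECTS the first sentence of the answer of the auctioneer,
      which was incorrectly translated.
    </edNote>
    <signal qcode="sig:correction"/>
  </itemMeta>
</newsItem>
""".strip()

info = correction_info(ET.fromstring(SAMPLE))
```

A system could use the guid to locate previously received versions of the same item and route the whole set, with the note, to an editor for review.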
Two date formats are used in this specification:
The date has an optional time part: one or more of the least significant components may be omitted, from right to left. “From right to left” means starting with the least significant component (i.e., the fraction of a second), then continuing with the full time part, the day part and the month part. The year part MUST NOT be omitted. If the time part is present, the time zone SHOULD NOT be omitted.
In addition to the description provided below, you should refer to the NewsML-G2 specification for information on the processing model for these dates.
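The following Python sketch shows one way to parse a date whose less significant components may be omitted. It rests on assumptions that should be checked against the NewsML-G2 specification: the lexical layout YYYY-MM-DDThh:mm:ss with an optional time zone is inferred from the examples in this document, `parse_g2_date` is a hypothetical helper, and missing month or day parts are filled with 1 while the returned precision records what was actually present.

```python
import re
from datetime import date, datetime

# Year alone, year-month, or year-month-day (no time part).
_DATE_ONLY = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def parse_g2_date(value):
    """Return (parsed_value, precision) for a possibly truncated date.

    precision is one of "year", "month", "day" or "time".
    """
    value = value.strip()
    match = _DATE_ONLY.match(value)
    if match:
        year, month, day = match.groups()
        precision = "day" if day else "month" if month else "year"
        return date(int(year), int(month or 1), int(day or 1)), precision
    # Older Python versions do not accept the "Z" suffix in fromisoformat.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    # Note: before Python 3.11, fromisoformat only accepts fractions of a
    # second written with exactly 3 or 6 digits.
    return datetime.fromisoformat(value), "time"


full, full_precision = parse_g2_date("2009-02-23T20:44:07+02:00")
day_only, day_precision = parse_g2_date("2009-02-22")
```

Keeping the precision next to the value avoids treating "2009-02-22" as if it meant midnight on that day.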
All documents: the transmission date of the document is provided in the header of the news message.
<newsMessage>
<header>
<sent>2009-02-23T20:44:07+02:00</sent>
</header>
</newsMessage>
The transmission date is provided by the sent
element. It is always present and uses the full date and time format. The transmission date indicates when the document was transmitted from AFP to your system.
Text, picture, still graphic, video and multimedia documents: the creation date of the NewsML-G2 document may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<firstCreated>2009-02-23T18:22:08+02:00</firstCreated>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: the creation date of the NewsML-G2 document may be provided in the item metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<itemMeta>
<firstCreated>2009-02-23T18:22:08+02:00</firstCreated>
</itemMeta>
</packageItem>
</itemSet>
</newsMessage>
If present, the creation date of the NewsML-G2 document is provided by a firstCreated
element in the full date and time format. This creation date specifies when the NewsML-G2 document was created (contrast this with the content creation date, which specifies when some content was created; e.g., when a given photo was shot). When a new version of the document is emitted, the creation date of the document isn't modified, but the version creation date is.
Text, picture, still graphic, video and multimedia documents: the creation date of this version of the NewsML-G2 document is provided in the item metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<versionCreated>2009-02-23T20:43:00+02:00</versionCreated>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: the creation date of this version of the NewsML-G2 document is provided in the item metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<itemMeta>
<versionCreated>2009-02-23T20:43:00+02:00</versionCreated>
</itemMeta>
</packageItem>
</itemSet>
</newsMessage>
The creation date of this version of the NewsML-G2 document is provided by a versionCreated
element in the full date and time format. This date information is always present in documents.
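The three document-level dates described so far (transmission, creation, version creation) can be read together, as in this Python sketch. `document_dates` is a hypothetical helper; it takes the news message and the relevant item (the news item, the main news item of a multimedia document, or the package item) and assumes the full date and time format with an explicit UTC offset, as in the examples.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}


def document_dates(message, item):
    """Return the sent, firstCreated and versionCreated dates as datetimes."""

    def read(parent, path):
        text = parent.findtext(path, namespaces=NS)
        # firstCreated is optional, so a missing element yields None.
        return datetime.fromisoformat(text.strip()) if text else None

    return {
        "sent": read(message, "nar:header/nar:sent"),
        "first_created": read(item, "nar:itemMeta/nar:firstCreated"),
        "version_created": read(item, "nar:itemMeta/nar:versionCreated"),
    }


SAMPLE = """
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/">
  <header>
    <sent>2009-02-23T20:44:07+02:00</sent>
  </header>
  <itemSet>
    <newsItem>
      <itemMeta>
        <versionCreated>2009-02-23T20:43:00+02:00</versionCreated>
      </itemMeta>
    </newsItem>
  </itemSet>
</newsMessage>
""".strip()

message = ET.fromstring(SAMPLE)
dates = document_dates(message, message.find("nar:itemSet/nar:newsItem", NS))
```

Because the parsed values carry their UTC offset, they can be compared or sorted directly without first converting them to a common time zone.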
The content creation date may be provided by a contentCreated
element in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<contentCreated>2009-02-23T17:31:00+02:00</contentCreated>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
The creation date of a specific picture, still graphic or video component may be provided in the content metadata section of the corresponding item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- This is the content creation date for this item -->
<contentCreated>2009-02-23T17:31:00+02:00</contentCreated>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- This is the content creation date for this other item -->
<contentCreated>2009-02-22</contentCreated>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
While content creation dates may be provided for components, none is provided for the multimedia document itself. The version creation date of the document often provides a good approximation. However this might not be the case for all documents so you should adopt this heuristic approach only if your usage of this date can support a "right most of the time" situation.
As with multimedia documents, no content creation date is provided. The version creation date of the document often provides a good approximation. However this might not be the case for all documents so you should adopt this heuristic approach only if your usage of this date can support a "right most of the time" situation.
No content creation date is provided for live report indexes.
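The fallback described above can be sketched as follows. This is only an illustration: the content_date helper is hypothetical, the NewsML-G2 namespace declaration is assumed to be present, and the returned flag tells the caller whether the value is the real content creation date or only the version creation date used as an approximation.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"


def content_date(item):
    """Return (date_text, is_exact) for one item.

    is_exact is False when versionCreated is used as an approximation.
    """
    exact = item.findtext(f"{NAR}contentMeta/{NAR}contentCreated")
    if exact and exact.strip():
        return exact.strip(), True
    approx = item.findtext(f"{NAR}itemMeta/{NAR}versionCreated")
    return (approx.strip() if approx else None), False


# Hypothetical items: one with a content creation date, one without.
WITH_DATE = ET.fromstring(
    """<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemMeta><versionCreated>2009-02-23T20:43:00+02:00</versionCreated></itemMeta>
  <contentMeta><contentCreated>2009-02-22</contentCreated></contentMeta>
</newsItem>"""
)
WITHOUT_DATE = ET.fromstring(
    """<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemMeta><versionCreated>2009-02-23T20:43:00+02:00</versionCreated></itemMeta>
  <contentMeta/>
</newsItem>"""
)
```

Keeping the flag lets downstream code decide whether an approximate date is acceptable for its purpose.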
Text, picture, still graphic, video and multimedia documents: embargo information is provided in the item metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<embargoed/>
<edNote role="afpnoteRole:embargo">
Embargoed until end of first auction day
</edNote>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
Embargo information is specified through the embargoed element, which can be completed by an edNote element with a role attribute afpnoteRole:embargo resolving to http://cv.afp.com/ednoteroles/embargo.
Embargo-wise, an AFP document can have one of the three statuses described in the table below.
Embargo statuses | ||
---|---|---|
Embargoed | Representation | Example |
No | No embargoed element. | N/A |
Until given date and time | An embargoed element providing the date and time at which the embargo ends. | |
Under other provided conditions | An empty embargoed element and an embargo editorial note specifying the embargo conditions. This form is used when the precise date and time at which the embargo expires is not known. Note that if the conditions are made of a date and time and additional conditions, all these conditions are expressed in the editorial note (i.e., the date and time aren't provided inside the embargoed element, but as part of the editorial note). | |
See the NewsML-G2 specification for more information on the representation and processing model of embargo information.
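The three statuses in the table can be told apart with a few lines of code. The sketch below is illustrative only: embargo_status is a hypothetical helper, the NewsML-G2 namespace declaration is assumed to be present, and the role is compared against the literal QCode afpnoteRole:embargo rather than resolved through the document's catalog, which is a simplification.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
NS = 'xmlns="http://iptc.org/std/nar/2006-10-01/"'


def embargo_status(item):
    """Return ("none", None), ("until", date_text) or ("conditional", note)."""
    item_meta = item.find(f"{NAR}itemMeta")
    embargoed = item_meta.find(f"{NAR}embargoed") if item_meta is not None else None
    if embargoed is None:
        return ("none", None)
    text = (embargoed.text or "").strip()
    if text:
        return ("until", text)
    # Empty embargoed element: the conditions are in the embargo editorial note.
    note = None
    for ed_note in item_meta.findall(f"{NAR}edNote"):
        if ed_note.get("role") == "afpnoteRole:embargo":
            note = " ".join((ed_note.text or "").split())
    return ("conditional", note)


# Hypothetical items, one per status.
FREE = ET.fromstring(f"<newsItem {NS}><itemMeta/></newsItem>")
DATED = ET.fromstring(
    f"<newsItem {NS}><itemMeta>"
    "<embargoed>2009-02-23T20:43:00+02:00</embargoed>"
    "</itemMeta></newsItem>"
)
CONDITIONAL = ET.fromstring(
    f"<newsItem {NS}><itemMeta><embargoed/>"
    '<edNote role="afpnoteRole:embargo">'
    "  Embargoed until end of first auction day  "
    "</edNote></itemMeta></newsItem>"
)
```

A "conditional" result cannot be lifted automatically; it would typically be routed to a person who reads the note.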
For multimedia documents, the way embargo information is conveyed differs from standard NewsML-G2. In NewsML-G2 each G2 item carries its own embargo information, and a G2 item without an embargoed element is defined as not embargoed.
In AFP's multimedia documents the only embargoed element to consider is that of the main item. The embargoed elements of non-main items must be ignored. You must process multimedia documents in a way that applies the embargo directives provided in the main news item to the entire content of the document (i.e., to all items in the document).
|
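One possible way to apply this rule is to locate the main news item first and read the embargo information from it alone. The sketch below identifies the main item by the link element shown in the multimedia examples of this guide (rel="crel:isa" pointing to http://cv.afp.com/itemnatures/mmdMainComp). The helper names are hypothetical and the NewsML-G2 namespace declaration is assumed to be present.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
MAIN_HREF = "http://cv.afp.com/itemnatures/mmdMainComp"


def main_news_item(root):
    """Return the main news item, or the only item of a single-item document."""
    items = root.findall(f"{NAR}itemSet/{NAR}newsItem")
    for item in items:
        for link in item.findall(f"{NAR}itemMeta/{NAR}link"):
            if link.get("rel") == "crel:isa" and link.get("href") == MAIN_HREF:
                return item
    return items[0] if len(items) == 1 else None


def document_is_embargoed(root):
    """Apply the main item's embargo to the whole document."""
    main = main_news_item(root)
    if main is None:
        raise ValueError("no main news item found")
    return main.find(f"{NAR}itemMeta/{NAR}embargoed") is not None


def build(main_meta, other_meta):
    """Build a hypothetical two-item multimedia document for the demo."""
    return ET.fromstring(
        '<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/"><itemSet>'
        f"<newsItem><itemMeta>{other_meta}</itemMeta></newsItem>"
        "<newsItem><itemMeta>"
        f'<link rel="crel:isa" href="{MAIN_HREF}"/>{main_meta}'
        "</itemMeta></newsItem>"
        "</itemSet></newsMessage>"
    )


EMBARGOED_DOC = build("<embargoed/>", "")
# An embargoed element on a non-main item only: it must be ignored.
FREE_DOC = build("", "<embargoed/>")
```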
Text, picture, still graphic, video and multimedia documents: multiple event identifiers may be provided by subject elements in the content metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<subject qcode="QCode identifying an event" type="cpnat:event">
<name>
Auction for the Yves Saint Laurent and Pierre Bergé collection
</name>
</subject>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: only one event identifier may be provided by a subject element in the content metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<contentMeta>
<subject qcode="QCode identifying an event" type="cpnat:event">
<name>
Auction for the Yves Saint Laurent and Pierre Bergé collection
</name>
</subject>
</contentMeta>
</packageItem>
</itemSet>
</newsMessage>
The news coverage of an event often spans multiple NewsML-G2 documents. For example the auction for the Yves Saint Laurent and Pierre Bergé collection may be covered by two news stories (one announcing the event and one reporting on the event later on), two interview transcripts (one with Pierre Bergé and one with a Christie's representative), a multimedia document, a video report and a number of pictures of the event. It might be interesting for you to know that all these documents are about the same event. For example, it might help your editorial team to access all the documents available about the event. Another example: if you operate a Web site publishing news you could use this knowledge to automatically provide links to related content.
To let you know that multiple NewsML-G2 documents relate to the same event, AFP creates unique event identifiers and inserts them into documents. For example, a unique event identifier is assigned to the auction for the Yves Saint Laurent and Pierre Bergé collection, and each related document contains this identifier.
Different NewsML-G2 documents covering the same event
An event identifier is the concept URI of a subject element whose type attribute, the QCode cpnat:event, resolves to http://cv.iptc.org/newscodes/cpnature/event. It is conveyed by the qcode attribute.
In addition to event identifiers we provide, whenever possible, the names of the events. An event name provides a short description of the event in natural language. The name is provided by a name element inside the subject element.
See the section on subjects for more information about the subject element.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<subject qcode="QCode identifying an event" type="cpnat:event">
<name>Name of this event</name>
</subject>
<subject qcode="QCode identifying another event" type="cpnat:event">
<name>Name of this other event</name>
</subject>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Why are event identifiers provided using <subject> elements? This is because events covered by a document are also subject matter of the document: things the document is about. Hence it is appropriate to convey their identifiers using the NewsML-G2 <subject> elements, along with other subjects of the documents. This allows them to be generically processed like any other subjects when that makes sense, or to be processed specifically as event identifiers when needed, thanks to the type attribute which marks them as such.
|
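To illustrate how event identifiers might be used to relate documents, the sketch below extracts event subjects from an item and indexes document guids by event. The event_subjects helper, the afpevent scheme alias and the sample values are hypothetical; the type is compared against the literal QCode cpnat:event rather than resolved through the catalog, and the NewsML-G2 namespace declaration is assumed to be present.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NAR = "{http://iptc.org/std/nar/2006-10-01/}"


def event_subjects(item):
    """Return [(qcode, name_or_None)] for subjects typed as events."""
    events = []
    for subject in item.findall(f"{NAR}contentMeta/{NAR}subject"):
        if subject.get("type") == "cpnat:event":
            name = subject.findtext(f"{NAR}name")
            events.append((subject.get("qcode"), " ".join(name.split()) if name else None))
    return events


def make_item(guid):
    """Build a hypothetical news item carrying one event and one other subject."""
    return ET.fromstring(
        f'<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/" guid="{guid}">'
        "<contentMeta>"
        '<subject qcode="afpevent:EX1" type="cpnat:event">'
        "<name> Name of this event </name></subject>"
        '<subject qcode="medtop:01000000" type="cpnat:abstract"/>'
        "</contentMeta></newsItem>"
    )


# Index of document guids per event identifier.
by_event = defaultdict(list)
for item in (make_item("http://d.afp.com/AAA1"), make_item("http://d.afp.com/BBB2")):
    for qcode, _name in event_subjects(item):
        by_event[qcode].append(item.get("guid"))
```

Such an index could back a "related content" feature of the kind mentioned above.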
Text, picture, still graphic, video and multimedia documents: a general editorial note may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<edNote role="afpnoteRole:client">
Original source is unknown and unverified. This photo was posted on twitter.
Following an official ban in San Theodoros on foreign media outlets covering
demonstrations, AFP is using pictures from other sources.
</edNote>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
The general editorial note provides some text in natural language addressed to the editorial people in your team receiving and processing the item. It can provide instructions or hints on how to handle the document, information about the nature of a correction (see example in the section on correction signal), excluded audience/usage, additional information about the content, etc. It is not intended for publication.
There is at most one general editorial note in a document. If present, it is provided by an edNote element whose role attribute, the QCode afpnoteRole:client, resolves to http://cv.afp.com/ednoteroles/client. Note that while NewsML-G2 allows for rich text by using some markup in the content of an editorial note, AFP's systems only output simple textual content not interspersed with markup.
The general editorial note is often used to express usage restrictions, as in the following example:
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<edNote role="afpnoteRole:client">
EDITORIAL USE ONLY
NO MARKETING NO ADVERTISING CAMPAIGNS
NO ARCHIVE
</edNote>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
The following table provides examples of common usage restrictions you might find in picture documents.
Examples of usage restrictions conveyed by the general editorial note | |
---|---|
Phrase inside the general editorial note | Comment |
RESTRICTED TO EDITORIAL USE | The picture can be used only by media outlets for news purposes (newspapers, magazines, radios, TVs, news websites and mobile news services...) |
NO MARKETING NO ADVERTISING CAMPAIGNS | The picture cannot be used for advertising or marketing. |
NO INTERNET | The picture cannot be published on Internet websites. |
NO MOBILE | The picture cannot be used by mobile services. |
NO ARCHIVE | The picture cannot be archived. |
MANDATORY USE WITH AFP STORY | The handout picture shall be published with the corresponding AFP story only (this mention is only available for handouts). |
TO BE USED WITHIN XX DAYS FROM XX/XX/XXXX | The picture cannot be used outside of the specified timeframe. |
NO VIDEO EMULATION | The picture cannot be used in a sequence of pictures to simulate a video. |
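Because the general editorial note is free text written for people, any automated reading of it can only be a best-effort aid. With that caveat, the sketch below extracts the note and flags the fixed phrases from the table above so that they can be surfaced to editors. The helper names are hypothetical, the phrase list is limited to the entries of the table that carry no variable part, and the NewsML-G2 namespace declaration is assumed to be present.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"

# Fixed phrases taken from the table above (the "XX DAYS" entry is omitted).
KNOWN_PHRASES = [
    "RESTRICTED TO EDITORIAL USE",
    "NO MARKETING NO ADVERTISING CAMPAIGNS",
    "NO INTERNET",
    "NO MOBILE",
    "NO ARCHIVE",
    "MANDATORY USE WITH AFP STORY",
    "NO VIDEO EMULATION",
]


def general_note(item):
    """Return the general editorial note with whitespace collapsed, or None."""
    for note in item.findall(f"{NAR}itemMeta/{NAR}edNote"):
        if note.get("role") == "afpnoteRole:client":
            return " ".join((note.text or "").split())
    return None


def flagged_restrictions(note):
    """Return the known phrases found in the note, in table order."""
    normalized = " ".join((note or "").upper().split())
    return [phrase for phrase in KNOWN_PHRASES if phrase in normalized]


ITEM = ET.fromstring(
    """<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemMeta>
    <edNote role="afpnoteRole:client">
      EDITORIAL USE ONLY
      NO MARKETING NO ADVERTISING CAMPAIGNS
      NO ARCHIVE
    </edNote>
  </itemMeta>
</newsItem>"""
)
```

Note that "EDITORIAL USE ONLY" in the sample is not matched, which shows why the full note should always remain visible to editorial staff.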
Text, picture, still graphic and video documents: genres of the document may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- A genre represented by a QCode and associated with a rank -->
<genre rank="1" qcode="afpattribute:Interview"/>
<!-- A genre represented by a QCode and a name and associated with a rank -->
<genre rank="2" qcode="afpedtype:VideoWithTitling">
<name>Titling</name>
</genre>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: genres of the document as a whole may be provided in the content metadata section of the main news item. Genres specific to a non-main item may be provided by the content metadata section of this item.
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
<contentMeta>
<!-- This genre is in the main news item:
it applies to the document as a whole -->
<genre rank="1" qcode="afpattribute:Interview">
<name>Interview</name>
</genre>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- This genre only qualifies this item -->
<genre rank="1" qcode="afpattribute:Profile">
<name>Profile</name>
</genre>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Genres of a document, and of individual items in the case of multimedia documents, may be provided by genre elements. Each genre element describes a nature or a style of the content (e.g., an intellectual or journalistic form). There may be multiple genre elements per item, as a given item may be at the intersection of multiple genres.
In AFP documents, a genre is specified by a QCode, optionally completed by a natural language name.
Often used in AFP documents are genres defined in the schemes http://ref.afp.com/attributes/ (scheme alias afpattribute) and http://ref.afp.com/editorialtypes/ (scheme alias afpedtype).
The name child element, if present, provides a natural language name for the genre.
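A minimal sketch of reading genres is given below. It returns each genre's QCode, optional name and rank, ordered by rank when one is present. The genres helper is hypothetical and the NewsML-G2 namespace declaration is assumed to be present.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"


def genres(item):
    """Return genre descriptions sorted by rank (unranked genres last)."""
    found = []
    for genre in item.findall(f"{NAR}contentMeta/{NAR}genre"):
        rank = genre.get("rank")
        found.append(
            {
                "qcode": genre.get("qcode"),
                "name": genre.findtext(f"{NAR}name"),
                "rank": int(rank) if rank else None,
            }
        )
    return sorted(found, key=lambda g: (g["rank"] is None, g["rank"] or 0))


# Same genres as in the first example above, listed out of rank order.
ITEM = ET.fromstring(
    """<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <contentMeta>
    <genre rank="2" qcode="afpedtype:VideoWithTitling">
      <name>Titling</name>
    </genre>
    <genre rank="1" qcode="afpattribute:Interview"/>
  </contentMeta>
</newsItem>"""
)
```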
Text, picture, still graphic, video and multimedia documents: the document identifier is provided in the news item (for multimedia documents: in the main news item). A version number may be present too.
<newsMessage>
<itemSet>
<newsItem guid="http://d.afp.com/MM48X" version="5">
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: the document identifier is provided in the package item. A version number may be present too.
<newsMessage>
<itemSet>
<packageItem guid="http://d.afp.com/MM48X" version="5">
</packageItem>
</itemSet>
</newsMessage>
A document is a set of information carrying some journalistic content and associated metadata. As news stories develop or corrections are made, new versions of the document are published.
Each NewsML-G2 document has a globally unique identifier (guid), which is provided by the guid attribute of a newsItem or packageItem element.
In AFP's NewsML-G2 documents, guids can take multiple forms. Examples include URIs in the http scheme, URNs in the namespace "newsml" [RFC3085bis] or AFP UNOs (a format more or less equivalent to IIM UNO).
Note: most AFP GUIDs look like plain URLs, for example: http://doc.afp.com/11N38S . However, they actually are non-dereferenceable URIs and their purpose is only to serve as identifiers. |
From a technical point of view, given two representations of some journalistic content in NewsML-G2, the guid is what tells whether these two representations are those of the same document (possibly different versions of it): same guids means same document, different guids means different documents.
When integrating AFP's NewsML-G2 production into your information system you'll often need to compare guids. For example, when receiving a document from AFP you'll want to check if you already received some version of this document in the past, an action you'll perform by looking in your system for a document with the same guid.
A version number may be provided by a version attribute in the form of an XML Schema positive integer. It identifies the version of the document. The first time you receive a given document (i.e., a document identified by a given guid), this document isn't necessarily in its first version. That is, the version number of a document you receive for the first time may be greater than 1. The version number is incremented by 1 or more each time the document is updated. If no version attribute is present, you must assume that the document is in version 1 (i.e., first version).
How should a new version of a document be dealt with? The answer is given by the NewsML-G2 documentation: In the absence of any specific instructions from the provider, a "usable" item [cf. section on publishing status] should be regarded as replacing any previous version of the item with the same GUID. In practice, a provider is likely to provide some supplementary information in the form of a human-readable
Often, new versions are issued to enrich previous ones with additional information, especially as stories develop in real time. Sometimes, however, a new version is meant to correct some error found in a previous version. In such a case you may want to take some additional actions, since erroneous material may already have been published. Such correction-conveying versions are specifically tagged using a correction <signal>. For more information on this topic see the section on correction signal.
|
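The guid and version rules above can be sketched as a small bookkeeping routine. This is an assumption-laden illustration rather than a prescribed workflow: the helper names and the in-memory dictionary are hypothetical, and publishing status and correction signals are deliberately left out.

```python
import xml.etree.ElementTree as ET


def document_key(item):
    """Return (guid, version); a missing version attribute means version 1."""
    return item.get("guid"), int(item.get("version", "1"))


def classify(known_versions, guid, version):
    """Return "new", "update" or "stale" and record the latest version seen."""
    previous = known_versions.get(guid)
    if previous is None:
        result = "new"
    elif version > previous:
        result = "update"
    else:
        result = "stale"
    if result != "stale":
        known_versions[guid] = version
    return result


NS = 'xmlns="http://iptc.org/std/nar/2006-10-01/"'
# First receipt of a document may already carry a version greater than 1.
V5 = ET.fromstring(f'<newsItem {NS} guid="http://d.afp.com/MM48X" version="5"/>')
# Versions may increase by more than 1 between two receipts.
V7 = ET.fromstring(f'<newsItem {NS} guid="http://d.afp.com/MM48X" version="7"/>')
UNVERSIONED = ET.fromstring(f'<newsItem {NS} guid="http://d.afp.com/OTHER"/>')

known = {}
outcomes = [classify(known, *document_key(item)) for item in (V5, V7, V5, UNVERSIONED)]
```

Comparing guids as exact strings matches their role as opaque identifiers; they are not meant to be fetched or normalized as URLs.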
Text, picture, still graphic and video documents: information sources may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- An information source represented by a name and a role -->
<infoSource role="isrol:origcont">
<name>AP</name>
</infoSource>
<!-- An information source represented by a QCode, a name and a role -->
<infoSource qcode="afpsource:2648" role="isrol:origcont">
<name>CHRISTIE'S</name>
</infoSource>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: information sources may be provided in the content metadata sections of the news items. When an information source appears in a news item which is not the main one, it describes an information source for the content of this item. When an information source appears in the main news item, it should be considered as an information source of the "document", with no indication of the specific part of the content it is associated with (if any).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
<contentMeta>
<!-- This information source is in the main news item: it is an information source of the document -->
<infoSource role="isrol:origcont">
<name>AP</name>
</infoSource>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- This information source is specific to this item -->
<infoSource qcode="afpsource:2648" role="isrol:origcont">
<name>Business Wire</name>
</infoSource>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Information sources of a document, and of individual items in multimedia documents, may be provided by infoSource elements.
In AFP NewsML-G2 documents, an information source is a party (person or organization) which originated, distributed, aggregated or supplied the content. For example, in a document created/published by AFP but reusing content provided by Business Wire, this source (i.e., Business Wire) will appear in an infoSource element.
In AFP documents, an information source is specified by either:
The URI space used to specify information source through QCodes is open and can evolve over time.
The name child element, if present, provides a natural language name for the information source.
The role attribute carries a QCode that specifies the role of the information source. AFP documents use the role "Content originator" whose QCode is isrol:origcont and whose concept URI is http://cv.iptc.org/newscodes/infosourcerole/origcont.
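Reading information sources could look like the sketch below, which collects the optional QCode, the optional name and the role of each infoSource element. The info_sources helper is hypothetical and the NewsML-G2 namespace declaration is assumed to be present.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"


def info_sources(item):
    """Return one dict per infoSource element of the item."""
    return [
        {
            "qcode": source.get("qcode"),  # None when only a name is given
            "name": source.findtext(f"{NAR}name"),
            "role": source.get("role"),
        }
        for source in item.findall(f"{NAR}contentMeta/{NAR}infoSource")
    ]


ITEM = ET.fromstring(
    """<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <contentMeta>
    <infoSource role="isrol:origcont">
      <name>AP</name>
    </infoSource>
    <infoSource qcode="afpsource:2648" role="isrol:origcont">
      <name>CHRISTIE'S</name>
    </infoSource>
  </contentMeta>
</newsItem>"""
)
```

In a multimedia document the same helper could be applied to each news item in turn, keeping track of whether the item is the main one.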
Text, picture, still graphic and video documents: keywords may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<keyword>culture</keyword>
<keyword>arts</keyword>
<keyword>fashion</keyword>
<keyword>auction</keyword>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: keywords of the document as a whole may be provided in the content metadata section of the main news item. Keywords specific to an individual item may be provided by the content metadata section of that item.
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
<contentMeta>
<!-- These keywords are in the main news item:
they are associated with the document as a whole -->
<keyword>culture</keyword>
<keyword>arts</keyword>
<keyword>fashion</keyword>
<keyword>auction</keyword>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- These keywords are specifically associated with this news item -->
<keyword>people</keyword>
<keyword>money</keyword>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: keywords may be provided in the content metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<contentMeta>
<keyword>culture</keyword>
<keyword>arts</keyword>
<keyword>fashion</keyword>
<keyword>auction</keyword>
</contentMeta>
</packageItem>
</itemSet>
</newsMessage>
Keywords are defined by NewsML-G2 as "free-text terms to be used for indexing or finding the content by text-based search engines".
If present, keywords are provided by keyword elements.
Some keywords may have a refined role, expressed by a role attribute. The value of this attribute is a QCode. Currently we may issue the QCode afpkrole:tagWeb, which resolves to http://cv.afp.com/keywordroles/tagWeb. For example:
<keyword role="afpkrole:tagWeb">culture</keyword>
Keywords with a http://cv.afp.com/keywordroles/tagWeb role are meant to be used to compute tag clouds [TagClouds].
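As an illustration of the tag cloud use case, the sketch below counts how often each keyword with the tagWeb role appears across a set of items; the counts could then serve as tag weights. The helper names and sample items are hypothetical, the role is compared against the literal QCode afpkrole:tagWeb, and the NewsML-G2 namespace declaration is assumed to be present.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NAR = "{http://iptc.org/std/nar/2006-10-01/}"


def web_tags(item):
    """Return the text of keywords carrying the tagWeb role."""
    return [
        (keyword.text or "").strip()
        for keyword in item.findall(f"{NAR}contentMeta/{NAR}keyword")
        if keyword.get("role") == "afpkrole:tagWeb"
    ]


def tag_weights(items):
    """Count tagWeb keywords over several items."""
    counts = Counter()
    for item in items:
        counts.update(web_tags(item))
    return counts


def make_item(*keywords_xml):
    """Build a hypothetical news item from keyword element strings."""
    return ET.fromstring(
        '<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"><contentMeta>'
        + "".join(keywords_xml)
        + "</contentMeta></newsItem>"
    )


FIRST = make_item(
    '<keyword role="afpkrole:tagWeb">culture</keyword>',
    '<keyword role="afpkrole:tagWeb">arts</keyword>',
    "<keyword>fashion</keyword>",
)
SECOND = make_item('<keyword role="afpkrole:tagWeb">culture</keyword>')
weights = tag_weights([FIRST, SECOND])
```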
Text, picture, still graphic and video documents: the language of the content may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<language tag="en"/>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: the language of the content may be provided in the content metadata section of each news item.
<newsMessage>
<itemSet>
<!-- An item whose content is in English -->
<newsItem>
<contentMeta>
<language tag="en"/>
</contentMeta>
</newsItem>
<!-- An item whose content is in French -->
<newsItem>
<contentMeta>
<language tag="fr"/>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
The tag attribute of the language element carries a BCP 47 language tag [RFC5646] that specifies the main language of the content. The content is what is provided inline or linked to by the content set (i.e., the contentSet element). For example, in a text document this attribute specifies the main language the textual content is written in, and in a video document it typically specifies the main language used in the soundtrack.
The main languages used by AFP along with their BCP 47 tags are shown in the table below.
Main languages in AFP production | |
---|---|
Language | BCP 47 tag |
Arabic | ar |
English | en |
French | fr |
German | de |
Portuguese | pt |
Spanish | es |
Text, picture, still graphic and video documents: the language of metadata is specified by the news item.
<newsMessage>
<itemSet>
<newsItem xml:lang="en">
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: the language of metadata is specified by each news item.
<newsMessage>
<itemSet>
<newsItem xml:lang="en">
</newsItem>
<newsItem xml:lang="en">
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: the language of metadata is specified by the package item.
<newsMessage>
<itemSet>
<packageItem xml:lang="en">
</packageItem>
</itemSet>
</newsMessage>
The xml:lang attribute carries a BCP 47 language tag [RFC5646] that specifies the main language of the metadata (e.g., titles, subject's names, caption, etc.) provided by the item.
In a multimedia document, this attribute has the same value in every news item of the document (i.e., in a given document, all items make use of the same language for metadata).
Important design principle: In an AFP NewsML-G2 document, metadata is provided in a single language, with exceptions for a few elements. When some news content is of global interest we often provide metadata in multiple languages: in this case we do so by issuing multiple NewsML-G2 documents (e.g., one with metadata in French, another one with metadata in English, etc.). These are different documents: each one has its own GUID and lifecycle (see section on documents identifiers). |
The main languages used by AFP along with their BCP 47 tags are shown in the table below.
Main languages in AFP production | |
---|---|
Language | BCP 47 tag |
Arabic | ar |
English | en |
French | fr |
German | de |
Portuguese | pt |
Spanish | es |
While most metadata in a NewsML-G2 document uses the language specified by the xml:lang attribute of the item element as shown in the examples above, there may be exceptions for a few elements. For example, in a video document the original transcription of some speech is typically provided in the original language that was actually used by the speaker(s), which may differ from the main language of metadata. Whenever possible, the language for such metadata is provided by an xml:lang attribute on the XML element conveying the metadata in question.
The example below shows a document whose main language of metadata is English but whose "transcription" metadata is in French.
<newsMessage>
<itemSet>
<newsItem xml:lang="en">
<partMeta>
<description role="afpdescRole:contentDescription">
Pierre Bergé speaks about the auction.
</description>
<description xml:lang="fr" role="afpdescRole:transcription">
C’est le jour ou le dernier objet sera passé sous le marteau d'un commissaire priseur
que à mon sens – a mon sens - cette collection pourra écrire le mot fin.
</description>
</partMeta>
</newsItem>
</itemSet>
</newsMessage>
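The rule illustrated by the example above (an element-level xml:lang overrides the item-level one) can be sketched as follows. The described_languages helper is hypothetical, and it makes a simplifying assumption: only the description element itself and the item element are inspected, not intermediate ancestors. ElementTree exposes xml:lang under the reserved XML namespace, which is why the attribute key looks unusual.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
# ElementTree key for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def described_languages(item):
    """Return [(language, role)] for every description element in the item."""
    item_lang = item.get(XML_LANG)
    return [
        (description.get(XML_LANG) or item_lang, description.get("role"))
        for description in item.iter(f"{NAR}description")
    ]


# Abridged version of the example above.
ITEM = ET.fromstring(
    """<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/" xml:lang="en">
  <partMeta>
    <description role="afpdescRole:contentDescription">
      Pierre Berge speaks about the auction.
    </description>
    <description xml:lang="fr" role="afpdescRole:transcription">
      Transcription en francais.
    </description>
  </partMeta>
</newsItem>"""
)
```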
AFP's NewsML-G2 documents can convey information about locations. We establish a distinction between locations from which the content originates (e.g., the place where a news story was written) and locations that are subject matter of the content. These two kinds of locations are conveyed using different means, as described in the following sections.
Locations may be typed, using a type attribute. The following types are used in AFP documents:
Types of locations | |||
---|---|---|---|
Type | Description | QCode | Concept URI |
Geopolitical area | In AFP documents, it is a generic type that may be used for any kind of location. It merely informs that the associated element represents a location. | cpnat:geoArea | http://cv.iptc.org/newscodes/cpnature/geoArea |
Point of interest | In AFP documents, this type is used for locations that cannot be classified as cities, country areas or countries. For instance the Eiffel Tower and the White House will be typed as points of interest, as well as the Sherwood forest or a random building. Note that this may diverge a bit from NewsML-G2 standard usage, where areas such as forests, ponds, hills, streets or random places are not usually classified as points of interest. | cpnat:poi | http://cv.iptc.org/newscodes/cpnature/poi |
City | Informs that the associated element represents a city. | loctyp:City | http://cv.iptc.org/newscodes/location/City |
Country area | In AFP documents it is typically used for areas such as provinces, states or other areas that may contain multiple cities but which pertain themselves to countries. | loctyp:CountryArea | http://cv.iptc.org/newscodes/location/CountryArea |
Country | Informs that the associated element represents a country. | loctyp:Country | http://cv.iptc.org/newscodes/location/Country |
Text, picture, still graphic and video documents: the locations from which the content originates are provided in the content metadata section of the news item (in the following example only one location is provided).
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<located qcode="afplocation:281108" type="cpnat:poi">
<name>White House</name>
<related qcode="afplocation:6666" rel="skos:broader" type="loctyp:City">
<name>Washington</name>
<related qcode="afplocation:1149" rel="skos:broader" type="loctyp:CountryArea"/>
</related>
<related qcode="afplocation:1149" type="loctyp:CountryArea">
<name>District of Columbia</name>
<related qcode="afplocation:206" rel="skos:broader" type="loctyp:Country"/>
</related>
<related qcode="afplocation:206" type="loctyp:Country">
<name>United States</name>
<related qcode="iso3166-1a3:USA" rel="skos:exactMatch"/>
</related>
<POIDetails>
<position latitude="38.89761" longitude="-77.03637"/>
</POIDetails>
</located>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: for news items in the document, the locations from which the content of the item originates may be provided in the content metadata section of the item (in the following example only one location per item is provided).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
<contentMeta>
<!-- Location from which the content of the main news item originates -->
<located qcode="afplocation:2500" type="loctyp:City">
<name>Paris</name>
<related qcode="afplocation:67" rel="skos:broader" type="loctyp:Country">
<name>France</name>
<related qcode="iso3166-1a3:FRA" rel="skos:exactMatch" />
</related>
<geoAreaDetails>
<position latitude="48.85341" longitude="2.34121" />
</geoAreaDetails>
</located>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- Location from which the content of this news item originates -->
<located qcode="afplocation:2613" type="loctyp:City">
<name>Marseille</name>
<related qcode="afplocation:719" rel="skos:broader" type="loctyp:CountryArea">
<name>Bouches-du-Rhône</name>
<related qcode="afplocation:67" rel="skos:broader" type="loctyp:Country"/>
</related>
<related qcode="afplocation:67" type="loctyp:Country">
<name>France</name>
<related qcode="iso3166-1a3:FRA" rel="skos:exactMatch"/>
</related>
<geoAreaDetails>
<position latitude="43.29695" longitude="5.38107"/>
</geoAreaDetails>
</located>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: the location from which the content originates is provided in the content metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<contentMeta>
<located qcode="afplocation:6666" type="loctyp:City">
<name>Washington</name>
<related qcode="afplocation:1149" rel="skos:broader" type="loctyp:CountryArea">
<name>District of Columbia</name>
<related qcode="afplocation:206" rel="skos:broader" type="loctyp:Country"/>
</related>
<related qcode="afplocation:206" type="loctyp:Country">
<name>United States</name>
<related qcode="iso3166-1a3:USA" rel="skos:exactMatch"/>
</related>
<geoAreaDetails>
<position latitude="38.89511" longitude="-77.03637"/>
</geoAreaDetails>
</located>
</contentMeta>
</packageItem>
</itemSet>
</newsMessage>
In AFP NewsML-G2 documents, located elements specify the geographical origin of the editorial content conveyed by the <contentSet> of a news item: the text of a news story, the jpeg renditions of a picture document, etc. For live reports, the located element specifies the geographical origin of the live report. There is always at least one location provided per item.
Locations from which the content originates are not necessarily the locations the content is about. For example a news story about an event taking place in Paris may be written in London; in such case the city of London may be specified as the location from which the content originates. The locations the content is about are conveyed in another part of the document, as described in section "Locations that are subject matter of the document".
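The located elements shown in the examples above nest a first broader entity and then describe the further levels as sibling related elements that reference each other by QCode. The sketch below is one possible way to flatten such an element into an ordered list of levels, with the coordinates and the ISO 3166-1 alpha-3 code when present. All helper names are hypothetical, QCodes are compared literally rather than resolved through the catalog, and the NewsML-G2 namespace declaration is assumed to be present.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"


def first_broader(element):
    """Return the first direct related child with rel="skos:broader"."""
    for related in element.findall(f"{NAR}related"):
        if related.get("rel") == "skos:broader":
            return related
    return None


def iso3_code(element):
    """Return the code part of an iso3166-1a3 exact match, if any."""
    for related in element.findall(f"{NAR}related"):
        qcode = related.get("qcode", "")
        if related.get("rel") == "skos:exactMatch" and qcode.startswith("iso3166-1a3:"):
            return qcode.split(":", 1)[1]
    return None


def describe(element):
    return {
        "qcode": element.get("qcode"),
        "type": element.get("type"),
        "name": element.findtext(f"{NAR}name"),
        "iso3": iso3_code(element),
    }


def hierarchy(located):
    """Return the location followed by its broader entities, narrowest first."""
    by_qcode = {}
    for related in located.findall(f"{NAR}related"):
        by_qcode.setdefault(related.get("qcode"), related)
    levels = [describe(located)]
    seen = set()
    current = first_broader(located)
    while current is not None and current.get("qcode") not in seen:
        seen.add(current.get("qcode"))
        levels.append(describe(current))
        pointer = first_broader(current)
        # Follow the pointer to the sibling element that describes it.
        current = by_qcode.get(pointer.get("qcode"), pointer) if pointer is not None else None
    return levels


def position(located):
    """Return (latitude, longitude) as floats, or None."""
    for container in ("geoAreaDetails", "POIDetails"):
        pos = located.find(f"{NAR}{container}/{NAR}position")
        if pos is not None:
            return float(pos.get("latitude")), float(pos.get("longitude"))
    return None


LOCATED = ET.fromstring(
    """<located xmlns="http://iptc.org/std/nar/2006-10-01/"
         qcode="afplocation:281108" type="cpnat:poi">
  <name>White House</name>
  <related qcode="afplocation:6666" rel="skos:broader" type="loctyp:City">
    <name>Washington</name>
    <related qcode="afplocation:1149" rel="skos:broader" type="loctyp:CountryArea"/>
  </related>
  <related qcode="afplocation:1149" type="loctyp:CountryArea">
    <name>District of Columbia</name>
    <related qcode="afplocation:206" rel="skos:broader" type="loctyp:Country"/>
  </related>
  <related qcode="afplocation:206" type="loctyp:Country">
    <name>United States</name>
    <related qcode="iso3166-1a3:USA" rel="skos:exactMatch"/>
  </related>
  <POIDetails>
    <position latitude="38.89761" longitude="-77.03637"/>
  </POIDetails>
</located>"""
)
levels = hierarchy(LOCATED)
```

The set of visited QCodes guards against a malformed document whose broader references would form a loop.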
There are some subtleties about what "locations from which the content originates" means depending on the nature of the content; we discuss them in the table below. Note that the policy described here is specific to AFP. Other conventions might be in place at other news providers.
Policy used to specify the locations from which the content originates | |
---|---|
Nature of content | Policy |
Text | A location from which the content originates is usually a location (e.g., a city) where the text was written or from which it was dictated. Alternatively it might be the location of the event if an AFP reporter is present nearby. Multiple locations may be provided in the form of multiple located elements when the content originates (as defined here) from multiple locations; in this case the usual practice is to provide no more than two locations. |
Picture | The location from which the content originates is the location of the camera when the picture was shot. Therefore it may differ from the location of what is shown in the picture. Knowing the location of the camera is useful as it lets one know "what the subject of the picture looks like when viewed from that location". Only one location is provided. |
Video | The location from which the content originates is the location of the camera when the video was recorded. Therefore it may differ from the location of what is shown in the video. Knowing the location of the camera is useful as it lets one know "what the subject of the video looks like when viewed from that location". Only one location is provided. If the video is shot in different places, only one of these places is provided, usually the most significant. |
Still or animated graphic | When a graphic is produced, it is often accompanying or illustrating a separate production (typically of textual nature). In such a case the location from which the content originates is the same as that of this production. Otherwise, it is the location of the event the graphic is about. |
Multimedia | Each news item in a multimedia document specifies the location(s) from which the content originates. The exact meaning for each news item is determined by the nature of its content as described in this table. |
Live report | The location from which the content originates is the location of the event the live report is about. The value of this metadata can change as the live report develops. For example, the live report about the Bergé/Saint-Laurent auction may be tagged with the location where the auction takes place while we report on the auction, and later be tagged with the location where the Pierre Bergé press conference takes place while we report on this press conference. |
The locations from which the content originates are provided by located elements in the content metadata section of news items. A given located element may convey several pieces of information about a location:
- A qcode attribute.
- A type attribute. In AFP documents we typically make use of the IPTC location types [IPTCLocTypes] to specify whether the location is a city, a country area or a country. For locations that are classified as a "point of interest", we use the QCode cpnat:poi (concept URI: http://cv.iptc.org/newscodes/cpnature/poi) from [IPTCCPNatures]. For a description of the different location types see the table in section "Locations".
- A name element.
- A position element inside a geoAreaDetails, or in a POIDetails if the location is classified as a "point of interest". We use the WGS84 geodetic system.
- A related element whose rel attribute, the QCode skos:broader, resolves to http://www.w3.org/2004/02/skos/core#broader. Combined with the base location described above this forms a geographical hierarchy. Typically we provide three levels in this hierarchy: a city, a country area and a country; but sometimes we may provide four levels (as in the example above where the location is the White House) or only one or two levels, and we may also provide more in the future. Each of these broader geographical entities may be described with:
  - A qcode attribute.
  - A type attribute. As described above for the base location, we make use of the IPTC location types [IPTCLocTypes] to specify whether it is a city, a country area or a country.
  - A name element.
  - A related element whose rel attribute, the QCode skos:exactMatch, resolves to http://www.w3.org/2004/02/skos/core#exactMatch and that has a qcode attribute using the scheme alias iso3166-1a3 (scheme URI: http://cvx.iptc.org/iso3166-1a3/). The ISO 3166-1 alpha 3 code is the code part of this qcode attribute.
  - A related element whose rel attribute, the QCode skos:broader, resolves to http://www.w3.org/2004/02/skos/core#broader.
In text documents or text components of multimedia documents we may provide multiple locations from which the content originates. In this case the current practice is to provide at most two. Below is an example:
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- A location from which the content originates -->
<located qcode="afplocation:2500" type="loctyp:City">
<name>Paris</name>
<related qcode="afplocation:67" rel="skos:broader" type="loctyp:Country">
<name>France</name>
<related qcode="iso3166-1a3:FRA" rel="skos:exactMatch" />
</related>
<geoAreaDetails>
<position latitude="48.85341" longitude="2.34121" />
</geoAreaDetails>
</located>
<!-- Another location from which the content originates -->
<located qcode="afplocation:6666" type="loctyp:City">
<name>Washington</name>
<related qcode="afplocation:1149" rel="skos:broader" type="loctyp:CountryArea">
<name>District of Columbia</name>
<related qcode="afplocation:206" rel="skos:broader" type="loctyp:Country"/>
</related>
<related qcode="afplocation:206" type="loctyp:Country">
<name>United States</name>
<related qcode="iso3166-1a3:USA" rel="skos:exactMatch"/>
</related>
<geoAreaDetails>
<position latitude="38.89511" longitude="-77.03637"/>
</geoAreaDetails>
</located>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
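As an illustration of how the located elements above might be read, here is a minimal Python sketch using the standard library. It rests on several assumptions that are not taken from this guide: that real documents declare the NewsML-G2 namespace http://iptc.org/std/nar/2006-10-01/ (the excerpts here omit namespace declarations), that QCodes are compared literally rather than resolved, and that the helper names `describe_place` and `origin_locations` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: documents declare the NewsML-G2 (NAR) namespace below.
NAR = "{http://iptc.org/std/nar/2006-10-01/}"

SAMPLE = """<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemSet><newsItem><contentMeta>
    <located qcode="afplocation:2500" type="loctyp:City">
      <name>Paris</name>
      <related qcode="afplocation:67" rel="skos:broader" type="loctyp:Country">
        <name>France</name>
        <related qcode="iso3166-1a3:FRA" rel="skos:exactMatch"/>
      </related>
      <geoAreaDetails>
        <position latitude="48.85341" longitude="2.34121"/>
      </geoAreaDetails>
    </located>
  </contentMeta></newsItem></itemSet>
</newsMessage>"""


def describe_place(elem):
    """Return the qcode, type, name and ISO code carried by a located/related element."""
    place = {
        "qcode": elem.get("qcode"),
        "type": elem.get("type"),
        "name": elem.findtext(NAR + "name"),
        "iso3166_1a3": None,
    }
    for rel in elem.findall(NAR + "related"):
        qcode = rel.get("qcode", "")
        # Literal QCode comparison: a simplification that skips QCode resolution.
        if rel.get("rel") == "skos:exactMatch" and qcode.startswith("iso3166-1a3:"):
            place["iso3166_1a3"] = qcode.split(":", 1)[1]
    return place


def origin_locations(root):
    """Yield one dict per located element, with its position and broader entities."""
    for located in root.iter(NAR + "located"):
        info = describe_place(located)
        # The position sits in geoAreaDetails, or in POIDetails for points of interest.
        position = located.find(NAR + "geoAreaDetails/" + NAR + "position")
        if position is None:
            position = located.find(NAR + "POIDetails/" + NAR + "position")
        info["position"] = (
            (float(position.get("latitude")), float(position.get("longitude")))
            if position is not None
            else None
        )
        info["broader"] = [
            describe_place(rel)
            for rel in located.findall(NAR + "related")
            if rel.get("rel") == "skos:broader"
        ]
        yield info


locations = list(origin_locations(ET.fromstring(SAMPLE)))
```

Since a document may carry more than one located element, the sketch returns a list rather than a single location.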
When are multiple locations provided? Multiple locations may be provided when the content originates from multiple locations. For example, suppose that we publish a story about the Bergé/Saint-Laurent auction. To write this story we might use information provided by an AFP reporter present at the auction in Paris and by another AFP reporter present at a press conference given by Pierre Bergé at the same time in Washington. In this case we might provide Paris and Washington in located elements. Alternatively we might choose to provide the location where the story is actually written (say, London) instead of Paris and Washington.
|
Text, picture, still graphic, video and multimedia documents: locations that are subject matter of the document may be provided in the news item (for multimedia documents: in the main news item) in the content metadata section. In text and multimedia documents only, additional information may be provided in assertions. Locations that are subject matter of the document are not provided in live report indexes.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- The city of Beijing is a subject of the content -->
<subject qcode="afplocation:2618" type="cpnat:geoArea">
<name>Beijing</name>
</subject>
<!-- The city of Paris is a subject of the content and is a location of the event the content is about -->
<subject qcode="afplocation:2500" type="cpnat:geoArea" afp:role="http://cv.afp.com/subjectroles/locationOfEvent">
<name>Paris</name>
</subject>
<!-- Some locations are not identified by a qcode attribute but by an uri attribute (typically providing a geo URI [rfc5870])-->
<subject uri="geo:43.82883,5.78688" type="cpnat:geoArea">
<name>Manosque</name>
</subject>
</contentMeta>
<!-- This assertion provides additional information about Beijing -->
<assert qcode="afplocation:2618">
<type qcode="loctyp:City"/>
<geoAreaDetails>
<position latitude="39.9075" longitude="116.39723"/>
</geoAreaDetails>
</assert>
<!-- This assertion provides additional information about Paris -->
<assert qcode="afplocation:2500">
<type qcode="loctyp:City"/>
<broader qcode="afplocation:67" type="loctyp:Country">
<name>France</name>
<related qcode="iso3166-1a3:FRA" rel="skos:exactMatch"/>
</broader>
<geoAreaDetails>
<position latitude="48.85341" longitude="2.3488"/>
</geoAreaDetails>
</assert>
<!-- This assertion provides additional information about Manosque -->
<assert uri="geo:43.82883,5.78688">
<type qcode="loctyp:City"/>
<geoAreaDetails>
<position latitude="43.82883" longitude="5.78688"/>
</geoAreaDetails>
</assert>
</newsItem>
</itemSet>
</newsMessage>
Locations that are subject matter of the document may be provided by subject
elements. Note that other entities such as persons, media topics, organizations and so on may also be conveyed using subject
elements. To differentiate them, a type
attribute is used. Its value, a QCode, is either cpnat:geoArea
(resolving to http://cv.iptc.org/newscodes/cpnature/geoArea
) or cpnat:poi
(resolving to http://cv.iptc.org/newscodes/cpnature/poi
). All these subjects share some common properties, such as optional type
and afp:role
attributes that are described in the section on subjects.
Additional information about these locations may be provided by assertions; an assertion is represented by an assert
element. You can correlate assertions with specific locations using their concept URIs: the information provided by an assertion applies to the location whose concept URI is conveyed by the qcode
or the uri
attribute of the assertion. In the example above, we have a subject
element whose qcode resolves to http://ref.afp.com/locations/2618
(in AFP documents, afplocation
is a scheme alias for http://ref.afp.com/locations/
). We also have an assert
element whose qcode resolves to http://ref.afp.com/locations/2618
. It means that both this subject and this assertion convey information about the same location.
If you don't perform QCode resolution (cf. section on controlled vocabularies and qualified codes) then you can correlate QCode-based assertions with specific locations using their QCodes directly.
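The correlation between subjects and assertions can be sketched as follows. This is only an illustration under stated assumptions: the NewsML-G2 namespace URI is assumed, the `SCHEME_ALIASES` table is a hypothetical stand-in for the scheme aliases a real processor would obtain from the document's catalog information, and the function names are invented for this example.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"  # assumed NewsML-G2 namespace

# Hypothetical alias table for illustration only.
SCHEME_ALIASES = {"afplocation": "http://ref.afp.com/locations/"}

SAMPLE = """<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <contentMeta>
    <subject qcode="afplocation:2618" type="cpnat:geoArea"><name>Beijing</name></subject>
    <subject uri="geo:43.82883,5.78688" type="cpnat:geoArea"><name>Manosque</name></subject>
  </contentMeta>
  <assert qcode="afplocation:2618"><type qcode="loctyp:City"/></assert>
  <assert uri="geo:43.82883,5.78688"><type qcode="loctyp:City"/></assert>
</newsItem>"""


def concept_uri(elem):
    """Return the concept URI conveyed by a uri attribute or by a resolved qcode."""
    if elem.get("uri"):
        return elem.get("uri")
    qcode = elem.get("qcode")
    if not qcode:
        return None
    alias, _, code = qcode.partition(":")
    scheme = SCHEME_ALIASES.get(alias)
    # Unknown alias: fall back to the raw QCode, which still allows correlating
    # QCode-based assertions by direct comparison.
    return scheme + code if scheme else qcode


def assertions_by_subject(news_item):
    """Map the concept URI of each subject to the assert element describing it."""
    asserts = {concept_uri(a): a for a in news_item.findall(NAR + "assert")}
    result = {}
    for subject in news_item.iter(NAR + "subject"):
        uri = concept_uri(subject)
        if uri in asserts:
            result[uri] = asserts[uri]
    return result


matches = assertions_by_subject(ET.fromstring(SAMPLE))
```

Keying the lookup on the concept URI lets QCode-based and URI-based identifiers be handled by the same code path.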
A given assertion may convey several pieces of information about a location:
- A qcode attribute or a uri attribute. As discussed above it is used to correlate the assertion with a location.
- A type attribute. In AFP documents we typically make use of the IPTC location types [IPTCLocTypes] to specify whether the location is a city, a country area or a country. For locations that are classified as "point of interest", we use the QCode cpnat:poi (concept URI: http://cv.iptc.org/newscodes/cpnature/poi) from [IPTCCPNatures].
- A position element inside a geoAreaDetails. Unless specified otherwise by a gpsdatum attribute, we use the WGS84 geodetic system.
- A broader element. Typically the broader entity we provide is a country. This broader geographical entity may be described with:
  - A qcode attribute.
  - A type attribute. We make use of the IPTC location types [IPTCLocTypes] to specify whether it is a city, a country area or a country.
  - A name element.
  - A related element whose rel attribute, the QCode skos:exactMatch, resolves to http://www.w3.org/2004/02/skos/core#exactMatch and that has a qcode attribute using the scheme alias iso3166-1a3 (scheme URI: http://cvx.iptc.org/iso3166-1a3/). The ISO 3166-1 alpha 3 code is the code part of this qcode attribute.
  - A related element whose rel attribute, the QCode skos:broader, resolves to http://www.w3.org/2004/02/skos/core#broader.
Locations of the event(s)
Some locations that are subject matter of the document also happen to be locations of event(s). A location of event is a place where an event the document is about happens or is foreseen to happen. Locations of event(s) are provided by subject
elements with an attribute role
in namespace http://www.afp.com/format/internal/
equal to http://cv.afp.com/subjectroles/locationOfEvent
.
For example, in our document about the auction of the Pierre Bergé and Yves Saint-Laurent collection, we could have the city of Paris as a subject because the news story mentions that the auction takes place in Paris. We could also have the city of Beijing as a subject because the news story mentions China's claims that some objects in the auction were stolen in Beijing during the opium wars and therefore should be returned. In this case, both cities would appear in dedicated subject elements. The city of Paris could be tagged as being a location of event using the role attribute because the auction happens in Paris and in our example the auction is the event the story is about. Beijing would not be tagged as being a location of event because while it is a subject of the story it is not a location of the event the story is about.
There is no default value for the role
attribute: if a subject
element conveying a location does not have a role attribute with a value of http://cv.afp.com/subjectroles/locationOfEvent
, it doesn't mean that it isn't a location of the event, but merely that the information regarding this matter isn't provided by the element.
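Because the absence of the role attribute means "not stated" rather than "no", a processor may want to keep that distinction explicit. The following Python sketch does so by returning True or None, never False. The NewsML-G2 namespace URI is an assumption, the function name is hypothetical, and the type QCodes are compared literally instead of being resolved.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"  # assumed NewsML-G2 namespace
AFP_ROLE = "{http://www.afp.com/format/internal/}role"
LOCATION_OF_EVENT = "http://cv.afp.com/subjectroles/locationOfEvent"
LOCATION_TYPES = {"cpnat:geoArea", "cpnat:poi"}

SAMPLE = """<contentMeta xmlns="http://iptc.org/std/nar/2006-10-01/"
                         xmlns:afp="http://www.afp.com/format/internal/">
  <subject qcode="afplocation:2618" type="cpnat:geoArea"><name>Beijing</name></subject>
  <subject qcode="afplocation:2500" type="cpnat:geoArea"
           afp:role="http://cv.afp.com/subjectroles/locationOfEvent"><name>Paris</name></subject>
  <subject qcode="medtop:01000000"><name>arts, culture and entertainment</name></subject>
</contentMeta>"""


def event_location_flags(content_meta):
    """Map each location subject name to True (location of event) or None (not stated)."""
    flags = {}
    for subject in content_meta.findall(NAR + "subject"):
        if subject.get("type") not in LOCATION_TYPES:
            continue  # not a location subject
        name = subject.findtext(NAR + "name")
        # A missing role never means "not a location of the event".
        flags[name] = True if subject.get(AFP_ROLE) == LOCATION_OF_EVENT else None
    return flags


flags = event_location_flags(ET.fromstring(SAMPLE))
```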
All documents: products the document belongs to may be provided in the header of the news message.
<newsMessage>
<header>
<afp:headerExtension xmlns:afp="http://www.afp.com/format/internal/">
<!-- The document belongs to this product -->
<afp:product name="EAA" uri="http://products.afp.com/wires/EAA"></afp:product>
<!-- The document also belongs to this other product -->
<afp:product name="MAX" uri="http://products.afp.com/wires/MAX"></afp:product>
</afp:headerExtension>
</header>
</newsMessage>
The commercial relationship between AFP and its clients is often structured around the notion of product. A product is a subset of AFP's production a client can subscribe to. Each product is defined by several characteristics such as subject matters, media types, languages, etc.
The product
elements, if present, are provided in the headerExtension
inside the header
of the newsMessage
. The headerExtension
element is an AFP specific extension and is defined in namespace http://www.afp.com/format/internal/
.
Each product
element identifies a product the document belongs to. It can be a product you have subscribed to but it is not necessarily the case: typically, all products the document
belongs to are listed regardless of your specific subscriptions.
In your information system, a possible usage of the product
elements is to automatically route documents to specific teams or workflows. For example you might want to automatically route documents of the "Economic & Business News" product to your economics specialists.
Each product is uniquely identified by a URI, provided by the uri
attribute. You can ask your AFP representative for the URIs of the products you have subscribed to.
The name
attribute provides the name of the product, meant to be used for display purpose.
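The routing usage mentioned above could look like the following Python sketch. The `ROUTES` table and its queue name are hypothetical, the NewsML-G2 namespace URI is an assumption, and the function names are invented for the example; only the AFP extension namespace and the product URIs come from this guide.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"  # assumed NewsML-G2 namespace
AFP = "{http://www.afp.com/format/internal/}"

# Hypothetical routing table: product URI -> internal queue name.
ROUTES = {"http://products.afp.com/wires/EAA": "world-desk"}

SAMPLE = """<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/">
  <header>
    <afp:headerExtension xmlns:afp="http://www.afp.com/format/internal/">
      <afp:product name="EAA" uri="http://products.afp.com/wires/EAA"/>
      <afp:product name="MAX" uri="http://products.afp.com/wires/MAX"/>
    </afp:headerExtension>
  </header>
</newsMessage>"""


def products(news_message):
    """Return (uri, name) pairs for the products listed in the message header."""
    path = f"{NAR}header/{AFP}headerExtension/{AFP}product"
    return [(p.get("uri"), p.get("name")) for p in news_message.findall(path)]


def destinations(news_message):
    """Return the queues to route to; products without a route are skipped."""
    return sorted({ROUTES[uri] for uri, _ in products(news_message) if uri in ROUTES})
```

Routing on the uri attribute rather than on the display name keeps the logic stable if a product's name changes.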
The following table provides examples of products.
Examples of products | ||
---|---|---|
Name | Unique identifier | Description |
EAA | http://products.afp.com/wires/EAA |
The World News (EAA) wire offers up-to-the-minute, complete English-language global news, sports and business coverage delivered specifically to suit the needs of clients in Europe, Africa and the Middle East. EAA also provides in-depth coverage of Europe for Europe. |
MAX | http://products.afp.com/wires/MAX |
The world news wire, MAX, carries AFP's entire English-language news production and is designed specifically for clients who demand comprehensive global coverage. |
FRS | http://public.products.afp.com/wires/FRS |
The FRS wire is the AFP news feed mainly for French customers. This French-language feed offers French and foreign sources of information on varied topics (general news, politics, economy, culture, social, sport and equestrian), with emphasis on in-depth coverage of France. |
DAB | http://public.products.afp.com/wires/DAB |
The DAB wire in French language is designed primarily for African customers. Produced in Paris by a specialized desk, which processes and translates the information gathered by the largest networks of all international agencies active in Africa, it is also powered by the four other regional centers of AFP (Hong Kong, Nicosia, Washington and Montevideo) to provide comprehensive coverage of world news round the clock and seven days a week. |
Text, picture, still graphic, video and multimedia documents: the provider of the document is given in the item metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<provider qcode="afpprovider:AFP-TV">
<name>AFP-TV</name>
<broader qcode="nprov:AFP">
<name>AFP</name>
</broader>
</provider>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: the provider of the document is given in the item metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<itemMeta>
<provider qcode="afpprovider:AFP-TV">
<name>AFP-TV</name>
<broader qcode="nprov:AFP">
<name>AFP</name>
</broader>
</provider>
</itemMeta>
</packageItem>
</itemSet>
</newsMessage>
The provider of a document is the party responsible for the management and the release of the document (i.e., the publisher of the document). It is given by the qcode
attribute of the provider
element. This element is always present. The QCode is part of one of the following schemes:
- The scheme whose URI is http://cv.iptc.org/newscodes/newsprovider/ and whose scheme alias is nprov.
- The scheme whose URI is http://ref.afp.com/providers/ and whose scheme alias is afpprovider.
The name child element, if present, provides a natural language name for the provider.
The broader
child element, if present, specifies a larger entity the provider is part of. This entity is identified by a qcode
attribute, optionally completed by a natural language name in a name
element.
In the example above, the document is provided by AFP-TV, a service inside AFP. The fact that this provider is part of AFP is expressed using the broader
element.
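A minimal Python sketch of reading the provider and the larger entity it belongs to is given here. It assumes the NewsML-G2 namespace URI shown in the code, and `provider_chain` is a hypothetical helper name. The loop would also follow a broader element nested inside another one, should that ever occur.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"  # assumed NewsML-G2 namespace

SAMPLE = """<itemMeta xmlns="http://iptc.org/std/nar/2006-10-01/">
  <provider qcode="afpprovider:AFP-TV">
    <name>AFP-TV</name>
    <broader qcode="nprov:AFP"><name>AFP</name></broader>
  </provider>
</itemMeta>"""


def provider_chain(item_meta):
    """Return [(qcode, name), ...] from the provider up through its broader entities."""
    chain = []
    node = item_meta.find(NAR + "provider")
    while node is not None:
        # The name child is optional, so the second tuple member may be None.
        chain.append((node.get("qcode"), node.findtext(NAR + "name")))
        node = node.find(NAR + "broader")
    return chain
```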
Text, picture, still graphic, video and multimedia documents: the publishing status is provided by the item metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<pubStatus qcode="QCode of scheme http://cv.iptc.org/newscodes/pubstatusg2/ specifying the publishing status"/>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: the publishing status is provided by the item metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<itemMeta>
<pubStatus qcode="QCode of scheme http://cv.iptc.org/newscodes/pubstatusg2/ specifying the publishing status"/>
</itemMeta>
</packageItem>
</itemSet>
</newsMessage>
A document can be usable, withheld or canceled. The table below describes how this is specified in documents and what it means.
Publishing statuses | ||
---|---|---|
Status | Representation | Meaning |
Usable | No pubStatus element or a pubStatus element with a qcode attribute stat:usable resolving to http://cv.iptc.org/newscodes/pubstatusg2/usable |
The document is usable. Note that "usable" does not necessarily mean "publishable"; for example an embargo may prevent publication of an otherwise usable document. |
Withheld | A pubStatus element with a qcode attribute stat:withheld resolving to http://cv.iptc.org/newscodes/pubstatusg2/withheld |
The document and all its previous versions must not be used until further notice (except for a few metadata, as described below). This status is typically used when a serious problem with a document is suspected and is under investigation (e.g., important information in the document is suspected to be false). In the meantime, any usage of the document must be prohibited, if needed by way of alerts. If the document has been published it must be rendered inaccessible until further notice. You must immediately remove it from all your online services and stop using it in any other fashion. People who may have viewed previous versions should be notified, whenever possible, that it is being retracted until further notice. If you have been authorized by AFP to distribute it to third parties, you must ensure that the same actions are carried out by them. In a withheld document, only the following metadata can be considered reliable/usable: GUID, version number, publication status, general editorial note (in this version of the document only). |
Canceled | A pubStatus element with a qcode attribute stat:canceled resolving to http://cv.iptc.org/newscodes/pubstatusg2/canceled |
The document and all its previous versions must not be used, ever (except for a few metadata, as described below). This status is typically used when a serious problem with a document is detected (e.g., important information in the document has been found to be false) and the scope of the problem is wide enough to warrant a complete kill of the document instead of issuing a correction. Any usage of the document must be prohibited, if needed by way of alerts. If the document has been published it must be rendered inaccessible. You must immediately remove it from all your online services, stop using it in any other fashion and delete it from your servers. People who may have viewed previous versions should be notified, whenever possible, that it is being retracted. If you have been authorized by AFP to distribute it to third parties, you must ensure that the same actions are carried out by them. In a canceled document, only the following metadata can be considered reliable/usable: GUID, version number, publication status, general editorial note (in this version of the document only) and cancel-dedicated rendition(s). A cancel-dedicated rendition is specifically designed to be used in canceled documents, allowing you to publish something (e.g., a note about the cancellation) that replaces the canceled content. It is conveyed by an inlineXML or remoteContent element and denoted through the rendition attribute by the QCode afprnd:cancel, resolving to http://cv.afp.com/renditions/cancel. |
When a document is withheld or canceled, a general editorial note is often included to give additional information and/or instructions.
The NewsML-G2 specification provides detailed information on how you must make use of this publishing status when processing documents.
For multimedia documents, the way publishing status is conveyed differs from standard NewsML-G2. In NewsML-G2 each G2 item carries its own publishing status, and a G2 item without a pubStatus element is defined as usable. In AFP's multimedia documents the only pubStatus element to consider is that of the main news item. The pubStatus elements of non-main items must be ignored. You must process multimedia documents in a way that applies the publishing status provided in the main news item to the entire content of the document (i.e., to all items in the document). |
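The rules above (a missing pubStatus means usable, and in multimedia documents only the main news item counts) can be sketched in Python as follows. The NewsML-G2 namespace URI is an assumption, the function names are hypothetical, and QCodes are compared literally rather than resolved; treat this as an illustration, not as a complete processing model.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"  # assumed NewsML-G2 namespace
MAIN_COMPONENT = "http://cv.afp.com/itemnatures/mmdMainComp"

SAMPLE = """<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/"><itemSet>
  <newsItem><itemMeta>
    <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
    <pubStatus qcode="stat:withheld"/>
  </itemMeta></newsItem>
  <newsItem><itemMeta><pubStatus qcode="stat:usable"/></itemMeta></newsItem>
</itemSet></newsMessage>"""


def item_status(item):
    """Return the pubStatus QCode of an item; a missing element means usable."""
    status = item.find(NAR + "itemMeta/" + NAR + "pubStatus")
    return status.get("qcode") if status is not None else "stat:usable"


def is_main_item(item):
    """Tell whether an item is the main news item of a multimedia document."""
    for link in item.findall(NAR + "itemMeta/" + NAR + "link"):
        if link.get("rel") == "crel:isa" and link.get("href") == MAIN_COMPONENT:
            return True
    return False


def document_status(news_message):
    """Return the publishing status that applies to the whole document."""
    items = list(news_message.find(NAR + "itemSet"))
    for item in items:
        if is_main_item(item):
            return item_status(item)  # multimedia: only the main item counts
    return item_status(items[0])  # non-multimedia: the single item
```

In the sample, the second news item claims to be usable, yet the document as a whole is reported as withheld because that is the status of the main news item.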
Text, picture, still graphic and video documents: subjects of the document may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- A subject represented by a natural language name -->
<subject>
<name>auction</name>
</subject>
<!-- A subject represented by a QCode -->
<subject qcode="medtop:20000273"/>
<!-- A subject represented by a QCode and a natural language name -->
<subject qcode="medtop:01000000">
<name>arts, culture and entertainment</name>
</subject>
<!-- A subject represented by a QCode, a natural language name, a type and a role -->
<subject qcode="afplocation:2500" type="cpnat:geoArea" afp:role="http://cv.afp.com/subjectroles/locationOfEvent">
<name>Paris</name>
</subject>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: subjects of the document as a whole may be provided in the content metadata section of the main news item. Subjects specific to an item may be provided in the content metadata section of this item.
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
<contentMeta>
<!-- This subject is in the main news item:
it applies to the document as a whole -->
<subject qcode="medtop:20000031" type="cpnat:abstract">
<name>visual art</name>
</subject>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- This subject only applies to this news item -->
<subject qcode="medtop:20000011" type="cpnat:abstract">
<name>fashion</name>
</subject>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: subjects of the document may be provided in the content metadata section of the package item. In live report documents, subjects expressed using a controlled vocabulary are only media topics and event identifiers.
<newsMessage>
<itemSet>
<packageItem>
<contentMeta>
<!-- A subject represented by a natural language name -->
<subject>
<name>auction</name>
</subject>
<!-- A subject represented by a QCode -->
<subject qcode="medtop:20000273"/>
<!-- A subject represented by a QCode and a natural language name -->
<subject qcode="medtop:01000000">
<name>arts, culture and entertainment</name>
</subject>
</contentMeta>
</packageItem>
</itemSet>
</newsMessage>
Subjects are important topics of the content; what the content is about. Some subjects of a document (and of individual items in the case of multimedia documents) may be provided by subject
elements. Each subject
element contains an indication on what the document's content (or item's content) is about.
Some subjects of the document may be described by keyword
elements instead of subject
elements. However, keywords may also be used for other purposes: while a keyword may describe a subject of the document, not all keywords do. See the Keywords section.
In AFP documents, a subject represented by a subject
element is specified by either:
- A qcode attribute, optionally completed by a natural language name. For example, in AFP documents medtop is a scheme alias for the scheme http://cv.iptc.org/newscodes/mediatopic/, therefore the QCode medtop:20000011 shown above resolves to the URI http://cv.iptc.org/newscodes/mediatopic/20000011, which identifies the media topic "fashion".
- A uri attribute, optionally completed by a natural language name. This is used for specifying some locations, using 'geo' URIs. Geo URIs are defined by [RFC5870]. They allow identifying locations and conveying information such as latitude, longitude and so on. For example the URI geo:13.4125,103.8667 identifies the location at latitude 13.4125 and longitude 103.8667 in WGS-84. At the time of this writing AFP documents make use of simple geo URIs with only latitude and longitude, but in the future we may use additional features (e.g., altitude, uncertainty, etc.). An example is provided in the section "Locations that are subject matter of the document".
The URI space used to specify subjects through qcode and uri attributes is open and can evolve over time. Often used in AFP documents are QCodes identifying IPTC media topics [IPTCMediaTopics], a standard taxonomy for categorizing news content. Also often used are QCodes identifying events, in order to associate a document with the events it covers. The table below presents common schemes used in AFP documents to identify subjects. Note that this list is not exhaustive.
Common types of subjects used in AFP documents | |||
---|---|---|---|
Type | Scheme URI | Scheme alias | Comment |
Media topics | http://cv.iptc.org/newscodes/mediatopic/ |
medtop |
Media topics is a standard IPTC taxonomy for categorizing news content. For example the concept URI http://cv.iptc.org/newscodes/mediatopic/01000000 identifies the category "arts, culture and entertainment", which is defined as "Matters pertaining to the advancement and refinement of the human mind, of interests, skills, tastes and emotions". |
Events | http://eventmanager.afp.com/events/ |
afpevent |
An AFP specific scheme for identifying events. It is used to associate a document with the event it covers. For more on this topic see the section on event identifiers. |
Persons | http://ref.afp.com/persons/ |
afpperson |
AFP specific scheme for identifying persons. For example the concept URI http://ref.afp.com/persons/193573 identifies Pierre Bergé. |
Organizations | http://ref.afp.com/organizations/ |
afporganization |
AFP specific scheme for identifying organizations. For example the concept URI http://ref.afp.com/organizations/5308 identifies Christie's, the auction company. |
Locations | http://ref.afp.com/locations/ |
afplocation |
AFP specific scheme for identifying locations. For example the concept URI http://ref.afp.com/locations/2500 identifies the city of Paris. |
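For subjects identified by a geo URI instead of a QCode from one of the schemes above, the coordinates can be read directly from the URI. Here is a small Python sketch that handles the simple latitude/longitude form currently in use and tolerates an optional altitude and trailing parameters. `GeoPoint` and `parse_geo_uri` are hypothetical names, and the sketch does not interpret parameters such as the coordinate reference system or the uncertainty.

```python
from typing import NamedTuple, Optional


class GeoPoint(NamedTuple):
    latitude: float
    longitude: float
    altitude: Optional[float] = None


def parse_geo_uri(uri: str) -> GeoPoint:
    """Parse a 'geo' URI, ignoring any ';'-separated parameters."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "geo" or not rest:
        raise ValueError(f"not a geo URI: {uri!r}")
    # Coordinates come first; parameters, if any, follow the first ';'.
    coordinates = rest.split(";", 1)[0].split(",")
    if len(coordinates) not in (2, 3):
        raise ValueError(f"unexpected coordinate count in {uri!r}")
    return GeoPoint(*(float(c) for c in coordinates))
```

For instance, `parse_geo_uri("geo:13.4125,103.8667")` yields a point whose latitude is 13.4125 and whose longitude is 103.8667, with no altitude.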
A subject
element can have a name
child element. If present it provides a natural language name for the subject.
In a given item, the order of appearance of subject
elements provides a hint about their relative importance (i.e., editorial significance) in the context of this item: a subject should be considered as having either the same or a lesser importance than subjects appearing before in the item. Note that while AFP's documents currently don't rank subjects with rank
attributes, that may change in the future. In order to be forward compatible, if your NewsML-G2 processor interprets such ranks, the relative importance they convey should take precedence over the relative importance conveyed by the order of appearance of subject
elements in the item. The rank
attribute is described in the NewsML-G2 specification.
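One forward-compatible way to order subjects is sketched below in Python: ranked subjects come first (lower rank first), and the order of appearance decides among the rest. This rests on assumptions: the NewsML-G2 namespace URI is assumed, the rank attribute in the sample is hypothetical since AFP documents currently carry none on subjects, and placing unranked subjects after ranked ones is an interpretation that should be checked against the NewsML-G2 specification.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"  # assumed NewsML-G2 namespace

# The rank attribute below is hypothetical; it only exercises the fallback logic.
SAMPLE = """<contentMeta xmlns="http://iptc.org/std/nar/2006-10-01/">
  <subject><name>auction</name></subject>
  <subject qcode="medtop:20000273"/>
  <subject qcode="medtop:01000000" rank="1"><name>arts, culture and entertainment</name></subject>
</contentMeta>"""


def subjects_by_importance(content_meta):
    """Return subject elements, most important first."""
    subjects = content_meta.findall(NAR + "subject")

    def sort_key(indexed):
        index, subject = indexed
        rank = subject.get("rank")
        # Ranked subjects first (lower rank wins), then order of appearance.
        return (0, int(rank), index) if rank is not None else (1, 0, index)

    return [subject for _, subject in sorted(enumerate(subjects), key=sort_key)]


ordered = subjects_by_importance(ET.fromstring(SAMPLE))
```

When no subject carries a rank, the result is simply the order of appearance in the item.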
Optional attributes (these attributes may or may not be present in a given subject
element):
type: this attribute carries a QCode that specifies the type of the subject (i.e., person, organization, event, abstract concept, etc.). The value space for this attribute is open, but in AFP documents you'll typically find types defined in the standard IPTC "Nature of a concept" controlled vocabulary [IPTCCPNatures].
role (in namespace http://www.afp.com/format/internal/): some subjects have a specific role, which is conveyed by this attribute in the form of a URI. This attribute is not defined by the NewsML-G2 standard: it is an AFP specific extension and is therefore defined in a specific namespace.
Currently the only possible value for this attribute when it is present is http://cv.afp.com/subjectroles/locationOfEvent
. If a subject is tagged with this role then this subject is a location of the event(s) the editorial content is about. This usage is described in detail in the section "Locations that are subject matter of the document".
Documents may contain various types of titles and multiple levels of subtitles.
Note that while NewsML-G2 allows for rich text by using some markup in the content of titles and subtitles, AFP's systems only output simple textual content not interspersed with markup.
Text, picture, still graphic, video and multimedia documents: titles may be provided in the content metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- The main title of the document -->
<headline>
YSL-Bergé collection sets new world record at auction
for a private collection
</headline>
<!-- The short title of the document -->
<headline role="afpheadlinerole:shorttitle">
YSL-Bergé collection: a new record at auction
</headline>
<!-- The long title of the document -->
<headline role="afpheadlinerole:longtitle">
Yves Saint Laurent/Pierre Bergé collection sets new world record at
auction for a private collection with more than 206 million euros
</headline>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: A title may be provided in the content metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<contentMeta>
<!-- The title of the live report -->
<headline>
YSL-Bergé auction live report
</headline>
</contentMeta>
</packageItem>
</itemSet>
</newsMessage>
All documents may contain a title. In addition, text, picture, still graphic, animated graphics, video and multimedia documents may include a short title and/or a long title. These titles, if present, are provided by headline
elements located in the content metadata section of the first item. There is at most one title, one short title and one long title.
You can determine the type of a given title by looking for the presence and value of a role
attribute, as described in the following table.
Title types | ||
---|---|---|
Type | Function | Identification |
Title | The main title of the document: a short summary of the journalistic content. | No role attribute. |
Short title | A shorter version of the title, suitable for displaying on space constrained surfaces (e.g., mobile handsets). | A role attribute whose value, the QCode afpheadlinerole:shorttitle , resolves to http://cv.afp.com/headlineroles/shorttitle |
Long title | A longer version of the title. This is a short catch line, useful, for example, to display on a banner. | A role attribute whose value, the QCode afpheadlinerole:longtitle , resolves to http://cv.afp.com/headlineroles/longtitle |
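The identification rules in this table can be applied with a few lines of Python. In this sketch the NewsML-G2 namespace URI is an assumption, `titles` and `ROLE_TO_KIND` are hypothetical names, the sample headlines are invented for illustration, and role QCodes are compared literally instead of being resolved to their concept URIs.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"  # assumed NewsML-G2 namespace

# Literal QCode comparison: a simplification that skips QCode resolution.
ROLE_TO_KIND = {
    None: "title",
    "afpheadlinerole:shorttitle": "short_title",
    "afpheadlinerole:longtitle": "long_title",
}

SAMPLE = """<contentMeta xmlns="http://iptc.org/std/nar/2006-10-01/">
  <headline>
    Collection sets new world record at auction
    for a private collection
  </headline>
  <headline role="afpheadlinerole:shorttitle">Collection: a new record at auction</headline>
  <headline role="afpheadlinerole:subtitle" rank="0">Auction to continue</headline>
</contentMeta>"""


def titles(content_meta):
    """Return a dict with the title, short title and long title that are present."""
    found = {}
    for headline in content_meta.findall(NAR + "headline"):
        kind = ROLE_TO_KIND.get(headline.get("role"))
        if kind is not None:
            # Collapse the line breaks and indentation that may surround the text.
            found[kind] = " ".join("".join(headline.itertext()).split())
    return found
```

Headlines with any other role, such as subtitles, are left out of the result.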
Text and multimedia documents: subtitles may be provided in the content metadata section of the news item (for multimedia documents: in the main news item). Subtitles are only provided for text and multimedia documents.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<headline role="afpheadlinerole:subtitle" rank="0">
Auction to continue Tuesday and Wednesday
</headline>
<headline role="afpheadlinerole:subtitle" rank="1">
Prestigious attendance noted on first day
</headline>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
In addition to titles, text and multimedia documents may contain subtitles. Subtitles complement titles with additional information about the news content of the document. In current production there are at most two subtitles. Like titles, they are provided by headline
elements in the content metadata section of the main news item. Their subtitle nature is denoted by a role
attribute whose value, the QCode afpheadlinerole:subtitle
, resolves to http://cv.afp.com/headlineroles/subtitle
. A rank
attribute may be present to specify the relative importance of subtitles. Ranks are nonnegative integers. Subtitles with a lower value for this attribute have a higher importance than subtitles with a higher value of this attribute, and subtitles without a rank attribute have a lower importance than subtitles with a rank attribute. See the NewsML-G2 specification for additional information on ranks and their processing model.
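The ranking rule for subtitles (lower rank first, unranked subtitles last) might be implemented as in the Python sketch below. The NewsML-G2 namespace URI is assumed, `subtitles` is a hypothetical helper name, the unranked subtitle in the sample is invented to exercise the rule, and the role QCode is compared literally rather than resolved.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"  # assumed NewsML-G2 namespace
SUBTITLE_ROLE = "afpheadlinerole:subtitle"

SAMPLE = """<contentMeta xmlns="http://iptc.org/std/nar/2006-10-01/">
  <headline>Collection sets new world record at auction</headline>
  <headline role="afpheadlinerole:subtitle">Unranked subtitle</headline>
  <headline role="afpheadlinerole:subtitle" rank="1">
    Prestigious attendance noted on first day
  </headline>
  <headline role="afpheadlinerole:subtitle" rank="0">
    Auction to continue Tuesday and Wednesday
  </headline>
</contentMeta>"""


def subtitles(content_meta):
    """Return subtitle texts, most important first."""
    ranked = []
    for index, headline in enumerate(content_meta.findall(NAR + "headline")):
        if headline.get("role") != SUBTITLE_ROLE:
            continue
        rank = headline.get("rank")
        # Ranked subtitles first (lower rank wins); unranked ones keep document order.
        key = (0, int(rank), index) if rank is not None else (1, 0, index)
        text = " ".join("".join(headline.itertext()).split())
        ranked.append((key, text))
    return [text for _, text in sorted(ranked)]
```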
An AFP NewsML-G2 document can be of one of the following types:
The type of a NewsML-G2 document defines important characteristics of the document such as the nature of its content, its XML structure, the metadata it provides as well as some elements of its processing model.
The overview section provides a description of these types.
To determine the type of a document, you first need to determine if it is a multimedia or non-multimedia document. A document is multimedia if the item set of the news message contains a news item whose item metadata section contains a link element with both:
a rel attribute whose value, the QCode crel:isa, resolves to http://cv.iptc.org/newscodes/conceptrelation/isA
an href attribute whose value is the URI http://cv.afp.com/itemnatures/mmdMainComp
That is, a multimedia document contains the following:
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
In a non-multimedia document, the type is the item class of the item present in the item set of the news message.
For text, picture, still graphic, video and multimedia documents the item class is given by the qcode attribute of the itemClass element in the item metadata section of a news item, as shown here:
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<itemClass qcode="QCode specifying the type"/>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
For live reports the item class is given by the qcode
attribute of the itemClass
element in the item metadata section of a package item, as shown here:
<newsMessage>
<itemSet>
<packageItem>
<itemMeta>
<itemClass qcode="QCode specifying the type"/>
</itemMeta>
</packageItem>
</itemSet>
</newsMessage>
The itemClass element is always present. For non-multimedia documents, its qcode attribute resolves to a concept URI that specifies the type of the document, as shown in the table below.
Item classes used in AFP documents | ||
---|---|---|
Type | QCode | Concept URI |
Text | ninat:text | http://cv.iptc.org/newscodes/ninature/text |
Picture | ninat:picture | http://cv.iptc.org/newscodes/ninature/picture |
Video | ninat:video | http://cv.iptc.org/newscodes/ninature/video |
Still graphic | ninat:graphic | http://cv.iptc.org/newscodes/ninature/graphic |
Animated graphic | ninat:animated | http://cv.iptc.org/newscodes/ninature/animated |
Interactive graphic | afpinat:interactive | http://cv.afp.com/itemnatures/interactive |
Live report index | afpinat:liveReport | http://cv.afp.com/itemnatures/liveReport |
The NewsML-G2 standard states that it is mandatory to use one of the IPTC News Item Nature NewsCodes schemes for item classes. AFP NewsML-G2 deviates from this rule by using an AFP-specific scheme (whose URI is http://cv.afp.com/itemnatures/ ) in addition to the mandatory IPTC schemes.
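The type determination described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: document_type is a hypothetical name, QCodes are compared literally using the aliases shown in this guide (a robust implementation would resolve them through the catalog), and the package item is checked before news items, which is a choice of this sketch rather than a documented rule.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}
MMD_MAIN = "http://cv.afp.com/itemnatures/mmdMainComp"

# QCode to type name, taken from the item class table.
ITEM_CLASSES = {
    "ninat:text": "Text",
    "ninat:picture": "Picture",
    "ninat:video": "Video",
    "ninat:graphic": "Still graphic",
    "ninat:animated": "Animated graphic",
    "afpinat:interactive": "Interactive graphic",
    "afpinat:liveReport": "Live report index",
}


def document_type(news_message):
    """Return the document type name, or None if it cannot be determined."""
    item_set = news_message.find("nar:itemSet", NS)
    if item_set is None:
        return None
    # Step 1: multimedia test (a link with rel crel:isa and the mmdMainComp URI).
    for link in item_set.findall("nar:newsItem/nar:itemMeta/nar:link", NS):
        if link.get("rel") == "crel:isa" and link.get("href") == MMD_MAIN:
            return "Multimedia"
    # Step 2: otherwise the type is the item class of the item in the item set.
    for tag in ("packageItem", "newsItem"):
        item_class = item_set.find(f"nar:{tag}/nar:itemMeta/nar:itemClass", NS)
        if item_class is not None:
            return ITEM_CLASSES.get(item_class.get("qcode"))
    return None


def _message(body):
    # Hypothetical helper that wraps sample items in a news message.
    return ET.fromstring(
        '<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/"><itemSet>'
        + body + "</itemSet></newsMessage>"
    )


TEXT_DOC = _message('<newsItem><itemMeta><itemClass qcode="ninat:text"/></itemMeta></newsItem>')
LIVE_DOC = _message('<packageItem><itemMeta><itemClass qcode="afpinat:liveReport"/></itemMeta></packageItem>')
MMD_DOC = _message(
    '<newsItem><itemMeta><itemClass qcode="ninat:text"/>'
    '<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>'
    "</itemMeta></newsItem>"
)

print(document_type(TEXT_DOC), document_type(LIVE_DOC), document_type(MMD_DOC))
```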
Text, picture, still graphic, animated graphic, video and multimedia documents: the urgency of the document may be provided in the content metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<urgency>1</urgency>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Live report indexes: the urgency of the document may be provided in the content metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<contentMeta>
<urgency>1</urgency>
</contentMeta>
</packageItem>
</itemSet>
</newsMessage>
A document may include an indication of the editorial urgency of its content in an urgency element. The content of this element is an integer from 1 (highest urgency) to 9 (lowest urgency). Usually, AFP documents are tagged with urgencies from 1 to 4.
There is often a correlation between this property and the document's role in workflow. In our documents, flashes are typically issued with the highest urgency (i.e., a value of 1), alerts with an urgency of 2, and urgents with an urgency of 3.
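Reading the urgency could look like the following Python sketch. The function name urgency is hypothetical. Because the path is relative to the item, the same function accepts either a news item or a package item.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}


def urgency(item):
    """Return the urgency (1 to 9) of a news item or package item, or None."""
    element = item.find("nar:contentMeta/nar:urgency", NS)
    text = (element.text or "").strip() if element is not None else ""
    if not text.isdigit():
        return None
    value = int(text)
    # The guide defines the range as 1 (highest) to 9 (lowest).
    return value if 1 <= value <= 9 else None


NEWS_ITEM = ET.fromstring(
    '<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">'
    "<contentMeta><urgency>1</urgency></contentMeta></newsItem>"
)
PACKAGE_ITEM = ET.fromstring(
    '<packageItem xmlns="http://iptc.org/std/nar/2006-10-01/">'
    "<contentMeta><urgency> 3 </urgency></contentMeta></packageItem>"
)
NO_URGENCY = ET.fromstring(
    '<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"><contentMeta/></newsItem>'
)

print(urgency(NEWS_ITEM), urgency(PACKAGE_ITEM), urgency(NO_URGENCY))
```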
Some data appear only in text and multimedia documents. This section details these data elements.
Text documents: a catchline may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<headline role="afpheadlinerole:introduction">
The Yves Saint Laurent and Pierre Bergé collection sets new world record at
auction for a private collection on monday, the first day of a three action
days, with more than 206 million euros. Participants describe first day
as "surprising, moving, electric!".
</headline>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: a catchline may be provided in the content metadata section of the main news item.
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
<contentMeta>
<headline role="afpheadlinerole:catchline">
The Yves Saint Laurent and Pierre Bergé collection sets new world record at
auction for a private collection on monday, the first day of a three action
days, with more than 206 million euros. Participants describe first day
as "surprising, moving, electric!".
</headline>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
A catchline, if present, provides a clear and concise summary of the story that tells the reader what has happened in simple language. It is designed to catch the reader's attention and gives an overview of all the main elements of the news. A catchline may be found at most once per document.
In text documents the catchline is provided by a headline
element whose role
attribute, the QCode afpheadlinerole:introduction
, resolves to http://cv.afp.com/headlineroles/introduction
. At the time of this writing a catchline may be provided only for text documents produced by SID (Sport-Informations-Dienst), an AFP subsidiary. To determine if the kind of text documents you are interested in might contain a catchline you are advised to discuss the matter with your AFP representative.
In multimedia documents the catchline is provided by a headline
element whose role
attribute is either afpheadlinerole:catchline
(resolving to http://cv.afp.com/headlineroles/catchline
) or afpheadlinerole:introduction
(resolving to http://cv.afp.com/headlineroles/introduction
).
While NewsML-G2 allows for rich text by using some markup in the content of a catchline, AFP's systems only output simple textual content not interspersed with markup.
In some documents you might observe that the content of the catchline is the same as the first paragraph of the main textual content of the document. Note however that this is not always the case and that sometimes an original catchline is provided.
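A catchline lookup could be sketched in Python as follows. The function name catchline is hypothetical and the QCodes are compared literally with the aliases used in this guide. The sketch accepts both roles for any news item, which is slightly more permissive than the rules above (text documents only use the introduction role).

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}
CATCHLINE_ROLES = {"afpheadlinerole:catchline", "afpheadlinerole:introduction"}


def catchline(news_item):
    """Return the catchline text of a (main) news item, or None if absent."""
    for headline in news_item.findall("nar:contentMeta/nar:headline", NS):
        if headline.get("role") in CATCHLINE_ROLES:
            # Collapse the line breaks and indentation of the XML source.
            return " ".join("".join(headline.itertext()).split())
    return None


SAMPLE = ET.fromstring(
    """<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <contentMeta>
    <headline>A title</headline>
    <headline role="afpheadlinerole:introduction">
      Participants describe first day
      as "surprising, moving, electric!".
    </headline>
  </contentMeta>
</newsItem>"""
)

print(catchline(SAMPLE))
```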
Text documents and multimedia documents: the number of hypertext links to external resources present in textual or multimedia content may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" xmlns:afp="http://www.afp.com/format/internal/">
<itemSet>
<newsItem>
<itemMeta>
<afp:extension>
<afp:stats>
<afp:totalLinks>
3
</afp:totalLinks>
</afp:stats>
</afp:extension>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
The HTML (in XML syntax) rendition of the textual or multimedia content can contain hypertext links to external resources, typically conveyed by <a> elements. External resources are resources that are not intrinsically part of the document; for example, in a multimedia document a link to one of the items of the document isn't a link to an external resource whereas a link to a Wikipedia page is.
As shown in the example above this number may be provided as an integer by a totalLinks
element inside a stats
element inside an extension
element in the item metadata section of the (main) news item.
Note that the totalLinks, stats and extension elements are not standard NewsML-G2 vocabulary but part of an AFP-specific extension. They are defined in an XML namespace whose name is http://www.afp.com/format/internal/.
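Because these elements live in a separate namespace, reading the link count requires mapping both namespaces. The following Python sketch illustrates this; the function name total_links is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "nar": "http://iptc.org/std/nar/2006-10-01/",
    "afp": "http://www.afp.com/format/internal/",
}


def total_links(news_item):
    """Return the number of external hypertext links, or None if not provided."""
    element = news_item.find(
        "nar:itemMeta/afp:extension/afp:stats/afp:totalLinks", NS
    )
    text = (element.text or "").strip() if element is not None else ""
    return int(text) if text.isdigit() else None


SAMPLE = ET.fromstring(
    """<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"
          xmlns:afp="http://www.afp.com/format/internal/">
  <itemMeta>
    <afp:extension>
      <afp:stats>
        <afp:totalLinks>
          3
        </afp:totalLinks>
      </afp:stats>
    </afp:extension>
  </itemMeta>
</newsItem>"""
)

print(total_links(SAMPLE))
```

The value is stripped before conversion because, as in the example above, the element content may be surrounded by whitespace.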
Text and multimedia documents: mentions of the existence of related production may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<!-- The following signals that AFP is publishing/will publish related photo and video production -->
<signal qcode="afpmedtype:Photo"/>
<signal qcode="afpmedtype:Video"/>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
Text and multimedia documents may contain mentions of the existence of related production, i.e., additional production covering the event(s) the document is about. For example, if AFP has released or plans to release photo(s) and video(s) of the Yves Saint Laurent auction then it may be mentioned in the metadata of a text or multimedia news story covering this auction, as shown in the example above. To that end, we use signal elements specifying which type of related production exists or is planned, using a controlled vocabulary defined by the scheme http://ref.afp.com/mediatypes/ (scheme alias: afpmedtype).
We provide only one signal per type of related production. For example, if there are several related photos, there is still only one <signal qcode="afpmedtype:Photo"/> element.
Note that signal
elements are also used for other purposes (e.g., correction signal). Only signal
elements in the scheme http://ref.afp.com/mediatypes/
are mentions of related production.
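The filtering rule above can be sketched in Python as follows. The function name related_production is hypothetical, and the sketch assumes the scheme alias afpmedtype is bound to http://ref.afp.com/mediatypes/ as in this guide; a robust implementation would resolve the alias through the document's catalog. The qcode "example:other" in the sample is a made-up value standing for a signal used for another purpose.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}


def related_production(news_item):
    """Return the codes of related production types signalled by the item."""
    kinds = []
    for signal in news_item.findall("nar:itemMeta/nar:signal", NS):
        alias, _, code = (signal.get("qcode") or "").partition(":")
        # Only signals in the media types scheme mention related production.
        if alias == "afpmedtype" and code and code not in kinds:
            kinds.append(code)
    return kinds


SAMPLE = ET.fromstring(
    """<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemMeta>
    <signal qcode="example:other"/>
    <signal qcode="afpmedtype:Photo"/>
    <signal qcode="afpmedtype:Video"/>
  </itemMeta>
</newsItem>"""
)

print(related_production(SAMPLE))
```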
The table below provides the QCodes and concept URIs that are used in these signal elements. See the overview section for a description of the various types of news content this table refers to.
Types of related production | ||
---|---|---|
Concept URI | QCode | Description |
http://ref.afp.com/mediatypes/Photo | afpmedtype:Photo | Related picture(s). For example, a picture of the Yves Saint Laurent auction. |
http://ref.afp.com/mediatypes/PHOTOARCH | afpmedtype:PHOTOARCH | Related picture(s) from archive material. It is typically an archive picture of someone or something that plays an important role in the event(s). For example, an archive picture of Yves Saint Laurent, or an archive picture of Christie's salerooms. When this mention is used, the related archive pictures are republished by AFP. |
http://ref.afp.com/mediatypes/Video | afpmedtype:Video | Related video(s). For example, a video report about the Yves Saint Laurent auction. |
http://ref.afp.com/mediatypes/LIVEVIDEO | afpmedtype:LIVEVIDEO | Related video(s) providing live coverage. For example, a video of the Yves Saint Laurent auction broadcast live. |
http://ref.afp.com/mediatypes/VIDEOARCH | afpmedtype:VIDEOARCH | Related video(s) from archive material. It is typically an archive video of someone or something that plays an important role in the event(s). For example, an archive video of Yves Saint Laurent, or an archive video of Christie's salerooms. When this mention is used, the related archive videos are republished by AFP. |
http://ref.afp.com/mediatypes/Sketch | afpmedtype:Sketch | Related courtroom sketch(es). A courtroom sketch is an artistic depiction of the proceedings in a court of law. In many jurisdictions, cameras are not allowed in courtrooms in order to prevent distractions and preserve privacy. Consequently we rely on sketch artists for illustrations of the proceedings. |
http://ref.afp.com/mediatypes/Graphic | afpmedtype:Graphic | Related still graphic(s). |
http://ref.afp.com/mediatypes/ANIGRAPHIC | afpmedtype:ANIGRAPHIC | Related animated graphic(s). |
http://ref.afp.com/mediatypes/VIDEOGRAPHIC | afpmedtype:VIDEOGRAPHIC | Related videographic(s). |
http://ref.afp.com/mediatypes/Multimedia | afpmedtype:Multimedia | Related multimedia document(s). |
http://ref.afp.com/mediatypes/LIVEREPORT | afpmedtype:LIVEREPORT | Related live report(s). |
http://ref.afp.com/mediatypes/INTERACTIVEGRAPHIC | afpmedtype:INTERACTIVEGRAPHIC | Related interactive graphic(s). |
The mechanism described in this section is not the only one to deal with related production. As described in the section on event identifiers, we also provide you with correlation keys allowing you to identify documents covering the same events.
Text and multimedia documents: a role in workflow may be provided in the item metadata section of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<role qcode="QCode specifying the role in workflow"/>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
Some text and multimedia documents carry an indication of their role in workflow (aka editorial role). This allows you to handle them in specific ways. This role, if present, is specified by the qcode
attribute of the role
element. The possible values for the role are taken from a controlled vocabulary provided by the IPTC (we do not use its whole value space, though). They are described in the table below, where the Concept URI column gives the URI the QCode resolves to.
Roles in workflow | |||
---|---|---|---|
Role | Description | QCode | Concept URI |
Flash | A very short text – typically four or five words – on an event of exceptional importance. Flashes are rare. For example, only four events were reported by AFP by a flash in 2008: Kosovo’s declaration of independence; the opening of the Beijing Games; Russia’s recognition of South Ossetia and Abkhazia as independent states; and Barack Obama’s victory in the US presidential elections. A flash is usually followed within five minutes by an urgent providing more information. | erol:flash | http://cv.iptc.org/newscodes/edrole/flash |
Alert | A very short text with high priority. An alert is usually followed within five minutes by an urgent providing more information. Fits in a single line. | erol:alert | http://cv.iptc.org/newscodes/edrole/alert |
Urgent | A short text on a major development of a top story. An urgent is typically two paragraphs long, or longer when it provides a follow-up to multiple alerts. On a freshly breaking story, an urgent is typically followed within 10 minutes by a 200-250 word lead. | erol:urgent | http://cv.iptc.org/newscodes/edrole/urgent |
Lead | A sum-up or a complete version of a developing story. | erol:lead | http://cv.iptc.org/newscodes/edrole/lead |
When a document is updated, its role in workflow may be updated too. For example, it is typical for a breaking news story that deserves immediate distribution to start its life as an alert, then become an urgent, then a lead, as it gets refreshed and enriched with more content. All versions of the document share the same guid (see the section on identifiers).
Evolution over time of a developing story
Once a document is a lead, subsequent versions may be qualified as "second lead", "third lead" and so on up to a "ninth lead". However, this qualification is not done through the role in workflow property: this property uses the same concept URI of http://cv.iptc.org/newscodes/edrole/lead (QCode erol:lead) from the first lead through the ninth one. To convey what kind of lead the document is, we use a <genre> element (see the section on genres). For example, we typically convey that a document is a first lead by specifying a role in workflow with the concept URI http://cv.iptc.org/newscodes/edrole/lead and a genre with the concept URI http://ref.afp.com/editorialtypes/Lead (QCode afpedtype:Lead), as in the following example:
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<role qcode="erol:lead"/>
</itemMeta>
<contentMeta>
<genre qcode="afpedtype:Lead" />
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
For a second lead, the role in workflow is still http://cv.iptc.org/newscodes/edrole/lead
and a genre with a concept URI of http://ref.afp.com/editorialtypes/2ndlead
(QCode afpedtype:2ndlead
) is provided:
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<role qcode="erol:lead"/>
</itemMeta>
<contentMeta>
<genre qcode="afpedtype:2ndlead" />
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
A document with a role in workflow of "lead" can also be qualified by the genre "general lead", whose meaning is described at the end of the table below. Typically a general lead has a different guid than the various documents it consolidates. A document cannot be both a general lead and "first lead" or "second lead" etc.
The following table describes the various genres used to qualify a lead.
Genres used to qualify a lead | |||
---|---|---|---|
Genre | Description | QCode | Concept URI |
Lead (typically used to mean "first lead") | A sum-up or a complete version of a developing story. | afpedtype:Lead | http://ref.afp.com/editorialtypes/Lead |
Second lead | A sum-up or a complete version of a developing story. For a given story, common usage is that a second lead is published only if a lead is already out. It provides a refreshed and/or enriched version of that story. | afpedtype:2ndlead | http://ref.afp.com/editorialtypes/2ndlead |
Third lead | A sum-up or a complete version of a developing story. For a given story, common usage is that a third lead is published only if a second lead is already out. It provides a refreshed and/or enriched version of that story. | afpedtype:3rdlead | http://ref.afp.com/editorialtypes/3rdlead |
Fourth lead | A sum-up or a complete version of a story. For a given story, common usage is that a fourth lead is published only if a third lead is already out. It provides a refreshed and/or enriched version of that story. | afpedtype:4thlead | http://ref.afp.com/editorialtypes/4thlead |
Fifth lead | A sum-up or a complete version of a story. For a given story, common usage is that a fifth lead is published only if a fourth lead is already out. It provides a refreshed and/or enriched version of that story. | afpedtype:5thlead | http://ref.afp.com/editorialtypes/5thlead |
Sixth lead | A sum-up or a complete version of a developing story. For a given story, common usage is that a sixth lead is published only if a fifth lead is already out. It provides a refreshed and/or enriched version of that story. | afpedtype:6thlead | http://ref.afp.com/editorialtypes/6thlead |
Seventh lead | A sum-up or a complete version of a developing story. For a given story, common usage is that a seventh lead is published only if a sixth lead is already out. It provides a refreshed and/or enriched version of that story. | afpedtype:7thlead | http://ref.afp.com/editorialtypes/7thlead |
Eighth lead | A sum-up or a complete version of a developing story. For a given story, common usage is that an eighth lead is published only if a seventh lead is already out. It provides a refreshed and/or enriched version of that story. | afpedtype:8thlead | http://ref.afp.com/editorialtypes/8thlead |
Ninth lead | A sum-up or a complete version of a developing story. For a given story, common usage is that a ninth lead is published only if an eighth lead is already out. It provides a refreshed and/or enriched version of that story. | afpedtype:9thlead | http://ref.afp.com/editorialtypes/9thlead |
General lead | A large sum-up or a complete version of a story. A general lead regroups, hierarchizes and develops all available elements of a developing story, including elements that were previously published under a number of different documents, each one focusing on specific facets of the more general story. | afpedtype:LeadGeneral | http://ref.afp.com/editorialtypes/LeadGeneral |
Text and multimedia documents: the word count is provided in the inline XML rendition of the content of the news item (for multimedia documents: in the main news item).
<newsMessage>
<itemSet>
<newsItem>
<contentSet>
<inlineXML wordcount="450">
</inlineXML>
</contentSet>
</newsItem>
</itemSet>
</newsMessage>
The word count gives an approximation of the size of the textual content of the document (not including textual content provided in metadata). It is an approximate count: when it is computed, short words count for less than one word and long words count for more than one.
The word count is provided by the wordcount
attribute of the inlineXML
element of the news item. It is a non-negative integer. It is present in all text and multimedia documents.
Some data is specific to text documents. This section details these data elements.
Text documents: the textual content is provided in the content set of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentSet>
<inlineXML contenttype="application/xhtml+xml">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>
YSL-Bergé collection sets new world record at auction
for a private collection
</title>
</head>
<body>
<p>The Yves Saint Laurent and Pierre Bergé collection sets
new world record at auction for a private collection.
Hundreds of art treasures amassed by late fashion designer
Yves Saint Laurent and his companion Pierre Berge over half
a century are being auctioned.</p>
<p>Bids hit 206 million euros (261 million dollars) on February
23, 2009 making it the biggest private collection ever
auctioned with two days of sales still left to run.</p>
...
...
<!-- A hypertext link -->
The <a class="ignorableTextFalse" href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">
wikipedia page about Yves Saint-Laurent</a> claims that ...
...
</body>
</html>
</inlineXML>
</contentSet>
</newsItem>
</itemSet>
</newsMessage>
The textual content of the document is the main journalistic text of the document. It is provided by an inlineXML element. It is expressed using the XML syntax of HTML. This is explicitly denoted by a contenttype attribute with a value of application/xhtml+xml.
The textual content can also contain links to entities that aren't logically part of the document, such as other NewsML-G2 documents, Web pages (as shown in the example above), etc. The sections below describe how these links are represented.
Note that text items of multimedia documents can also contain similar data, but with additional information such as links to visual content. This is described in section "Data specific to multimedia documents".
The HTML can contain hypertext links to other resources such as Web pages. They may be provided by <a> elements. For example, here is a link to a Wikipedia page:
<a class="ignorableTextFalse"
href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)" >wikipedia page about Yves Saint-Laurent</a>
The class attribute, if present, may be used to specify either the class name "ignorableTextFalse" or "ignorableTextTrue". These class names are meant to assist you if you need to remove hypertext links from the HTML content (this is a common need for some of our clients).
ignorableTextFalse
means that if you process the HTML in order to remove links then not removing the text associated with this link will produce a better result.
For example, suppose that the HTML contains the following fragment before removing the hypertext links:
Pierre Bergé quoted the
<a class="ignorableTextFalse"
href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">wikipedia page about Yves Saint-Laurent</a>
to illustrate...
After removing hypertext links the fragment should be:
Pierre Bergé quoted the wikipedia page about Yves Saint-Laurent to illustrate...
ignorableTextTrue
means that if you process the HTML in order to remove links then also removing the text associated with this link will produce a better result.
For example, suppose that the HTML contains the following fragment before removing the hypertext links:
Some text before.
<a class="ignorableTextTrue"
href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">
This Web page provides additional information.
</a>
Some text after.
After removing hypertext links the fragment should be:
Some text before. Some text after.
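The two removal rules can be sketched in Python as follows. This is an illustration under stated assumptions, not AFP code: remove_links and flat_text are hypothetical names, a link without either class name is treated like ignorableTextFalse, and any markup nested inside a kept link is flattened to plain text.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
A_TAG = f"{{{XHTML}}}a"


def _insert_text(parent, index, text):
    """Attach text at the position formerly occupied by the child at index."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + text


def remove_links(element):
    """Remove a elements in place, keeping or dropping their text."""
    index = 0
    while index < len(element):
        child = element[index]
        if child.tag == A_TAG:
            classes = (child.get("class") or "").split()
            # ignorableTextTrue: drop the link text; otherwise keep it.
            kept = "" if "ignorableTextTrue" in classes else "".join(child.itertext())
            tail = child.tail or ""  # text following the link is always kept
            element.remove(child)
            _insert_text(element, index, kept + tail)
        else:
            remove_links(child)
            index += 1


def flat_text(element):
    """Return the text of an element with whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


KEEP = ET.fromstring(
    '<p xmlns="http://www.w3.org/1999/xhtml">Pierre Bergé quoted the\n'
    '<a class="ignorableTextFalse" href="http://en.wikipedia.org/wiki/'
    'Yves_Saint_Laurent_(designer)">wikipedia page about Yves Saint-Laurent</a>\n'
    "to illustrate...</p>"
)
DROP = ET.fromstring(
    '<p xmlns="http://www.w3.org/1999/xhtml">Some text before.\n'
    '<a class="ignorableTextTrue" href="http://en.wikipedia.org/wiki/'
    'Yves_Saint_Laurent_(designer)">\nThis Web page provides additional information.\n</a>\n'
    "Some text after.</p>"
)

for fragment in (KEEP, DROP):
    remove_links(fragment)
    print(flat_text(fragment))
```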
The HTML can contain links to other NewsML-G2 documents managed by AFP. Such links are associated with a part of the textual content. We represent these links using the g2document microformat. It consists of a span element with a class attribute that contains "g2document". In addition, we provide another class name denoting the type of the referenced document: "g2picture", "g2video", etc. Finally, we may provide a class name that provides a hint on how a link could be removed gracefully. For example:
<span class="g2document g2text ignorableTextFalse">
<a style="display: none" href="http://doc.afp.com/7W37U"></a>
<a style="display: none" href="otherDocument.xml"></a>
some text
</span>
The content of the span element is organized as follows:
A first a tag whose href attribute provides the GUID of the NewsML-G2 document. Note that while it may look like a dereferenceable URI, it actually isn't. This element is marked as non-displayable as it is not meant to be directly displayed.
A second a tag may provide the dereferenceable URI reference of the NewsML-G2 document. Typically, this element will be present if the AFP delivery system determines that it has delivered the corresponding document to you and knows where to locate it in your delivery space.
The following table lists the class names used to specify the type of a referenced NewsML-G2 document. See the overview section for a presentation of the various document types.
Types of referenced NewsML-G2 document | |
---|---|
Class name | Type |
g2text | Text |
g2multimedia | Multimedia |
g2picture | Picture |
g2graphic | Still graphic |
g2animated | Animated graphic |
g2video | Video |
g2liveReport | Live report index |
g2interactive | Interactive graphic |
The class attribute may also be used to specify "ignorableTextFalse" or "ignorableTextTrue". These class names are meant to assist you if you need to remove links from the HTML content (this is a common need for some of our clients).
ignorableTextFalse
means that if you process the HTML in order to remove links then not removing the text associated with this link will produce a better result.
For example, suppose that the HTML contains the following fragment before removing the links:
Pierre Bergé quoted
<span class="g2document g2text ignorableTextFalse">
<a style="display: none" href="http://doc.afp.com/7W37U"></a>
<a style="display: none" href="otherDocument.xml"></a>
a recent AFP news story
</span>
to illustrate...
After removing links the fragment should be:
Pierre Bergé quoted a recent AFP news story to illustrate...
ignorableTextTrue
means that if you process the HTML in order to remove links then also removing the text associated with this link will produce a better result.
For example, suppose that the HTML contains the following fragment before removing the links:
Some text before.
<span class="g2document g2text ignorableTextTrue">
<a style="display: none" href="http://doc.afp.com/7W37U"></a>
<a style="display: none" href="otherDocument.xml"></a>
This AFP news story provides additional information.
</span>
Some text after.
After removing links the fragment should be:
Some text before. Some text after.
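Reading the g2document microformat could be sketched in Python as follows. The function name g2_links and the keys of the returned dictionaries are hypothetical. The sketch assumes, based on the examples above, that the first a tag carries the GUID and the second one, if any, carries the dereferenceable URI reference.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SPAN_TAG = f"{{{XHTML}}}span"
A_TAG = f"{{{XHTML}}}a"

# Class name to document type, taken from the table of referenced document types.
TYPE_CLASSES = {
    "g2text": "Text",
    "g2multimedia": "Multimedia",
    "g2picture": "Picture",
    "g2graphic": "Still graphic",
    "g2animated": "Animated graphic",
    "g2video": "Video",
    "g2liveReport": "Live report index",
    "g2interactive": "Interactive graphic",
}


def g2_links(root):
    """Return one dictionary per g2document span found under root."""
    links = []
    for span in root.iter(SPAN_TAG):
        classes = (span.get("class") or "").split()
        if "g2document" not in classes:
            continue
        hrefs = [a.get("href") for a in span.findall(A_TAG)]
        links.append({
            "guid": hrefs[0] if hrefs else None,           # assumed: first a tag
            "uri": hrefs[1] if len(hrefs) > 1 else None,   # assumed: second a tag
            "type": next((TYPE_CLASSES[c] for c in classes if c in TYPE_CLASSES), None),
            "drop_text": "ignorableTextTrue" in classes,
            "text": " ".join("".join(span.itertext()).split()),
        })
    return links


SAMPLE = ET.fromstring(
    '<p xmlns="http://www.w3.org/1999/xhtml">Pierre Bergé quoted\n'
    '<span class="g2document g2text ignorableTextFalse">\n'
    '<a style="display: none" href="http://doc.afp.com/7W37U"></a>\n'
    '<a style="display: none" href="otherDocument.xml"></a>\n'
    "a recent AFP news story\n"
    "</span>\nto illustrate...</p>"
)

print(g2_links(SAMPLE))
```

The drop_text flag tells a link-removal step whether the associated text should be removed along with the span.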
Some data is associated with visual content. It may be present in picture, video, still graphic and animated graphic documents. It may also be present in picture, video, still graphic and animated graphic items of multimedia documents. This section details these data elements.
Picture, video, still graphic, animated graphic documents: a caption may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<description role="afpdescRole:contentDescription">
French businessman and head of Sidaction organisation Pierre Berge
attends at Marigny theater in Paris.
</description>
<description role="afpdescRole:contextDescription">
This is the first of the four auction days led by Christie's of
Yves Saint-Laurent and Pierre Berge collection, which profit will
fund campaigns against HIV-AIDS.
</description>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: a caption may be provided in the content metadata section of each news item conveying picture, video, still graphic or animated graphic content.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- Caption for the content of this item -->
<description role="afpdescRole:contentDescription">
French businessman and head of Sidaction organisation Pierre Berge
attends at Marigny theater in Paris. This is the first of the four auction days led by Christie's of
Yves Saint-Laurent and Pierre Berge collection, which profit will
fund campaigns against HIV-AIDS.
</description>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- Caption for the content of this other item -->
<description role="afpdescRole:contentDescription">
Christie's auctioneer François de Ricqles proceeds with the auction
of a rabbit head, a Chinese imperial bronze on February 25, 2009
at the Grand Palais in Paris. This object is part of a prized art collection assembled by
Yves Saint Laurent and his partner Pierre Berge over half a
century. One of the world's great private collections, it takes
in masterpieces by Picasso, Mondrian and Matisse, old masters, Art
Deco gems, bronzes, enamels and antiques. Two looted Chinese bronzes
sold for 15.7 million euros (20.3 million dollars) each to anonymous
telephone bidders at the Yves Saint Laurent art sale on Wednesday,
despite protests from Beijing.
</description>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
In picture, video, still graphic or animated graphic documents, the caption, if present, is provided in two parts. The content description is a concise textual description of what is shown in the visual content. The context description provides background information (e.g., context, meaning, etc.) about what is shown.
The content description may be provided in the associated news item by a description
element whose role
attribute, the QCode afpdescRole:contentDescription
, resolves to http://cv.afp.com/descriptionRoles/contentDescription
. The context description may be provided by a description
element whose role
attribute, the QCode afpdescRole:contextDescription
, resolves to http://cv.afp.com/descriptionRoles/contextDescription
.
In multimedia documents, the captions of visual components are in one part, as shown in the example above.
There is no caption for text content. In picture, video, still graphic and animated graphic documents, there is a single news item, which, consequently, is the one that may provide a caption. For multimedia documents, the caption of each picture, video, still graphic and animated graphic may appear in each corresponding news item. There is at most one caption per news item.
Note that while NewsML-G2 allows for rich text by using some markup in the content of a caption, AFP's systems only output simple textual content not interspersed with markup.
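Collecting the parts of a caption could be sketched in Python as follows. The function name caption is hypothetical and the QCodes are compared literally with the alias used in this guide. For an item of a multimedia document, only the content description part is expected to be present.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}
CAPTION_ROLES = {
    "afpdescRole:contentDescription": "content",
    "afpdescRole:contextDescription": "context",
}


def caption(news_item):
    """Return a dictionary with the caption parts present in a news item."""
    parts = {}
    for description in news_item.findall("nar:contentMeta/nar:description", NS):
        key = CAPTION_ROLES.get(description.get("role"))
        if key is not None:
            # Collapse the line breaks and indentation of the XML source.
            parts[key] = " ".join("".join(description.itertext()).split())
    return parts


SAMPLE = ET.fromstring(
    """<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <contentMeta>
    <description role="afpdescRole:contentDescription">
      What is shown
      in the picture.
    </description>
    <description role="afpdescRole:contextDescription">
      Background information.
    </description>
  </contentMeta>
</newsItem>"""
)

print(caption(SAMPLE))
```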
From time to time the AFP NewsML-G2 format evolves, but you may still want to correctly process older documents that make use of previous versions of the format. In older documents, captions are represented in a different way. In some documents the content description may be provided in the associated news item by a In even older documents, the content description and context description may not be provided as separate elements but instead in a single |
Picture, video, still graphic, animated graphic documents: a copyright notice may be provided in the rights information of the news item.
<newsMessage>
<itemSet>
<newsItem>
<rightsInfo>
<copyrightNotice>Copyright AFP or licensors</copyrightNotice>
</rightsInfo>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: a copyright notice may be provided in the rights information of each news item conveying picture, video, still graphic or animated graphic content.
<newsMessage>
<itemSet>
<newsItem>
<!-- A copyright notice for this item -->
<rightsInfo>
<copyrightNotice>Copyright AFP or licensors</copyrightNotice>
</rightsInfo>
</newsItem>
<newsItem>
<!-- A copyright notice for this item -->
<rightsInfo>
<copyrightNotice>Copyright AFP or licensors</copyrightNotice>
</rightsInfo>
</newsItem>
</itemSet>
</newsMessage>
Note that while NewsML-G2 allows for rich text by using some markup in the content of a copyright notice, AFP's systems only output simple textual content not interspersed with markup.
Picture, video, still graphic, animated graphic documents: one or multiple links to visual content may be provided in the content set of the news item.
<newsMessage>
<itemSet>
<!-- A visual item with three different renditions of the same visual content -->
<newsItem>
<contentSet>
<remoteContent href="pictureItem/image1.jpg"/>
<remoteContent href="pictureItem/image2.jpg"/>
<remoteContent href="ftp://example.com/image3.gif"/>
</contentSet>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: one or multiple links to visual content may be provided in the content set of each news item conveying picture, video, still graphic or animated graphic content.
<newsMessage>
<itemSet>
<!-- A visual item with three different renditions of the same visual content -->
<newsItem>
<contentSet>
<remoteContent href="pictureItem/image1.jpg"/>
<remoteContent href="pictureItem/image2.jpg"/>
<remoteContent href="ftp://example.com/image3.gif"/>
</contentSet>
</newsItem>
<!-- Another visual item with two renditions of some other visual content -->
<newsItem>
<contentSet>
<remoteContent href="videoItem/video1.mp4"/>
<remoteContent href="http://example.com/video2.mp4"/>
</contentSet>
</newsItem>
</itemSet>
</newsMessage>
Links to the actual visual content (e.g., bitmaps, vector graphics, video frames, etc.) are provided by href attributes of remoteContent elements. The value of each href attribute is a URI reference (while NewsML-G2 allows for IRI references, AFP NewsML-G2 documents use only URI references). See section "Accessing visual content through URI references" for additional directions on how to use these links.
Each picture, video, still graphic and animated graphic news item carries information about one piece of visual content (i.e., one picture, video or graphic). However, this content may be available in multiple renditions (e.g., low resolution, high resolution, JPEG format, TIFF format, etc.). Each rendition is described by a remoteContent element in the content set of the item.
In standard NewsML-G2 "Each rendition [in the content set of a given news item] MUST represent the same visual content, differentiated only by physical properties such as content type and format. [Renditions in the content set of a given news item are] different technical representations of the same logical content". AFP renditions for picture and graphic content do not always abide by this rule: in addition to providing different technical representations of the same logical content, our renditions may also consist of crops or other alterations of the content provided by other renditions of the same news item.
For each rendition, some information may be provided by attributes on remoteContent elements. These attributes are described below.
To aid selecting renditions, the type of a rendition may be provided by a rendition attribute in the remoteContent element describing the rendition, as in this example:
<!-- Three descriptions of renditions of different types -->
<remoteContent rendition="rnd:lowRes" href="pictureItem/image1.jpg"/>
<remoteContent rendition="rnd:highRes" href="pictureItem/image2.jpg"/>
<remoteContent rendition="rnd:thumbnail" href="pictureItem/image3.gif"/>
At the time of writing, some remoteContent elements may be delivered with no rendition attribute. For instance, this is the case for renditions in PostScript or PDF format for still graphics, but they will have a contenttype attribute identifying the format, as detailed in the section about rendition formats.
The rendition attribute provides a QCode whose possible values are taken from an IPTC controlled vocabulary and from AFP controlled vocabularies. The following tables provide examples of such values.
Examples of rendition types for picture documents | ||
---|---|---|
Concept URI | QCode | Description |
http://cv.iptc.org/newscodes/rendition/highRes | rnd:highRes | High resolution image |
http://cv.iptc.org/newscodes/rendition/preview | rnd:preview | Preview resolution image |
http://cv.iptc.org/newscodes/rendition/thumbnail | rnd:thumbnail | A very small rendition of an image, giving only a general idea of its content |
Examples of rendition types for still graphic documents | ||
---|---|---|
Concept URI | QCode | Description |
http://cv.afp.com/renditions/AIcs11 | afprnd:AIcs11 | Rendition in Adobe Creative Suite 11 format |
http://cv.iptc.org/newscodes/rendition/highRes | rnd:highRes | High resolution image |
http://cv.afp.com/renditions/jpeg_retina | afprnd:jpeg_retina | A JPEG image in retina resolution. Typically, it contains four times more pixels than the jpeg_standard rendition. |
http://cv.afp.com/renditions/jpeg_standard | afprnd:jpeg_standard | A JPEG image in standard resolution |
http://cv.afp.com/renditions/png_retina | afprnd:png_retina | A PNG image in retina resolution. Typically, it contains four times more pixels than the png_standard rendition. |
http://cv.afp.com/renditions/png_standard | afprnd:png_standard | A PNG image in standard resolution |
http://cv.iptc.org/newscodes/rendition/preview | rnd:preview | Preview resolution image |
http://cv.iptc.org/newscodes/rendition/thumbnail | rnd:thumbnail | A very small rendition of an image, giving only a general idea of its content |
Examples of rendition types for visual components in multimedia documents | ||
---|---|---|
Concept URI | QCode | Description |
http://cv.iptc.org/newscodes/rendition/fullSize | afprnd:fullSize | Documentation forthcoming |
http://cv.afp.com/renditions/highDef | afprnd:highDef | Rendition of the highest definition of a visual component in a multimedia document |
http://cv.afp.com/renditions/ipad | afprnd:ipad | Content intended to appear on iPad |
http://cv.iptc.org/newscodes/rendition/mobile | rnd:mobile | Content intended to appear on a mobile or handheld device |
http://cv.afp.com/renditions/squaredThumbnail | afprnd:squaredThumbnail | A small squared rendition of an image |
http://cv.iptc.org/newscodes/rendition/thumbnail | rnd:thumbnail | A very small rendition of an image, giving only a general idea of its content |
http://cv.iptc.org/newscodes/rendition/web | rnd:web | Content intended to appear on a web page |
Examples of rendition types for interactive documents | ||
---|---|---|
Concept URI | QCode | Description |
http://cv.afp.com/renditions/interactive | afprnd:interactive | The interactive rendition |
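As an illustration of how rendition types can drive the choice of a rendition, here is a minimal Python sketch. It assumes that the rnd and afprnd scheme aliases resolve to the base URIs shown in the tables above (real documents declare their aliases through catalogs); the function names and the sample item are hypothetical.

```python
import xml.etree.ElementTree as ET

NAR = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

# Assumed scheme aliases, inferred from the tables above.
SCHEME_ALIASES = {
    "rnd": "http://cv.iptc.org/newscodes/rendition/",
    "afprnd": "http://cv.afp.com/renditions/",
}

SAMPLE = """\
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <contentSet>
    <remoteContent rendition="rnd:preview" href="pictureItem/image1.jpg"/>
    <remoteContent rendition="rnd:highRes" href="pictureItem/image2.jpg"/>
    <remoteContent contenttype="application/pdf" href="pictureItem/image3.pdf"/>
  </contentSet>
</newsItem>
"""


def resolve_qcode(qcode, aliases=SCHEME_ALIASES):
    """Expand 'alias:code' to a concept URI, or None if the alias is unknown."""
    alias, sep, code = qcode.partition(":")
    if not sep or alias not in aliases:
        return None
    return aliases[alias] + code


def pick_rendition(news_item, preferred_uris):
    """Return the href of the first rendition whose type matches the
    preference order, or None when no typed rendition matches."""
    by_uri = {}
    for rc in news_item.findall("nar:contentSet/nar:remoteContent", NAR):
        qcode = rc.get("rendition")
        uri = resolve_qcode(qcode) if qcode else None
        if uri is not None:
            # Keep the first rendition seen for each type.
            by_uri.setdefault(uri, rc.get("href"))
    for uri in preferred_uris:
        if uri in by_uri:
            return by_uri[uri]
    return None


item = ET.fromstring(SAMPLE)
print(pick_rendition(item, [
    "http://cv.iptc.org/newscodes/rendition/highRes",
    "http://cv.iptc.org/newscodes/rendition/preview",
]))
```

Comparing concept URIs rather than raw QCodes keeps the selection independent of the alias a given document happens to use. Renditions delivered without a rendition attribute are ignored here; they would have to be selected by other means, such as their contenttype attribute.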
The media type of a rendition may be provided by a contenttype attribute on the remoteContent element describing the rendition, as in this example:
<!-- Three descriptions of renditions, each one with a media type -->
<remoteContent contenttype="image/jpeg" href="pictureItem/image1.jpg"/>
<remoteContent contenttype="image/jpeg" href="pictureItem/image2.jpg"/>
<remoteContent contenttype="image/gif" href="pictureItem/image3.gif"/>
The value of the contenttype attribute is an IANA MIME media type name [MediaTypes].
The contenttype attribute may be complemented by a format attribute to refine information about the data format of the rendition. For example:
<!-- Three descriptions of renditions, each one with a media type complemented by a format -->
<remoteContent contenttype="image/jpeg" format="example:JPEG_Baseline"
href="pictureItem/image1.jpg"/>
<remoteContent contenttype="image/jpeg" format="example:JPEG_Progressive"
href="pictureItem/image2.jpg"/>
<remoteContent contenttype="image/gif" format="example:GIF87a"
href="pictureItem/image3.gif"/>
The width and height of a rendition may be provided by width and height attributes (whose values are non-negative integers) on the remoteContent element describing the rendition. The units in which these dimensions are expressed may be provided by widthunit and heightunit attributes. These attributes provide QCodes whose possible values are in the controlled vocabulary defined by IPTC for dimension units (cf. [IPTCDimUnits]). For example:
<remoteContent width ="640" widthunit ="dimensionunit:pixels"
height="400" heightunit="dimensionunit:pixels" href="pictureItem/image1.jpg"/>
This fragment states that the visual content at pictureItem/image1.jpg is 640 pixels wide and 400 pixels high (in this example, we suppose that dimensionunit is a scheme alias for the controlled vocabulary defined by IPTC for dimension units).
The possible dimension units are a subset of the IPTC dimension units controlled vocabulary. They are provided in the table below, where the "Concept URI" column gives the URI to which the heightunit and/or widthunit attributes resolve.
Dimension units | ||
---|---|---|
Unit | QCode | Concept URI |
Pixel | dimensionunit:pixels | http://cv.iptc.org/newscodes/dimensionunit/pixels |
Typographic Point | dimensionunit:points | http://cv.iptc.org/newscodes/dimensionunit/points |
Millimeter | dimensionunit:mm | http://cv.iptc.org/newscodes/dimensionunit/mm |
If a width and/or a height attribute is present but the corresponding dimension unit attribute is missing, then you must assume that the width and/or height is expressed in the default unit for that dimension. The default dimension units, which are specified by NewsML-G2, are given in the table below.
Default dimension units | ||
---|---|---|
Type of visual content | Default height unit | Default width unit |
Picture | pixels | pixels |
Graphic (still or animated) | points | points |
Digital video | pixels | pixels |
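The default-unit rule above can be sketched as follows in Python. This is an illustrative sketch only: the "picture", "graphic" and "video" keys and the function name are hypothetical, and the unit is taken from the code part of the QCode, which assumes the alias resolves to the IPTC dimension units vocabulary.

```python
# Default units per type of visual content, as given in the table above.
# The dictionary keys are hypothetical labels chosen for this sketch.
DEFAULT_UNIT = {"picture": "pixels", "graphic": "points", "video": "pixels"}


def dimensions(attrs, content_kind):
    """Return ((width, unit), (height, unit)) for a rendition.

    attrs is a mapping of attribute names to string values, such as the
    attrib dictionary of a parsed remoteContent element. A dimension that
    is not provided yields None.
    """
    default = DEFAULT_UNIT[content_kind]
    result = []
    for dim in ("width", "height"):
        value = attrs.get(dim)
        if value is None:
            result.append(None)
            continue
        unit_qcode = attrs.get(dim + "unit")
        # Assumption: the code after the colon names the unit.
        unit = unit_qcode.partition(":")[2] if unit_qcode else default
        result.append((int(value), unit))
    return tuple(result)


# Width has an explicit unit; height falls back to the default for graphics.
print(dimensions(
    {"width": "640", "widthunit": "dimensionunit:pixels", "height": "400"},
    "graphic",
))
```

Each dimension is resolved on its own because widthunit and heightunit are independent attributes: one may be present while the other is missing.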
The size in bytes of a rendition may be provided by a size attribute on the remoteContent element describing the rendition, as in this example:
<remoteContent size="253476" href="pictureItem/image1.jpg"/>
In this example, the size attribute asserts that the representation of the resource identified by pictureItem/image1.jpg weighs 253476 bytes.
The value of the size attribute is a non-negative integer.
Some data is only present in picture and still graphic documents, and in picture and still graphic items of multimedia documents. This section describes these data elements.
Note that picture and still graphic documents/items also contain data common to visual content (see section "Data specific to visual content") and, of course, data common to all kinds of content (see section "Common data").
As described in the section "Visual content", a given visual may have multiple renditions, each one described by a remoteContent element. This section describes additional data that may be used to describe a picture or still graphic rendition.
The "orientation" of a rendition is an indication of orientation change from the original digital image. It may be provided by an orientation attribute on the remoteContent element describing the rendition. The value of this attribute is an integer in the range of 1 to 8 (inclusive). For example:
<remoteContent orientation="5" href="pictureItem/image1.jpg"/>
This fragment states that the image at pictureItem/image1.jpg has been flipped about the vertical axis and rotated 90 degrees counterclockwise with regard to the original image. See the NewsML-G2 specification for a comprehensive description of the meaning of each value.
If no orientation attribute is present, you should assume a value of 1, which means "upright, no flip, no rotation" (i.e., the visual top of the original image is at the top, the visual left side of the original image is on the left, etc.).
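A small Python sketch of this rule is given below. The function name is hypothetical, and raising an error on out-of-range values is a choice made for this example, not a behaviour prescribed by this guide.

```python
def orientation(attrs):
    """Return the orientation (1 to 8) of a rendition, defaulting to 1.

    attrs is a mapping of attribute names to string values, such as the
    attrib dictionary of a parsed remoteContent element.
    """
    raw = attrs.get("orientation")
    if raw is None:
        return 1  # upright, no flip, no rotation
    value = int(raw)
    if not 1 <= value <= 8:
        raise ValueError(f"orientation out of range: {value}")
    return value


print(orientation({"href": "pictureItem/image1.jpg"}))
print(orientation({"orientation": "5", "href": "pictureItem/image1.jpg"}))
```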
Small illustration images may be provided as part of the content set through remoteContent elements, just like other renditions.
They are distinguished by the value of their rendition attribute; e.g., http://cv.iptc.org/newscodes/rendition/thumbnail, http://cv.afp.com/renditions/squaredThumbnail. See the section on visual content for detailed information.
Note that illustration images for video or animated graphics are provided in a different way, as described in the section on icons.
Some data is only present in video and animated graphic documents, and in video and animated graphic items of multimedia documents. This section describes these data elements.
Note that video and animated graphic documents/items also contain data common to visual content (see section "Data specific to visual content") and, of course, data common to all kinds of content (see section "Common data").
As described in the section "Visual content", a given visual may have multiple renditions, each one described by a remoteContent element. This section describes additional data that may be used to describe a video or animated graphic rendition.
The duration of a rendition may be provided by a duration attribute (a non-negative integer) on the remoteContent element describing the rendition. The unit in which the duration is expressed may be provided by a durationunit attribute. This attribute provides a QCode whose possible values are in a subset of the controlled vocabulary for time units defined by IPTC [IPTCTimeUnits]. For example:
<remoteContent duration="120" durationunit="timeunit:seconds"
href="http://example.com/video2.mp4"/>
This fragment states that the content at http://example.com/video2.mp4 lasts 120 seconds (in this example, we suppose that timeunit is a scheme alias for the controlled vocabulary defined by IPTC for time units).
Possible time units are given in the table below, where the "Concept URI" column gives the concept URI to which the QCode provided by durationunit resolves.
Time units for video or animated graphic duration | ||
---|---|---|
Unit | QCode | Concept URI |
Edit Unit | timeunit:editUnit | http://cv.iptc.org/newscodes/timeunit/editUnit |
Second | timeunit:seconds | http://cv.iptc.org/newscodes/timeunit/seconds |
Millisecond | timeunit:milliseconds | http://cv.iptc.org/newscodes/timeunit/milliseconds |
If a duration attribute is present without a durationunit attribute, then you must assume that the duration is expressed in seconds.
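The following Python sketch shows one possible way to normalise a duration to seconds, applying the default unit described above. It assumes that the code part of the durationunit QCode names the unit. Converting edit units requires knowing how many edit units make up one second, which this sketch takes as a hypothetical edit_units_per_second parameter; how to obtain that value is outside the scope of this example.

```python
def duration_in_seconds(attrs, edit_units_per_second=None):
    """Return the duration of a rendition in seconds, or None if unknown.

    attrs is a mapping of attribute names to string values, such as the
    attrib dictionary of a parsed remoteContent element.
    """
    raw = attrs.get("duration")
    if raw is None:
        return None
    unit_qcode = attrs.get("durationunit")
    # A missing durationunit attribute means the duration is in seconds.
    unit = unit_qcode.partition(":")[2] if unit_qcode else "seconds"
    value = int(raw)
    if unit == "seconds":
        return float(value)
    if unit == "milliseconds":
        return value / 1000
    if unit == "editUnit":
        # Cannot be converted without the edit unit rate.
        if not edit_units_per_second:
            return None
        return value / edit_units_per_second
    raise ValueError(f"unsupported time unit: {unit}")


print(duration_in_seconds({"duration": "120"}))
print(duration_in_seconds(
    {"duration": "1500", "durationunit": "timeunit:milliseconds"}
))
```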
Video and animated graphic documents: icon renditions may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<!-- A visual item with two icons -->
<newsItem>
<contentMeta>
<icon href="http://example.com/img1.jpg"/>
<icon href="icons/img2.tiff"/>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: icon renditions may be provided in the content metadata section of each news item conveying video or animated graphic content.
<newsMessage>
<itemSet>
<!-- A video or animated graphic item with two icon renditions -->
<newsItem>
<contentMeta>
<icon href="http://example.com/img1.jpg"/>
<icon href="icons/img2.tiff"/>
</contentMeta>
</newsItem>
<!-- A video or animated graphic item with one icon rendition -->
<newsItem>
<contentMeta>
<icon href="ftp://example.com/img3.jpg"/>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
An icon is an image illustrating a video or an animated graphic (in NewsML-G2, an icon can also be associated with pictures or still graphics, but AFP documents do not use this feature). An icon is typically a keyframe of the visual content, but it can also be a logo or any other illustration.
Each video or animated graphic document, and each video or animated graphic item of a multimedia document may have at most one logical visual content as its icon. However, this content may be available in multiple renditions (e.g., low resolution, high resolution, JPEG format, TIFF format, etc.). Each rendition is described by an icon element in the content metadata section of the news item.
Links to the actual icon renditions are provided by href attributes of icon elements. The value of each href attribute is a URI reference (while NewsML-G2 allows for IRI references, AFP systems only output URI references). See section "Accessing visual content through URI references" for additional directions on how to use these links.
In standard NewsML-G2 "Each [icon] rendition [in the content metadata section of a given news item] MUST represent the same visual content, differentiated only by physical properties such as content type and format". AFP icon renditions do not always abide by this rule: in addition to providing different technical representations of the same visual content, our icon renditions may also consist of crops or other alterations of the content provided by other icon renditions.
For each icon rendition, some information might be provided by attributes on icon elements. These attributes are described below.
To aid selecting icon renditions, the type of a rendition may be provided by a rendition attribute in the icon element describing the rendition, as in this example:
<!-- Two icon renditions of different types -->
<icon rendition="rnd:thumbnail" href="icons/img1.jpg"/>
<icon rendition="afprnd:squaredThumbnail" href="icons/img2.tiff"/>
The rendition attribute provides a QCode whose possible values are taken from an IPTC controlled vocabulary and from AFP controlled vocabularies. Typical values are shown below.
Icon rendition types | ||
---|---|---|
QCode | Concept URI | Description |
rnd:thumbnail | http://cv.iptc.org/newscodes/rendition/thumbnail | A very small rendition of an image, giving only a general idea of its content |
afprnd:squaredThumbnail | http://cv.afp.com/renditions/squaredThumbnail | A small squared rendition of an image |
The media type of an icon rendition may be provided by a contenttype attribute on the icon element describing the rendition, as in this example:
<!-- Two descriptions of icon renditions of different types -->
<icon contenttype="image/jpeg" href="icons/img1.jpg"/>
<icon contenttype="image/tiff" href="icons/img2.tiff"/>
The value of the contenttype attribute is an IANA MIME media type name [MediaTypes].
The contenttype attribute may be complemented by a format attribute to refine information about the data format of the icon rendition. For example:
<!-- Two descriptions of icon renditions,
each one with a media type complemented by a format -->
<icon contenttype="image/jpeg" format="example:JPEG_Baseline" href="icons/img1.jpg"/>
<icon contenttype="image/tiff" format="example:NSK-TIFF" href="icons/img2.tiff"/>
The width and height of an icon rendition may be provided by width and height attributes (whose values are non-negative integers) on the icon element describing the rendition. The units for these dimensions may be provided by widthunit and heightunit attributes. These attributes provide QCodes whose possible values are in a subset of the controlled vocabulary for dimension units defined by IPTC [IPTCDimUnits]. For example:
<icon width ="640" widthunit ="dimensionunit:pixels"
height="400" heightunit="dimensionunit:pixels" href="icons/img1.jpeg"/>
This fragment states that the visual content at icons/img1.jpeg is 640 pixels wide and 400 pixels high (in this example, we suppose that dimensionunit is a scheme alias for the controlled vocabulary defined by IPTC for dimension units).
The possible dimension units are a subset of the IPTC dimension units controlled vocabulary. They are provided in the table below, where the "Concept URI" column gives the URI to which the heightunit and/or widthunit attributes resolve. Currently, AFP always expresses icon dimensions in pixels.
Dimension units | ||
---|---|---|
Unit | QCode | Concept URI |
Pixels | dimensionunit:pixels | http://cv.iptc.org/newscodes/dimensionunit/pixels |
If a width and/or a height attribute is present but the corresponding dimension unit attribute is missing, then you can assume that the width and/or height is expressed in pixels.
The size in bytes of an icon rendition may be provided by a size attribute on the icon element describing the rendition, as in this example:
<icon size="253476" href="icons/img1.jpeg"/>
In this example, the size attribute asserts that the representation of the resource identified by icons/img1.jpeg weighs 253476 bytes.
The value of the size attribute is a non-negative integer.
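To tie the icon attributes together, the Python sketch below collects the icon renditions of a news item and picks the widest one. It is a sketch under stated assumptions: the sample item, its attribute values and the function names are made up for this example, and widths are compared directly because icon dimensions are currently always expressed in pixels.

```python
import xml.etree.ElementTree as ET

NAR = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

# Made-up sample item with two icon renditions.
SAMPLE = """\
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <contentMeta>
    <icon rendition="rnd:thumbnail" contenttype="image/jpeg"
          width="160" height="90" size="5120" href="icons/img1.jpg"/>
    <icon contenttype="image/tiff" width="640" height="360"
          href="icons/img2.tiff"/>
  </contentMeta>
</newsItem>
"""


def _int_or_none(value):
    """Convert an optional attribute value to an integer."""
    return int(value) if value is not None else None


def icon_renditions(news_item):
    """Return one dictionary per icon rendition of a news item."""
    icons = []
    for icon in news_item.findall("nar:contentMeta/nar:icon", NAR):
        icons.append({
            "href": icon.get("href"),
            "rendition": icon.get("rendition"),
            "contenttype": icon.get("contenttype"),
            "width": _int_or_none(icon.get("width")),
            "height": _int_or_none(icon.get("height")),
            "size": _int_or_none(icon.get("size")),
        })
    return icons


def widest_icon(news_item):
    """Return the icon rendition with the largest width, or None."""
    candidates = [i for i in icon_renditions(news_item) if i["width"] is not None]
    return max(candidates, key=lambda i: i["width"], default=None)


item = ET.fromstring(SAMPLE)
print(widest_icon(item)["href"])
```

Since every attribute other than href may be absent, the sketch keeps missing values as None instead of guessing them.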
Video and animated graphic documents: a script may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<description role="afpdescRole:script">
A rare glimpse of the art behind the label.
What Yves Saint Laurent earned in the fashion industry he spent on
masterpieces. At Christie’s auction house in London, a treasure trove of
paintings, sculpture, furniture and jewellery amassed by the fashion
icon and his lover and business partner Pierre Bergé -- over a 50 year
partnership.
SOUNDBITE 1: Thomas Seydoux, International Co-Head of Department,
Christie’s Europe [English, 13 sec]:
"It's unprecedented - I mean we've never sold a collection in recent
memory of that sort of outstanding quality throughout and I think it's
going to be most welcome by collectors who don't have that often a
chance to acquire pieces of such quality"
Following the death of Yves Saint Laurent last year, Bergé chose to sell
the couple’s entire collection, which adorned their apartments in Paris.
For him, the sale is about finding some degree of closure:
SOUNDBITE 2: Pierre Bergé, co-founder Yves Saint Laurent Couture house
[French, 16 sec]: "C’est le jour ou le dernier objet sera passé sous le
marteau d'un commissaire priseur que à mon sens – a mon sens - cette
collection pourra écrire le mot fin."
"Only on the day that the last piece goes under the hammer of an
auctioneer – in my view – will the last word of this collection be
written"
In spite of the global economic slowdown, Christie’s hopes the
collection will fetch around 400 million dollars when it goes up for
sale in Paris at the end of February.
A cubist-era Picasso – valued at 40 million dollars – and a rare
selection of Mondrians are among the highlights. But for Yves Saint
Laurent and Pierre Bergé, it was not about the price tags – more the
enjoyment of living amongst beautiful art.
SOUNDBITE 3: Jonathan Rendell, Deputy Chairman, Christie’s Americas
[English, 19 sec]: "There was a great sense of everything being in the
right place - nothing dominating -and no trophies. I think it is a
collection that's formed by two incredibly intelligent people working
completely in concert with eachother - that's very unusual."
But it’s an unusual bond that is soon to be broken up amongst
collectors, dealers and museums – the end of a long reign for
the king of fashion.
</description>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: a script may be provided in the content metadata section of each news item conveying video or animated graphic content.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- A script for the content of this item -->
<description role="afpdescRole:script">
A rare glimpse of the art behind the label.
What Yves Saint Laurent earned in the fashion industry he spent on
masterpieces. At Christie’s auction house in London, a treasure trove of
paintings, sculpture, furniture and jewellery amassed by the fashion
icon and his lover and business partner Pierre Bergé -- over a 50 year
partnership.
SOUNDBITE 1: Thomas Seydoux, International Co-Head of Department,
Christie’s Europe [English, 13 sec]:
"It's unprecedented - I mean we've never sold a collection in recent
memory of that sort of outstanding quality throughout and I think it's
going to be most welcome by collectors who don't have that often a
chance to acquire pieces of such quality"
...
...
</description>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- A script for the content of this item -->
<description role="afpdescRole:script">
Hundreds of art buyers and lovers from around the world came for the
biggest private collection ever up for auction.
SOUNDBITE 1: Vox pop (woman) (english, 3 sec)
"I arrived two days ago to attend the sale."
SOUNDBITE 2: Vox pop (man) (English, 4 sec)
"I came especially for the exhibition. Going back to New York very
shortly."
...
...
</description>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
A script, if present, provides the transcript of voices that can be heard in the video. This may include voices recorded when the video was shot as well as audio commentary written and voiced by a journalist which is added to the images and recounts the events of the story. It may also contain indications of significant sounds (e.g., "the sound of an explosion"). These elements are provided in their order of occurrence in the video or animated graphic.
A script is provided by a description element whose role attribute, the QCode afpdescRole:script, resolves to http://cv.afp.com/descriptionRoles/script. It may appear at most once per item.
Note that in some documents, the content of a description element whose role attribute resolves to http://cv.afp.com/descriptionRoles/script isn't a voice/sound transcript or isn't only a voice/sound transcript:
Shot lists have their dedicated slots in this XML format (see section "Shot list"), but in some documents they appear in the slots for scripts. For example, here is a description element that contains both a script and a shot list (we show only partial content):
<description role="afpdescRole:script">
Script:
Hundreds of art buyers and lovers from around the world came for the biggest
private collection ever up for auction.
SOUNDBITE 1: Vox pop (woman) (english, 3 sec)
"I arrived two days ago to attend the sale."
...
...
Shotlist: (shot Feb 23, 2009)
-wide of auctioneer
-painting on screen
-Berge arriving at auction
-SOUNDBITE 1: Vox pop (woman) (english, 3 sec)
-SOUNDBITE 2: Vox pop (man) (English, 4 sec)
-close up of Matisse
...
...
</description>
Note that while NewsML-G2 allows for rich text by using some markup in the content of a script, AFP's systems only output simple textual content not interspersed with markup.
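When a shot list is concatenated to a script, a heuristic can be used to separate the two. The Python sketch below is based solely on the single example shown above: it assumes that the shot list starts at a line beginning with "Shotlist:" and that the script may be introduced by a "Script:" line. These markers are not guaranteed by the format, so treat this as a starting point to adapt to the documents you actually receive; the function name is hypothetical.

```python
import re

# Heuristic markers inferred from one example; they are assumptions.
_SHOTLIST_HEADER = re.compile(r"^\s*shot\s?list\s*:", re.IGNORECASE)
_SCRIPT_HEADER = re.compile(r"^\s*script\s*:\s*$", re.IGNORECASE)


def split_script(text):
    """Split the content of a script description into (script, shot_list).

    shot_list is None when no line starting with 'Shotlist:' is found.
    """
    lines = text.strip().splitlines()
    for index, line in enumerate(lines):
        if _SHOTLIST_HEADER.match(line):
            script_lines, shot_lines = lines[:index], lines[index:]
            break
    else:
        script_lines, shot_lines = lines, None
    # Drop a leading 'Script:' header line, if any.
    if script_lines and _SCRIPT_HEADER.match(script_lines[0]):
        script_lines = script_lines[1:]
    script = "\n".join(script_lines).strip()
    shot_list = "\n".join(shot_lines).strip() if shot_lines is not None else None
    return script, shot_list


EXAMPLE = """\
Script:
Hundreds of art buyers and lovers from around the world came.
SOUNDBITE 1: Vox pop (woman) (english, 3 sec)
Shotlist: (shot Feb 23, 2009)
-wide of auctioneer
-painting on screen
"""

script, shot_list = split_script(EXAMPLE)
print(script)
print(shot_list)
```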
Video and animated graphic documents: a shot list may be provided in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<description role="afpdescRole:shotList">
-Member of Christie's staff walking in front of paintings
-Photographers
-Tilt of YSL poster
-VAR Christie's member of staff with metal art works
-VAR Theodore Gericault painting
-Thomas Seydoux, International Co-Head of Department, Christie’s Europe
-PAN of photo of YSL's flat in Paris
-SOUNDBITE 2: Pierre Bergé, co-founder Yves Saint Laurent Couture house
-Paintings on wall
-VAR Ferdinand Leger painting
-Picasso painting
-Woman looking at painting
-VAR Frans Hals portrait
-SOUNDBITE 3: Jonathan Rendell, Deputy Chairman, Christie’s Americas
-People walking through gallery
-Tilt to poster of YSL
</description>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: a shot list may be provided in the content metadata section of each news item conveying video or animated graphic content.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- A shot list for the content of this item -->
<description role="afpdescRole:shotList">
-Member of Christie's staff walking in front of paintings
-Photographers
-Tilt of YSL poster
...
...
</description>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- A shot list for the content of this item -->
<description role="afpdescRole:shotList">
-wide of auctioneer
-painting on screen
-Berge arriving at auction
-SOUNDBITE 1: Vox pop (woman) (english, 3 sec)
-SOUNDBITE 2: Vox pop (man) (English, 4 sec)
-close up of Matisse
...
...
</description>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
A shot list, if present, provides a concise description of each sequence. These elements are provided in their order of occurrence in the video or animated graphic.
A shot list is provided by a description element whose role attribute, the QCode afpdescRole:shotList, resolves to http://cv.afp.com/descriptionRoles/shotList. It may appear at most once per item.
In some documents, the shot list isn't provided in this way but appears concatenated to the script (see section "Script" for an example).
The exact format of a shot list may not be the same for all kinds of documents and may also vary according to local journalistic practices.
Note that while NewsML-G2 allows for rich text by using some markup in the content of a shot list, AFP's systems only output simple textual content not interspersed with markup.
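Scripts and shot lists are both carried by description elements and told apart by their role. The Python sketch below indexes the descriptions of a news item by the concept URI of their role. It assumes that the afpdescRole alias resolves to http://cv.afp.com/descriptionRoles/, which is consistent with the resolutions given in this guide; the sample item and the function name are made up for this example.

```python
import xml.etree.ElementTree as ET

NAR = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

# Assumed scheme alias; real documents declare aliases through catalogs.
SCHEME_ALIASES = {"afpdescRole": "http://cv.afp.com/descriptionRoles/"}

SAMPLE = """\
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <contentMeta>
    <description role="afpdescRole:script">A rare glimpse of the art behind the label.</description>
    <description role="afpdescRole:shotList">
-Photographers
-Tilt of YSL poster
    </description>
  </contentMeta>
</newsItem>
"""


def descriptions_by_role(news_item):
    """Map role concept URIs to the text of the matching description."""
    found = {}
    for desc in news_item.findall("nar:contentMeta/nar:description", NAR):
        role = desc.get("role")
        if not role:
            continue
        alias, _, code = role.partition(":")
        base = SCHEME_ALIASES.get(alias)
        if base is None:
            continue  # unknown alias: skip rather than guess
        # Each role appears at most once per item; keep the first one seen.
        found.setdefault(base + code, (desc.text or "").strip())
    return found


item = ET.fromstring(SAMPLE)
roles = descriptions_by_role(item)
print(roles.get("http://cv.afp.com/descriptionRoles/shotList"))
```

The text is returned as is, without splitting it into entries, because the exact layout of a shot list varies between documents.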
Video and animated graphic documents: Speakers heard during audio or film recording may be described in the content metadata section of the news item.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<description role="afpdescRole:synthe">
-Thomas Seydoux (man), International Co-Head of Department,
Christie’s Europe
-Pierre Bergé (man), co-founder Yves Saint Laurent Couture house
-Jonathan Rendell (man), Deputy Chairman, Christie’s Americas
</description>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Multimedia documents: Speakers heard during audio or film recording may be described in the content metadata section of each news item conveying video or animated graphic content.
<newsMessage>
<itemSet>
<newsItem>
<contentMeta>
<!-- Speakers heard during recording the content of this item -->
<description role="afpdescRole:synthe">
-Thomas Seydoux (man), International Co-Head of Department,
Christie’s Europe
-Pierre Bergé (man), co-founder Yves Saint Laurent Couture house
-Jonathan Rendell (man), Deputy Chairman, Christie’s Americas
</description>
</contentMeta>
</newsItem>
<newsItem>
<contentMeta>
<!-- Speakers heard during recording the content of this item -->
<description role="afpdescRole:synthe">
-Vox pop woman
-Vox pop man
-Pierre Berge (man), Yves Saint Laurent's partner
</description>
</contentMeta>
</newsItem>
</itemSet>
</newsMessage>
Specific information may be provided about speakers heard during audio or film recording where an important value of the clip consists of what is said. In most clips these speakers appear in the images, but that may not always be the case.
This information may be provided by a description element whose role attribute, the QCode afpdescRole:synthe, resolves to http://cv.afp.com/descriptionRoles/synthe. It may appear at most once per item. This information is provided in the order of occurrence of speakers in the video or animated graphic.
This information typically includes speakers' names and functions. It can be used, for example, to add captions accompanying speakers' appearances in the video.
Note that while NewsML-G2 allows for rich text by using some markup in description elements, AFP's systems only output simple textual content not interspersed with markup.
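If you want one entry per speaker, for example to generate on-screen captions, a simple heuristic can be applied to the text. The Python sketch below assumes the layout seen in the examples above, where each speaker starts on a line beginning with "-" and may continue on the following line. This layout is not guaranteed, and the function name is hypothetical.

```python
def speaker_entries(text):
    """Split the text of a speakers description into one string per speaker.

    Assumption: a line starting with '-' begins a new speaker, and any
    other non-empty line continues the previous one.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("-") or not entries:
            entries.append(line.lstrip("-").strip())
        else:
            entries[-1] += " " + line
    return entries


EXAMPLE = """
-Thomas Seydoux (man), International Co-Head of Department,
Christie's Europe
-Pierre Bergé (man), co-founder Yves Saint Laurent Couture house
"""

for entry in speaker_entries(EXAMPLE):
    print(entry)
```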
Some data is specific to multimedia documents. This section details these data elements.
Multimedia documents: the number of non-main items broken down by item natures may be provided in the item metadata section of the main news item.
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" xmlns:afp="http://www.afp.com/format/internal/">
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
<afp:extension>
<afp:stats>
<afp:totalComponentsOfType qcode="ninat:graphic" total="1" />
<afp:totalComponentsOfType qcode="ninat:picture" total="3" />
</afp:stats>
</afp:extension>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
As shown above, each totalComponentsOfType element provides the number of non-main items of a given nature present in the document. The qcode attribute specifies the nature as described in the following table:
Natures of multimedia non-main items | ||
---|---|---|
Type | QCode | Concept URI |
Picture | ninat:picture | http://cv.iptc.org/newscodes/ninature/picture |
Video | ninat:video | http://cv.iptc.org/newscodes/ninature/video |
Still graphic | ninat:graphic | http://cv.iptc.org/newscodes/ninature/graphic |
Animated graphic | ninat:animated | http://cv.iptc.org/newscodes/ninature/animated |
The total attribute provides the number of items of the given nature, as a strictly positive integer. If the stats element is present, the absence of a totalComponentsOfType element for a given nature means that no non-main item of that nature is present in the document.
The totalComponentsOfType elements appear inside a stats element inside an extension element in the item metadata section of the main news item. Note that the totalComponentsOfType, stats and extension elements are not standard NewsML-G2 vocabulary but part of an AFP-specific extension. They are defined in an XML namespace whose name is http://www.afp.com/format/internal/.
Therefore, here is how to interpret the example given at the beginning of this section:
- <afp:totalComponentsOfType qcode="ninat:graphic" total="1" /> means that there is one still graphic item in the document.
- <afp:totalComponentsOfType qcode="ninat:picture" total="3" /> means that there are three picture items in the document.
- The absence of a totalComponentsOfType element for other item natures means that there is no animated graphic or video item in the document.
The extension and stats elements are optional (i.e., they may or may not be present). When they are present, they appear at most once per document.
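The interpretation rules above can be sketched in a few lines of Python using the standard library XML parser. This is only an illustrative sketch: the function name count_components and the NATURES tuple are hypothetical helpers, and for brevity the code reads the first itemMeta element it finds, whereas a real processor would first locate the main news item.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
AFP = "{http://www.afp.com/format/internal/}"

SAMPLE = """<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/"
             xmlns:afp="http://www.afp.com/format/internal/">
  <itemSet>
    <newsItem>
      <itemMeta>
        <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
        <afp:extension>
          <afp:stats>
            <afp:totalComponentsOfType qcode="ninat:graphic" total="1" />
            <afp:totalComponentsOfType qcode="ninat:picture" total="3" />
          </afp:stats>
        </afp:extension>
      </itemMeta>
    </newsItem>
  </itemSet>
</newsMessage>"""

# The four natures listed in the table above (hypothetical constant name).
NATURES = ("ninat:picture", "ninat:video", "ninat:graphic", "ninat:animated")


def count_components(item_meta):
    """Return {qcode: count}, or None when no stats element is provided."""
    stats = item_meta.find(f"{AFP}extension/{AFP}stats")
    if stats is None:
        return None  # the counts are simply not provided
    # When stats is present, a missing nature means a count of zero.
    counts = {nature: 0 for nature in NATURES}
    for element in stats.findall(AFP + "totalComponentsOfType"):
        counts[element.get("qcode")] = int(element.get("total"))
    return counts


root = ET.fromstring(SAMPLE)
item_meta = root.find(f".//{NAR}itemMeta")  # simplification: first itemMeta found
print(count_components(item_meta))
```

Returning None when the stats element is absent keeps "counts not provided" distinct from "zero items of that nature".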
Multimedia documents: the multimedia content is provided using the XML syntax of HTML in the content set of the main news item.
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
</itemMeta>
<contentSet>
<inlineXML contenttype="application/xhtml+xml">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>
YSL-Bergé collection sets new world record at auction
for a private collection
</title>
</head>
<body>
<p>
The Yves Saint Laurent and Pierre Bergé collection sets
new world record at auction for a private collection.
Hundreds of art treasures amassed by late fashion designer
Yves Saint Laurent and his companion Pierre Berge over half
a century are being auctioned.
</p>
<p>
<!-- Embedded content from a picture item -->
<span class="g2item g2picture">
<a style="display: none" href="urn:newsml:afp.com:20100101:7a0846c9-e341-45dc-a3a2"></a>
<img src="image1.jpeg" style="float: left;"
generator-unable-to-provide-required-alt="" height="163" width="245" />
</span>
</p>
<p>
Bids hit 206 million euros (261 million dollars) on February
23, 2009 making it the biggest private collection ever
auctioned with two days of sales still left to run.
</p>
<p>
<!-- Embedded content from a video item -->
<span class="g2item g2video">
<a style="display: none" href="urn:newsml:afp.com:20100101:7633a15b-a990-4db6-9052"></a>
<video style="float: right;" controls="controls" height="138" width="245"
poster="keyframe1.jpeg">
<source src="video1.mp4" type="video/mp4" />
</video>
</span>
</p>
<p>
<!-- An hypertext link to an external resource -->
The <a class="ignorableTextFalse" href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">
wikipedia page about Yves Saint-Laurent</a> claims that ...
</p>
</body>
</html>
</inlineXML>
</contentSet>
</newsItem>
<newsItem guid="urn:newsml:afp.com:20100101:7a0846c9-e341-45dc-a3a2">
...
</newsItem>
<newsItem guid="urn:newsml:afp.com:20100101:7633a15b-a990-4db6-9052">
...
</newsItem>
</itemSet>
</newsMessage>
The multimedia content expressed using the XML syntax of HTML is the main journalistic content of the document. It is provided by an inlineXML element. A contenttype attribute with a value of application/xhtml+xml explicitly denotes the usage of the XML syntax of HTML.
The multimedia content contains the main textual content intermingled with links and audiovisual content. As shown in the example above, some parts of this content (e.g., pictures, videos, etc.) may be described by their own news items. These parts are referred to as "components". The news items describing them are themselves part of the NewsML-G2 document.
You can see in the example above that we use a microformat [Microformat] to denote a component and the reference to the news item that describes it. This allows us to provide displayable information (e.g., an img tag) along with semantic markup (e.g., the reference to the news item) which can be machine-processed by your system.
This microformat consists of a span element with a class attribute that contains "g2item". In addition, we provide another class name denoting the type of the referenced item (e.g., "g2picture", "g2video", etc.).
The first child element of such a span is always the reference to the news item that describes the component. It is represented as an a tag whose href attribute provides the GUID of the news item. This element is marked as non-displayable as it is not meant to be directly displayed. Following this element, additional HTML markup defines embedded content for displaying a default rendition of this component. For example, a document may contain an img element displaying a picture.
This microformat is called the g2item microformat. Another microformat, called the g2document microformat, is used to represent links to other NewsML-G2 documents. It is described in its dedicated section below.
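As an illustration of how the g2item microformat can be machine-processed, here is a minimal Python sketch that lists the components referenced in a body element. The function name find_components is hypothetical, and the sample markup is a trimmed copy of the example above. The sketch assumes the HTML has already been extracted from the inlineXML element.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

SAMPLE_BODY = """<body xmlns="http://www.w3.org/1999/xhtml">
  <p>
    <span class="g2item g2picture">
      <a style="display: none" href="urn:newsml:afp.com:20100101:7a0846c9-e341-45dc-a3a2"></a>
      <img src="image1.jpeg" height="163" width="245" />
    </span>
  </p>
  <p>
    <span class="g2item g2video">
      <a style="display: none" href="urn:newsml:afp.com:20100101:7633a15b-a990-4db6-9052"></a>
      <video controls="controls" poster="keyframe1.jpeg">
        <source src="video1.mp4" type="video/mp4" />
      </video>
    </span>
  </p>
</body>"""


def find_components(body):
    """Return a list of (type class name, news item GUID) pairs."""
    components = []
    for span in body.iter(XHTML + "span"):
        classes = span.get("class", "").split()
        if "g2item" not in classes:
            continue
        # The first child element is the reference to the news item.
        first = span[0] if len(span) else None
        if first is None or first.tag != XHTML + "a":
            continue  # not a well-formed g2item span, skip it
        kinds = [name for name in classes if name != "g2item"]
        components.append((kinds[0] if kinds else None, first.get("href")))
    return components


for kind, guid in find_components(ET.fromstring(SAMPLE_BODY)):
    print(kind, guid)
```

The GUIDs returned can then be matched against the guid attributes of the other news items in the same NewsML-G2 document.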
The following sections detail how various types of components and links are represented.
The class name "g2item
" signals that we use the g2item microformat: the span represents a component along with a reference to the associated news item. The class name "g2picture
" denotes that the referenced news item provides picture content. Inside the span, the first element provides the guid of that news item. The second element defines embedded content for displaying a default rendition of the picture, using a standard HTML img
tag. For example:
<span class="g2item g2picture">
<a style="display: none" href="urn:newsml:afp.com:20100101:7a0846c9-e341-45dc-a3a2"></a>
<img src="image1.jpeg" style="float: left;"
generator-unable-to-provide-required-alt="" height="163" width="245" />
</span>
Embedded still graphic is defined like embedded picture except that in the span
element we use the class name g2graphic
instead of g2picture
. For example:
<span class="g2item g2graphic">
<a style="display: none" href="urn:newsml:afp.com:20100101:7a123456-a542-76fg-ab6a"></a>
<img src="image1.jpeg" style="float: left;"
generator-unable-to-provide-required-alt="" height="163" width="245"/>
</span>
For embedded video we also use the g2item microformat. The class name g2video denotes that the referenced news item provides video content. Inside the span, the first element provides the GUID of that news item. The embedded video is then defined using a standard HTML video tag. An illustration image may be provided by a poster attribute, and additional attributes such as autoplay, loop, etc. may be used as well. For example:
<span class="g2item g2video">
<a style="display: none" href="urn:newsml:afp.com:20100101:7633a15b-a990-4db6-9052"></a>
<video style="float: right;" controls="controls" height="138" width="245"
poster="keyframe1.jpeg">
<source src="video1.mp4" type="video/mp4" />
</video>
</span>
The HTML can contain hypertext links to other resources such as Web pages. They may be provided by a elements. For example, here is a link to a Wikipedia page:
<a class="ignorableTextFalse"
href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)" >wikipedia page about Yves Saint-Laurent</a>
The class
attribute, if present, may be used to specify either the class name "ignorableTextFalse
" or "ignorableTextTrue
". These class names are meant to assist you if you need to remove hypertext links from the HTML content (this is a common need for some of our clients).
ignorableTextFalse
means that if you process the HTML in order to remove links then not removing the text associated with this link will produce a better result.
For example, suppose that the HTML contains the following fragment before removing the hypertext links:
Pierre Bergé quoted the
<a class="ignorableTextFalse"
href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">wikipedia page about Yves Saint-Laurent</a>
to illustrate...
After removing hypertext links the fragment should be:
Pierre Bergé quoted the wikipedia page about Yves Saint-Laurent to illustrate...
ignorableTextTrue
means that if you process the HTML in order to remove links then also removing the text associated with this link will produce a better result.
For example, suppose that the HTML contains the following fragment before removing the hypertext links:
Some text before.
<a class="ignorableTextTrue"
href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">
This Web page provides additional information.
</a>
Some text after.
After removing hypertext links the fragment should be:
Some text before. Some text after.
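The link removal described above could be sketched as follows in Python. The names strip_hyperlinks and plain_text are hypothetical helpers. The sketch assumes that a link without either class name keeps its text, and it flattens any markup nested inside a link to plain text. It leaves the a elements of microformat spans untouched, since those are references rather than hypertext links.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
MICROFORMATS = {"g2item", "g2document"}


def strip_hyperlinks(root):
    """Remove hypertext links in place, honouring the ignorableText hints."""
    for parent in list(root.iter()):
        if MICROFORMATS & set(parent.get("class", "").split()):
            continue  # children of microformat spans are handled separately
        for child in list(parent):
            if child.tag != XHTML + "a":
                continue
            classes = child.get("class", "").split()
            # ignorableTextTrue: drop the link text; otherwise keep it.
            kept = "" if "ignorableTextTrue" in classes else "".join(child.itertext())
            replacement = kept + (child.tail or "")
            index = list(parent).index(child)
            if index == 0:
                parent.text = (parent.text or "") + replacement
            else:
                previous = parent[index - 1]
                previous.tail = (previous.tail or "") + replacement
            parent.remove(child)
    return root


def plain_text(element):
    """Text content of an element with whitespace runs collapsed."""
    return " ".join("".join(element.itertext()).split())


FRAGMENT = """<p xmlns="http://www.w3.org/1999/xhtml">Pierre Bergé quoted the
<a class="ignorableTextFalse"
   href="http://en.wikipedia.org/wiki/Yves_Saint_Laurent_(designer)">wikipedia page about Yves Saint-Laurent</a>
to illustrate...</p>"""

print(plain_text(strip_hyperlinks(ET.fromstring(FRAGMENT))))
```

Whitespace is normalised only when the text is read back, so the function itself does not alter spacing elsewhere in the content.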
The HTML can contain links to other NewsML-G2 documents managed by AFP. Such links are associated with a part of the textual content. We represent these links using the g2document microformat. It consists of a span element with a class attribute that contains "g2document". In addition, we provide another class name denoting the type of the referenced document: "g2picture", "g2video", etc. Finally, we may provide a class name that provides a hint on how a link could be removed gracefully. For example:
<span class="g2document g2text ignorableTextFalse">
<a style="display: none" href="http://doc.afp.com/7W37U"></a>
<a style="display: none" href="otherDocument.xml"></a>
some text
</span>
The content of the span element is organized as follows:
- A first a tag whose href attribute provides the GUID of the NewsML-G2 document. Note that while it may look like a dereferenceable URI, it actually isn't. This element is marked as non-displayable as it is not meant to be directly displayed.
- A second a tag may provide the dereferenceable URI reference of the NewsML-G2 document. Typically, this element will be present if the AFP delivery system determines that it has delivered the corresponding document to you and knows where to locate it in your delivery space.
The following table lists the class names used to specify the type of a referenced NewsML-G2 document. See the overview section for a presentation of the various document types.
Types of referenced NewsML-G2 document | ||
---|---|---|
Class name | Type | |
g2text | Text | |
g2multimedia | Multimedia | |
g2picture | Picture | |
g2graphic | Still graphic | |
g2animated | Animated graphic | |
g2video | Video | |
g2liveReport | Live report index | |
g2interactive | Interactive graphic | |
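To illustrate how a g2document span can be read, here is a minimal Python sketch. The function name parse_g2document and the shape of the returned dictionary are assumptions made for this example. The TYPES mapping simply transcribes the table above.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Class names from the table above, mapped to the document type they denote.
TYPES = {
    "g2text": "Text",
    "g2multimedia": "Multimedia",
    "g2picture": "Picture",
    "g2graphic": "Still graphic",
    "g2animated": "Animated graphic",
    "g2video": "Video",
    "g2liveReport": "Live report index",
    "g2interactive": "Interactive graphic",
}

SAMPLE_SPAN = """<span xmlns="http://www.w3.org/1999/xhtml"
      class="g2document g2text ignorableTextFalse">
  <a style="display: none" href="http://doc.afp.com/7W37U"></a>
  <a style="display: none" href="otherDocument.xml"></a>
  some text
</span>"""


def parse_g2document(span):
    """Extract the GUID, optional URI reference, type and text of a link."""
    classes = span.get("class", "").split()
    anchors = [child for child in span if child.tag == XHTML + "a"]
    return {
        # First a element: GUID of the referenced document.
        "guid": anchors[0].get("href") if anchors else None,
        # Second a element, when present: dereferenceable URI reference.
        "uri": anchors[1].get("href") if len(anchors) > 1 else None,
        "type": next((TYPES[name] for name in classes if name in TYPES), None),
        "text": " ".join("".join(span.itertext()).split()),
    }


print(parse_g2document(ET.fromstring(SAMPLE_SPAN)))
```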
The class
attribute may also be used to specify "ignorableTextFalse
" or "ignorableTextTrue
". These class names are meant to assist you if you need to remove links from the HTML content (this is a common need for some of our clients).
ignorableTextFalse
means that if you process the HTML in order to remove links then not removing the text associated with this link will produce a better result.
For example, suppose that the HTML contains the following fragment before removing the links:
Pierre Bergé quoted
<span class="g2document g2text ignorableTextFalse">
<a style="display: none" href="http://doc.afp.com/7W37U"></a>
<a style="display: none" href="otherDocument.xml"></a>
a recent AFP news story
</span>
to illustrate...
After removing links the fragment should be:
Pierre Bergé quoted a recent AFP news story to illustrate...
ignorableTextTrue
means that if you process the HTML in order to remove links then also removing the text associated with this link will produce a better result.
For example, suppose that the HTML contains the following fragment before removing the links:
Some text before.
<span class="g2document g2text ignorableTextTrue">
<a style="display: none" href="http://doc.afp.com/7W37U"></a>
<a style="display: none" href="otherDocument.xml"></a>
This AFP news story provides additional information.
</span>
Some text after.
After removing links the fragment should be:
Some text before. Some text after.
Live report posts are represented by multimedia documents. They can contain additional dedicated metadata, as described in this section.
Live report posts: the indication that a post is an intertitle is provided in the item metadata section of the main news item.
<newsMessage>
<itemSet>
<newsItem>
<itemMeta>
<!-- This link element tells that this news item is the main item of the multimedia document -->
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
<!-- This link element tells that this multimedia document represents an intertitle in a live report -->
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/liveReportIntertitle"/>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
While most posts carry a news bit about the ongoing event being reported, some differ as they represent intertitles. An intertitle typically provides some text describing a phase of the ongoing event, or another grouping of a subset of posts. An intertitle is identified by the presence of a specific element in the item metadata section of its main item: a link element whose rel attribute conveys the concept URI http://cv.iptc.org/newscodes/conceptrelation/isA (using the QCode crel:isA) and whose href attribute is the URI http://cv.afp.com/itemnatures/liveReportIntertitle.
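A check for intertitles could look like the following Python sketch. The function name is_intertitle is hypothetical. Since the example above writes the QCode as crel:isa while the text writes crel:isA, the sketch compares the rel value case-insensitively; this is a defensive assumption, not a documented rule.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
INTERTITLE_URI = "http://cv.afp.com/itemnatures/liveReportIntertitle"

SAMPLE_ITEM_META = """<itemMeta xmlns="http://iptc.org/std/nar/2006-10-01/">
  <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
  <link rel="crel:isa" href="http://cv.afp.com/itemnatures/liveReportIntertitle"/>
</itemMeta>"""


def is_intertitle(item_meta):
    """Tell whether the item metadata of a main item flags an intertitle."""
    for link in item_meta.findall(NAR + "link"):
        # Assumption: tolerate both "crel:isa" and "crel:isA".
        if link.get("rel", "").lower() == "crel:isa" and link.get("href") == INTERTITLE_URI:
            return True
    return False


print(is_intertitle(ET.fromstring(SAMPLE_ITEM_META)))
```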
Live report posts: the timestamp in live report is provided in the item metadata section of the main news item.
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" xmlns:afp="http://www.afp.com/format/internal/">
<itemSet>
<newsItem>
<itemMeta>
<link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
<afp:extension>
<afp:timestampInLiveReport>
<afp:date>2016-07-09T15:30:33.928Z</afp:date>
<afp:label>15h30</afp:label>
</afp:timestampInLiveReport>
</afp:extension>
</itemMeta>
</newsItem>
</itemSet>
</newsMessage>
The timestamp in live report is provided for multimedia documents that represent posts in live reports. Each post is associated with a timestamp. This timestamp is provided by a timestampInLiveReport element in an extension element inside the item metadata section. It is made of:
- A date and time, provided by the date element.
- A label, provided by the label element.
These extension, timestampInLiveReport, date and label elements are in the XML namespace http://www.afp.com/format/internal/.
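Reading this timestamp could be sketched as follows in Python. The function name read_timestamp is hypothetical. The sketch assumes the date value ends with "Z" as in the example above, and rewrites that suffix as "+00:00" so that datetime.fromisoformat accepts it on Python versions older than 3.11.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

AFP = "{http://www.afp.com/format/internal/}"

SAMPLE_ITEM_META = """<itemMeta xmlns="http://iptc.org/std/nar/2006-10-01/"
          xmlns:afp="http://www.afp.com/format/internal/">
  <link rel="crel:isa" href="http://cv.afp.com/itemnatures/mmdMainComp"/>
  <afp:extension>
    <afp:timestampInLiveReport>
      <afp:date>2016-07-09T15:30:33.928Z</afp:date>
      <afp:label>15h30</afp:label>
    </afp:timestampInLiveReport>
  </afp:extension>
</itemMeta>"""


def read_timestamp(item_meta):
    """Return (datetime or None, label or None), or None if no timestamp."""
    timestamp = item_meta.find(f"{AFP}extension/{AFP}timestampInLiveReport")
    if timestamp is None:
        return None
    date_text = timestamp.findtext(AFP + "date")
    label = timestamp.findtext(AFP + "label")
    when = None
    if date_text:
        # Assumption: a trailing "Z" denotes UTC, as in the example above.
        when = datetime.fromisoformat(date_text.strip().replace("Z", "+00:00"))
    return when, label


when, label = read_timestamp(ET.fromstring(SAMPLE_ITEM_META))
print(when.isoformat(), label)
```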
Some data is specific to live report indexes. This section details these data elements.
Live report indexes: a lead of the live report may be provided in the content metadata section of the package item.
<newsMessage>
<itemSet>
<packageItem>
<contentMeta>
<description role="afpdescRole:lead">
<html:html xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
<head />
<body>
<p>Live inside Christie's auction of Yves Saint-Laurent/bergé collection.</p>
<p>Auction sparks huge interest. Follow our report and analysis live.</p>
</body>
</html:html>
</description>
</contentMeta>
</packageItem>
</itemSet>
</newsMessage>
A "lead" for the live report may be provided by a description element whose role attribute, the QCode afpdescRole:lead, resolves to http://cv.afp.com/descriptionRoles/lead. Inside this element, the lead is provided using the XML syntax of HTML in an html element in the namespace http://www.w3.org/1999/xhtml.
When present, the lead contains a short description (typically around one hundred words) of what the live report is about.
Live report indexes: the list of posts of the live report is provided in the groupSet
section of the package item.
<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/" xmlns:afp="http://www.afp.com/format/internal/">
<itemSet>
<packageItem>
<groupSet>
<group role="afpgroup:elements">
<!-- An example of a live report index with three posts.
As a story develops, real live reports can include tens or hundreds of posts. -->
<itemRef href="d-oc1ku.xml">
<afp:iteminfo>
<headline>Auction opens</headline>
</afp:iteminfo>
</itemRef>
<itemRef href="d-oc02w.xml">
<afp:iteminfo>
<headline>Christie's shows ten most intriguing pieces</headline>
</afp:iteminfo>
</itemRef>
<itemRef href="d-ob2p7.xml">
<afp:iteminfo>
<headline>Press conference scheduled at 7 PM</headline>
</afp:iteminfo>
</itemRef>
</group>
</groupSet>
</packageItem>
</itemSet>
</newsMessage>
The list of posts is provided as a list of links to the NewsML-G2 documents that represent individual posts. These links are provided inside the group set of the package item, in a group
element whose role
attribute, the QCode afpgroup:elements
, resolves to http://cv.afp.com/grouproles/elements
. Each link is provided by an itemRef
element, through an href
attribute (see the NewsML-G2 documentation [G2Doc] for more information about the itemRef
construct).
Inside each itemRef, an itemInfo element in the XML namespace http://www.afp.com/format/internal/ may provide a title for the post in a headline element.
The list is in reverse chronological order: the first itemRef links to the most recent post, the second itemRef links to the second most recent, etc.
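Reading the list of posts could be sketched as follows in Python. The function name list_posts is hypothetical. Because the example above spells the AFP element iteminfo while the text spells it itemInfo, the sketch matches its name case-insensitively; this is a defensive assumption.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
AFP = "{http://www.afp.com/format/internal/}"

SAMPLE_INDEX = """<newsMessage xmlns="http://iptc.org/std/nar/2006-10-01/"
             xmlns:afp="http://www.afp.com/format/internal/">
  <itemSet>
    <packageItem>
      <groupSet>
        <group role="afpgroup:elements">
          <itemRef href="d-oc1ku.xml">
            <afp:iteminfo><headline>Auction opens</headline></afp:iteminfo>
          </itemRef>
          <itemRef href="d-oc02w.xml">
            <afp:iteminfo><headline>Christie's shows ten most intriguing pieces</headline></afp:iteminfo>
          </itemRef>
          <itemRef href="d-ob2p7.xml">
            <afp:iteminfo><headline>Press conference scheduled at 7 PM</headline></afp:iteminfo>
          </itemRef>
        </group>
      </groupSet>
    </packageItem>
  </itemSet>
</newsMessage>"""


def list_posts(root):
    """Return (href, headline or None) pairs in document order."""
    posts = []
    for group in root.iter(NAR + "group"):
        if group.get("role") != "afpgroup:elements":
            continue
        for ref in group.findall(NAR + "itemRef"):
            headline = None
            for child in ref:
                # Assumption: accept both "iteminfo" and "itemInfo".
                if child.tag.lower() == (AFP + "iteminfo").lower():
                    headline = child.findtext(NAR + "headline")
            posts.append((ref.get("href"), headline))
    return posts


posts = list_posts(ET.fromstring(SAMPLE_INDEX))
for href, headline in posts:
    print(href, headline)
```

Since the first itemRef links to the most recent post, list(reversed(posts)) gives the posts oldest first. Each href is a URI reference that still has to be resolved before the post document can be retrieved.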
In a document, a number of elements provide links to actual visual content in formats such as JPEG, MPEG-4, etc. Some of these elements are defined by NewsML-G2 while others are defined by HTML, as AFP text and multimedia documents can contain HTML (in XML syntax) embedded right into NewsML-G2. For example, such links can be provided by:
- href attributes in remoteContent and icon elements.
- src attributes in img elements, video elements, etc.
- poster attributes in video elements.
A link of this type is a URI reference as defined by [RFC3986]. This means it is either a URI or a relative-ref (colloquially referred to as a "relative URI").
At some point when dealing with a NewsML-G2 document, you'll typically want to retrieve the actual visual content, in order to process or display it.
If the link is a (non-relative) URI per [RFC3986], you can directly dereference it, using standard software components, to retrieve the actual visual content. Typically, the schemes used for such URIs depend on the specific delivery architecture established between you and AFP. Examples of commonly used schemes are: http, ftp and cid.
If the link is a relative-ref, then you need to resolve it to its target URI. You can then dereference the target URI to retrieve the actual visual content.
Note that with most standard libraries providing URI reference resolution, resolving a (non-relative) URI is the identity operation. That way, you don't have to determine whether you have been handed a (non-relative) URI or a relative-ref: you can just resolve the URI reference and then dereference it to retrieve the actual visual content.
Section 5 of [RFC3986] defines the process of resolving a URI reference. To carry out this process, you need the URI reference itself (as stated earlier, it is provided in the document, for example in an href attribute, src attribute, etc.) and a base URI. Typically the base URI is the URI that allows retrieving the NewsML-G2 document.
For example, if AFP delivers you a package that contains both an AFP NewsML-G2 document and data files for the associated visual content, the base URI is the URI that allows accessing the NewsML-G2 document after delivery. Suppose AFP delivers content in your file system in the directory "/deliverySpace/internet-journal/topnews/", producing the following file structure:
Sample delivery structure
In this context, the base URI is the URI that allows accessing the NewsML-G2 document after delivery. If your NewsML-G2 processor accesses the NewsML-G2 document at file:///deliverySpace/internet-journal/topnews/doc.afp.com-9719Z-2.xml
, then this is the base URI. The URI references linking to the visual content can be resolved relatively to this base URI. For example, the URI reference 5b9c11cbf6871cb93696bebab8bdbc2c16afc44b-highDef.jpg
would resolve to file:///deliverySpace/internet-journal/topnews/5b9c11cbf6871cb93696bebab8bdbc2c16afc44b-highDef.jpg
, which can then be dereferenced to access that particular visual content.
Several libraries provide URI reference resolution. For instance, in Java, one could use the resolve()
method of the java.net.URI
class.
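As a further illustration, in Python the urljoin function of the standard urllib.parse module performs the same resolution. The sketch below reuses the base URI and file name from the example above. The http URI is a made-up value used only to show that a non-relative URI comes back unchanged.

```python
from urllib.parse import urljoin

# Base URI: where the NewsML-G2 document is accessed after delivery.
BASE_URI = "file:///deliverySpace/internet-journal/topnews/doc.afp.com-9719Z-2.xml"

# A relative-ref, as found for example in an href or src attribute.
relative_ref = "5b9c11cbf6871cb93696bebab8bdbc2c16afc44b-highDef.jpg"
resolved = urljoin(BASE_URI, relative_ref)
print(resolved)

# A non-relative URI (hypothetical value) resolves to itself.
absolute_uri = "http://example.com/media/image1.jpeg"
unchanged = urljoin(BASE_URI, absolute_uri)
print(unchanged)
```

Because both cases go through the same call, the processing code does not need to test whether a link is relative before resolving it.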
August 2021
The section Role in workflow has been enhanced to show that a flash can be followed by an urgent but not by an alert.
The section on caption has been thoroughly rewritten to explain that captions may be provided in two parts, the content description and the context description.
The new concept of renditions dedicated to cancelled documents has been documented in the section on publishing status.
The section on subjects has been completed to explain that some subjects are identified by an uri
attribute. The section on locations that are subject matter of the document has been completed to show how a location can be specified using a geo URI.
In the section on locations from which the content originates, the entry about graphics has been corrected.
The section on mandatory processing has been enhanced.
The section on catchlines now states that a multimedia document may provide a catchline identified by the role http://cv.afp.com/headlineroles/introduction.
The section on subtitles now states that subtitles are only provided for text and multimedia documents and that usually there are at most two subtitles.
The XML syntax for HTML was formerly referred to as "XHTML". As the latest versions of the HTML living standard no longer use that term, this document no longer uses that term either.
This version also includes a number of editorial improvements.
July 2019
A section about mandatory processing has been added.
The sections about visual content rendition types and icon renditions types have been thoroughly updated.
A section about the copyright notice metadata has been added.
The section on content creation date now states that for photo combos, the content creation date we provide is the date of creation of the combo (instead of a shooting date).
A convergence effort between the metadata models of text and multimedia documents is underway in our production system. As a result the Related production and Role in workflow metadata may now be provided on multimedia documents. The documentation has been updated to reflect this change.
The section on publishing status, including information about cancelling documents, has been thoroughly rewritten to provide additional and more precise information.
Update about content warnings: our editorial system now makes use of the newly standardized content warning for "suffering". This documentation has been updated to reflect it.
The section about Visual Dimensions now states that the "millimeters" dimension unit may be used in AFP NewsML-G2 documents.
"Related interactive graphic" has been added to the section about related production.
This version also includes a number of editorial improvements.
March 2018
Major update for multimedia documents, including initial documentation of our HTML microformats.
The documentation now states that a location of origin of content can be a "point of interest", in addition to already documented types (city, country area, country). See section Locations From Which The Content Originates.
The documentation provides a more accurate description of the "synthe" metadata, now stating that it concerns speakers heard during audio or film recording where an important value of the clip consists of what is said. In previous versions it was described as applying only to visible speakers. See section Speakers heard during audio or film recording (aka synthe).
Tables listing the main languages used in AFP production and their corresponding BCP 47 codes are now provided. See sections Language of the content and Language of metadata.
Various editorial improvements.
August 2016
The documentation has been updated thoroughly to allow processing AFP NewsML-G2 documents without resolving QCodes.
The documentation now states that along with event identifiers, the names of the events may be provided.
The documentation now states that posts in live report indexes are ordered chronologically (therefore it is no longer your responsibility to sort them).
The description of the "Timestamp in live report" metadata has been improved to include documentation for the label
element.
The documentation of live reports now covers the notion of intertitle.
A number of improvements and clarifications have been made.
July 2016
The documentation for live reports has been added.
This document is now entirely self contained in one file, which makes it easier to distribute and use.
An important correction has been made: in previous versions of this documentation the concept URI for the "forbyline" role (cf. section on creators and contributors) was incorrectly specified as http://cv.afp.com/creatorroles/forbyline
. This has been corrected; the correct concept URI is: http://cv.afp.com/contributorroles/forbyline
.
A section on mentions of related production has been added.
An example has been added to the section on textual content of text document showing that the content can contain hypertext links.
A number of improvements and clarifications have been made.
February 2016
This documentation has been updated thoroughly for text documents.
February 2014
Documentation updated thoroughly in preparation of public delivery of NewsML-G2 documents.
January 2012
Initial version.
[G2Doc] | "NewsML-G2 Documentation". IPTC. Available from https://iptc.org/standards/newsml-g2/using-newsml-g2/ |
[MediaTypes] | MIME Media Types. Available at http://www.iana.org/assignments/media-types/index.html |
[IPTCCPNatures] | The IPTC controlled vocabulary for basic natures of concepts. Available at http://cv.iptc.org/newscodes/cpnature/ |
[IPTCDimUnits] | The IPTC controlled vocabulary for dimension units. Available at http://cv.iptc.org/newscodes/dimensionunit/ |
[IPTCGenres] | The IPTC controlled vocabulary for genres. Available at http://cv.iptc.org/newscodes/genre/ |
[IPTCLocTypes] | The IPTC controlled vocabulary for location types. Available at http://cv.iptc.org/newscodes/location/ |
[IPTCMediaTopics] | The IPTC controlled vocabulary for media topics. Available at http://cv.iptc.org/newscodes/mediatopic/ |
[IPTCNProviders] | The IPTC controlled vocabulary for news providers. Available at http://cv.iptc.org/newscodes/newsprovider/ |
[IPTCTimeUnits] | The IPTC controlled vocabulary for time units. Available at http://cv.iptc.org/newscodes/timeunit/ |
[IPTCCWarn] | The IPTC controlled vocabulary for content warnings. Available at http://cv.iptc.org/newscodes/contentwarning/ |
[ISO3166] | ISO 3166 Maintenance Agency. Available at http://www.iso.org/iso/country_codes.htm |
[HTTPURI] | "RFC 2616, section 3.2: Uniform Resource Identifiers". R. Fielding & al. June 1999. Available at http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.2 |
[RFC3085bis] | "URN Namespace for news-related resources". M. Steidl and J. Lorenzen. July 2009. Draft available at http://tools.ietf.org/html/draft-steidl-newsml-urn-rfc3085bis-00 |
[RFC3986] | "Uniform Resource Identifier (URI): Generic Syntax". T. Berners-Lee, R. Fielding and L. Masinter. January 2005. Available at http://tools.ietf.org/html/rfc3986 |
[RFC3987] | "Internationalized Resource Identifiers (IRIs)". M. Duerst and M. Suignard. January 2005. Available at http://www.ietf.org/rfc/rfc3987 |
[RFC5646] | "Tags for Identifying Languages". A. Phillips and M. Davis. September 2009. Available at http://tools.ietf.org/html/rfc5646 |
[RFC5870] | "A Uniform Resource Identifier for Geographic Locations ('geo' URI)". A. Mayrhofer and C. Spanring. June 2010. Available at http://tools.ietf.org/html/rfc5870 |
[TagCloud] | Wikipedia article on tag cloud. Available at http://en.wikipedia.org/wiki/Tag_Cloud |
[XMLSchemaDataTypes] | XML Schema Part 2: Datatypes. Available at http://www.w3.org/TR/xmlschema-2/ |
[XMLSpec] | "Extensible Markup Language (XML) 1.0". Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, Eve Maler, François Yergeau. Available at http://www.w3.org/TR/xml/ |
[Microformat] | Wikipedia article on microformats. Available at http://en.wikipedia.org/wiki/Microformat |
[HTMLSpec] | HTML Living Standard. Available at https://html.spec.whatwg.org |
Prepared and written by Philippe Mougin
Copyright © 2012-2021 AFP