NAME
WWW::Wikipedia::LangTitles - get interwiki links from Wikipedia.
SYNOPSIS
use utf8;
use WWW::Wikipedia::LangTitles 'get_wiki_titles';
my $title = 'Three-phase electric power';
my $links = get_wiki_titles ($title);
print "$title is '$links->{de}' in German.\n";
my $film = '東京物語';
my $flinks = get_wiki_titles ($film, lang => 'ja');
print "映画「$film」はイタリア語で「$flinks->{it}」と名付けた。\n";
produces output
Three-phase electric power is 'Dreiphasenwechselstrom' in German.
映画「東京物語」はイタリア語で「Viaggio a Tokyo」と名付けた。
(This example is included as synopsis.pl in the distribution.)
VERSION
This documents version 0.04 of WWW::Wikipedia::LangTitles corresponding to git commit cd5d0156c401472bc424421159fca7d3c0f769fe released on Thu Jul 20 13:15:53 2017 +0900.
DESCRIPTION
This module retrieves the Wikipedia interwiki link titles from the web site wikidata.org. It can be used, for example, to translate a term in English into other languages, or to get near equivalents.
FUNCTIONS
get_wiki_titles
my $ref = get_wiki_titles ('Helium');
Given the title of a Wikipedia article as an argument, get_wiki_titles returns a hash reference whose keys are language codes and whose values are the titles of the equivalent Wikipedia articles in those languages. For example, in the above case of Helium, $ref->{th} will be equal to ฮีเลียม, the Thai title of the Wikipedia article on helium.
The language of the original page can be specified like this:
use utf8;
my $from_th = get_wiki_titles ('ฮีเลียม', lang => 'th');
The title is encoded into the URL using "uri_escape_utf8" from URI::Escape, so pass character strings, not byte strings. For example, put "use utf8;" in your script if the title is a literal in the source code.
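The distinction matters most when the title does not come from a literal in a "use utf8;" script, for example when it comes from @ARGV or from a file read without an encoding layer. The following is a minimal sketch, not part of the module: the helper name decode_title is hypothetical, and it assumes the input is UTF-8 encoded bytes. It uses only the core Encode module.

```perl
use strict;
use warnings;
use Encode 'decode';

# Hypothetical helper (not part of WWW::Wikipedia::LangTitles):
# convert UTF-8 encoded bytes into a Perl character string.
sub decode_title
{
    my ($bytes) = @_;
    return decode ('UTF-8', $bytes);
}

# The UTF-8 bytes of the Thai title used above, written as escapes so
# that this script behaves the same with or without "use utf8;".
my $bytes = "\xE0\xB8\xAE\xE0\xB8\xB5\xE0\xB9\x80\xE0\xB8\xA5"
          . "\xE0\xB8\xB5\xE0\xB8\xA2\xE0\xB8\xA1";
my $title = decode_title ($bytes);
printf "%d bytes, %d characters\n", length ($bytes), length ($title);
# Prints "21 bytes, 7 characters".
# The character string $title is the form to pass on, as in
# my $links = get_wiki_titles ($title, lang => 'th');
```

If the title is already a character string, decoding it a second time would corrupt it, so a helper like this belongs only at the point where bytes enter the program.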
As of version 0.04, get_wiki_titles removes the entries for non-encyclopedia sites such as Wikiquote and Wikiversity from the returned hash.
make_wiki_url
my $url = make_wiki_url ('helium');
Make a URL for the Wikidata page. You will then need to retrieve the page and parse the JSON yourself. Use a second argument to specify the language of the page:
use utf8;
use WWW::Wikipedia::LangTitles 'make_wiki_url';
print make_wiki_url ('ฮีเลียม', 'th'), "\n";
produces output
https://www.wikidata.org/w/api.php?action=wbgetentities&sites=thwiki&titles=%E0%B8%AE%E0%B8%B5%E0%B9%80%E0%B8%A5%E0%B8%B5%E0%B8%A2%E0%B8%A1&props=sitelinks/urls|datatype&format=json
(This example is included as thai-url.pl in the distribution.)
If no language is specified, the default is en for English.
This function was added in version 0.02 of the module.
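As a rough illustration of retrieving the page and parsing the JSON yourself, the sketch below splits the work into two hypothetical functions, fetch_json and sitelinks_to_titles. It is not the module's own implementation. It assumes that the response nests each title under entities, then sitelinks, then a site name such as "thwiki", and that a site name consisting of a language code followed by "wiki" is a Wikipedia. JSON::PP is used here only because it ships with Perl; the module itself uses JSON::Parse. The sample JSON is hand-written and trimmed, with a placeholder entity ID.

```perl
use strict;
use warnings;
use JSON::PP 'decode_json';

# Hypothetical helper: fetch the raw JSON bytes for a URL made by
# make_wiki_url.
sub fetch_json
{
    my ($url) = @_;
    # Loaded on demand so that the parsing half works without LWP.
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new (timeout => 30);
    my $response = $ua->get ($url);
    die "GET $url failed: " . $response->status_line . "\n"
        unless $response->is_success;
    # decode_json wants UTF-8 bytes, so skip the charset decoding.
    return $response->decoded_content (charset => 'none');
}

# Hypothetical helper: turn the decoded response into a hash
# reference of language code => article title.
sub sitelinks_to_titles
{
    my ($data) = @_;
    my %titles;
    for my $entity (values %{$data->{entities} || {}}) {
        my $sitelinks = $entity->{sitelinks} || {};
        for my $site (keys %$sitelinks) {
            # Assumption: "thwiki" means Thai Wikipedia. Names such as
            # "enwikiquote" do not end in "wiki" and are skipped.
            # Special sites such as "commonswiki" would still match
            # and need extra filtering in real use.
            next unless $site =~ /^(.+)wiki$/;
            $titles{$1} = $sitelinks->{$site}{title};
        }
    }
    return \%titles;
}

# Offline demonstration with hand-written sample data. In real use
# the JSON would come from fetch_json (make_wiki_url ('helium')).
my $json = '{"entities":{"Q000":{"sitelinks":{'
    . '"enwiki":{"site":"enwiki","title":"Helium"},'
    . '"dewiki":{"site":"dewiki","title":"Helium"},'
    . '"enwikiquote":{"site":"enwikiquote","title":"Helium"}}}}}';
my $titles = sitelinks_to_titles (decode_json ($json));
for my $lang (sort keys %$titles) {
    print "$lang: $titles->{$lang}\n";
}
```

Keeping the download separate from the parsing means the parsing can be checked against saved responses without a network connection.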
SEE ALSO
- Locale::Codes
Locale::Codes can convert the language codes used as keys by this module into the English-language names of the languages.
use utf8;
use FindBin '$Bin';
use WWW::Wikipedia::LangTitles 'get_wiki_titles';
use Locale::Codes::Language;
my $article = 'King Kong';
my $titles = get_wiki_titles ($article);
for my $lang (keys %$titles) {
    my $l2c = code2language ($lang);
    if (! $l2c) {
        $l2c = $lang;
    }
    my $name = $titles->{$lang};
    if ($name ne $article) {
        print "$name in $l2c.\n";
    }
}
produces output
king.kong in jbo.
קינג קונג in Hebrew.
Кинг Конг in Bulgarian.
キングコング in Japanese.
كينغ كونغ in Arabic.
Кінг-Конг in Ukrainian.
King Kong (hahmo) in Finnish.
金剛 (怪獸) in Chinese.
Քինգ Քոնգ in Armenian.
คิงคอง in Thai.
کینگ کونگ in Persian.
Кинг-Конг in Russian.
킹콩 in Korean.
კინგ კონგი in Georgian.
(This example is included as locale-codes.pl in the distribution.)
DEPENDENCIES
- Carp
Carp is used to report errors.
- LWP::UserAgent
LWP::UserAgent is used to retrieve the data from Wikidata.
- JSON::Parse
JSON::Parse is used to parse the JSON data from Wikidata.
- URI::Escape
URI::Escape is used to make the URLs for Wikidata from the input titles.
EXPORTS
Nothing is exported by default. The export tag ':all' exports all the functions of the module.
use WWW::Wikipedia::LangTitles ':all';
TESTING
The default tests of the module do not attempt to connect to the internet. To test using an internet connection, run xt/scrape.t like this:
prove -I lib xt/scrape.t
from the top directory of the distribution.
HISTORY
This module was a collection of small scripts I had been using to scrape multilingual article names related to physics from Wikipedia. I made the scripts into a CPAN module because I thought it could be useful to other people. Specifically, I used my scripts to add some Japanese element names to Chemistry::Elements, and I thought this method might be useful for someone else.
Version 0.02 added the "make_wiki_url" function for people who want to retrieve and parse the output themselves.
AUTHOR
Ben Bullock, <bkb@cpan.org>
COPYRIGHT & LICENCE
This package and associated files are copyright (C) 2016-2017 Ben Bullock.
You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.