This is an interesting idea. When Wikimedia finalizes an incremental backup solution, it may be possible. They'll release a dump with incremental additions / updates / deletions. You would then have XOWA accept the additions / updates, but ignore all the deletions.
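To make the idea concrete, here's a toy sketch of applying an incremental changeset while ignoring deletions. Everything here is made up for illustration (the record format, operation names, and function are mine, not anything Wikimedia or XOWA actually ships):

```python
# Toy sketch: apply an incremental dump to a local article store,
# accepting additions/updates but skipping deletions.
# (Record format and names are purely illustrative.)

def apply_incremental(local, changes):
    """local: dict of title -> text; changes: list of (op, title, text)."""
    for op, title, text in changes:
        if op in ("add", "update"):
            local[title] = text
        elif op == "delete":
            pass  # keep the local copy; ignore upstream deletions
    return local

articles = {"Earth": "old text"}
changes = [
    ("update", "Earth", "new text"),
    ("add", "Moon", "moon text"),
    ("delete", "Earth", None),
]
apply_incremental(articles, changes)
# "Earth" keeps its updated text, "Moon" is added, nothing is removed
```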
It would place more responsibility on the user to maintain their copy of the dump though.
That's true. The burden then shifts to the user. But in a way, that's also good because then you can choose which snapshot to follow.
It's a bit like maintaining your copy of an OS. You can stick to the "stable" branch or, if you're feeling adventurous, you can switch to "release". If you're really into the bleeding edge, you can go with the "nightly" build.
All-in-all, I really like this.
One concern I have is the possible increased bandwidth load for WP. Maybe you could include a small icon or notification encouraging users to support it with donations. Couldn't hurt to have one there for yourself as well.
Source control is an interesting analogy. In the same vein, when a user syncs their version with the main branch, there will be hundreds of thousands of changes to review. It'll be pretty harrowing for anyone to figure out what to keep / reject. Just something to consider.
Anyway, thanks for the food for thought as well as your suggestion. I added donation links for archive.org and wikipedia tonight.
Thanks for giving it a try. Kiwix is definitely more polished in UI, especially as it has been around for 5+ years. I'd like to think that though XOWA isn't as friendly UI wise, it offers a lot more power / options.
Regarding images: there is some assembly required, but I tried to make the instructions as simple as possible. If you look at http://xowa.sourceforge.net/setup_simplewiki.html, there should be two steps:
* Download the .7z file from archive.org: http://archive.org/details/Xowa_simplewiki_2013-10-30_images...
* Unzip the .7z file to your XOWA directory. If you're on Windows and have C:\xowa as your folder, you should get a file called C:\xowa\file\simple.wikipedia.org\fsdb.main\fsdb.abc.sqlite3 as well as many others
enwiki is a little more difficult, but only in that it requires downloading more files.
Let me know if you run into other issues. I'm going off to work now, but I'll check again later.
EDIT: I forgot to add that if you set up ImageMagick and Inkscape (installation instructions are on XOWA's Main_Page), you can download images dynamically for each article (i.e.: you don't need to download the entire image dump first)
Thanks for your reply. I did see the things you mention regarding images, but the gap is that I'm exporting a private MediaWiki, not one of the well-known wikis that you have added explicit support for.
I tried tar'ing up my images directory from the server and unpacking it in a few likely-looking locations on the filesystem, but that didn't work. The filesystem layout was kind of confusing with the "user" and "wiki" separation.
How would one prepare a similar image database for an unsupported wiki? I expect this is a custom thing you prepared as opposed to the xml text dump which is a standard mediawiki dump format.
As to the ImageMagick part, it doesn't work for an unsupported wiki. Also, it would be impractical for me to manually crawl my whole site triggering downloads of images, and even if I did that, it is unclear how to package and deploy the result. The deployment needs to be completely offline because there is no Internet at the prison.
Overall, setting up one of the well-known wikis is probably pretty smooth, but a private wiki requires a lot of technical knowledge about implementation details, which makes this tool impractical for unskilled users. Right now, the Kiwix deployment is close to ideal: I just need to instruct the unskilled user to replace the ZIM file.
There is one small deficiency in the Kiwix deployment: the automatic index files are user-specific unless prepared in advance and recorded in the library.xml file, so in practice I had to prepare a script to make sure the index and library were right. The actual deployment is "copy ZIM files to this dir, then double-click on this script".
Hey. I just happened to check this thread and saw your response.
To answer your question, yes: the image databases were prepared with the expectation of a standard Wikimedia wiki. These wikis have a standard file layout of wikipedia/wikidomain/thumb/hash0/hash01/name_of_file/thumbnail_file; for example: wikipedia/commons/thumb/9/97/The_Earth_seen_from_Apollo_17.jpg/270px-The_Earth_seen_from_Apollo_17.jpg.
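For reference, that hash0/hash01 part of the layout comes from MediaWiki's hashed upload directories: the two directory levels are the first one and the first two hex digits of the MD5 of the filename (with spaces replaced by underscores). A quick sketch, assuming that standard scheme (the function name is mine):

```python
import hashlib

def wmf_thumb_path(domain, filename, width):
    # MediaWiki's hashed upload layout: the two directory levels are the
    # first one and first two hex digits of the MD5 of the underscored
    # filename (the $wgHashedUploadDirectory scheme).
    name = filename.replace(" ", "_")
    h = hashlib.md5(name.encode("utf-8")).hexdigest()
    return "{0}/thumb/{1}/{2}/{3}/{4}px-{3}".format(domain, h[0], h[:2], name, width)

print(wmf_thumb_path("commons", "The Earth seen from Apollo 17.jpg", 270))
```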
If you're using a MediaWiki installation, your files should be laid out similarly. You can change the XOWA config file to explicitly specify this WMF layout. XOWA allows the user to work directly with the WMF tarballs, so this should work for you as well. You can look at this thread for another user's attempts: https://sourceforge.net/p/xowa/discussion/general_archived/t... If you have questions, feel free to ask / post.
The other alternative is for XOWA to have the ability to read from a non-Wikimedia directory. Another user asked for this for his own private wiki: https://sourceforge.net/p/xowa/tickets/159/. In this scenario, you'd have all your files in some root directory (e.g., C:\images) and XOWA would index the directory and look up each file by filename. You would probably need ImageMagick and Inkscape installed though.
Regarding your other point: I will probably centralize all the directories instead of spreading them out between /wiki/, /file/, and /user/. I had a reason for this layout, but it's causing confusion among a few users. You could always zip the files with relative paths and instruct the users to unzip it. For example, the XOWA wikiquote package is one zip file: https://archive.org/details/Xowa_enwikiquote_2013-11-19_comp.... If you unzip it in the /xowa/ dir, it will automatically put all files into the relevant folders.
In the end, if you have a routine set up for Kiwix, you're probably best sticking with it. Keep in mind that XOWA does offer some other nice features that you may or may not need (editable wiki pages; Wikimedia Lua code). It also offers a lot of customization. For example, one user added MathJax to XOWA on his own (he then proceeded to add a lot more: sorting / collapsing, a Wikidata skin, redlinks, etc.).
Let me know if you're interested, and I'll see what I can do to help. Otherwise, thanks for the use case scenario. It's definitely something I'll consider supporting in the future!
That's pretty impressive. I never had the patience to sit through a full MediaWiki import for en.wikipedia.org.
Just to be clear, XOWA isn't an installer for MediaWiki, but its own app. This allows it to avoid the dependency on the entire MediaWiki tool-chain (Apache, PHP, MySQL, MediaWiki). Unfortunately, this means that XOWA has to reproduce the same logic, which is quite a challenge...
It is indeed a challenge. The MediaWiki syntax is the weirdest mess I have ever had to parse. There is no spec, real-world usage deviates significantly from the help docs, and it's a Turing-complete language with heaps of backwards-compatibility hacks. So if you have something reasonably complete and correct, then kudos to you!
Thanks. The syntax was challenging, especially all the template syntax ("{{my_template|{{{argument1|defaultvalue|{{nested_template}}}}}}}"). Fortunately, the new Lua module should eventually replace the template syntax, which should make things easier for future parsers.
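For the curious, even a toy parser shows the pain: a brace-counting scanner can pull out top-level {{...}} spans, but runs of three or more braces are ambiguous between template ({{ }}) and argument ({{{ }}}) syntax, so a real parser needs longest-match and fallback rules on top of this. A sketch of just the nesting part (purely illustrative, not XOWA's parser):

```python
def top_level_templates(text):
    """Return top-level {{...}} spans, handling nesting by counting braces.
    Deliberately naive: it does not distinguish {{{param}}} from {{template}},
    which is exactly where real wikitext parsing gets hard."""
    spans, depth, start, i = [], 0, None, 0
    while i < len(text) - 1:
        if text[i:i + 2] == "{{":
            if depth == 0:
                start = i
            depth += 1
            i += 2
        elif text[i:i + 2] == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append(text[start:i + 2])
            i += 2
        else:
            i += 1
    return spans

print(top_level_templates("a {{foo|{{bar}}}} b {{baz}}"))
# → ['{{foo|{{bar}}}}', '{{baz}}']
```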
Yes, this would be the ideal approach, but it can become quite complicated (because the tool-chain needs to be installed on different machines). In addition, the official XML importer (importDump.php) is not really up to the task (it's slow and sometimes buggy).
If you're interested in going this route, you can look at http://www.nongnu.org/wp-mirror/. This should build a local MediaWiki instance with one click. Keep in mind that it's a bit slow: it takes two days to build simple.wikipedia.org with images. In contrast, XOWA sets this up in about 30 minutes.
However, it suffers from the number one torrent issue: torrents do not tolerate change. This means that:
- When an article changes, you need to generate a new torrent
- When a new torrent is released for the archive, it needs to be downloaded from scratch by all peers so that the maximum number of peers is available for a newcomer.
I hope you'll understand that this is not the official way to distribute archives...
Kiwix's Android app and the full-text search are both great features.
However, I'll point out that Kiwix has not updated English Wikipedia since January 2012. Also, XOWA works directly with the Wikimedia dumps (http://dumps.wikimedia.org/backup-index.html), so it (a) is always up to date and (b) can work on any wiki (Kiwix needs to release the ZIM file first).
Also, XOWA can run from an external SD card (including FAT32-formatted ones).
Another Android option is Fastwiki [1]. It has no images, but it provides a tool to convert native Wikimedia dumps, and it works with older Android versions.
Unfortunately, having your own copy of Wikipedia could also be used to enable censorship. For example, a fundamentalist school could have their own version of Wikipedia from which they've purged all articles about evolution, etc. Then they could configure their firewall to block the real Wikipedia.
Agreed. However, I think it would be less work for them to block access through firewall policy than to remove the articles from XOWA.
By and large, for most private individuals, an offline app would allow them to evade censorship. I'd hope that this benefit outweighs the risk of such abuse.
A firewall would not hide the fact that censorship takes place. You would have to rewrite content to do that. That might be easier in batch, especially if you are going to use NLP to make the cut-up sentences grammatical.
Ahh.... That's pretty devious. I was thinking of blocking the entire article, not rewriting content. Still not worth the work IMHO, but who knows what censorship servants would do.
You can try the low-space import. There are instructions in XOWA at home/wiki/Help:Options/Import. It takes longer (8-10 hours) but only needs 35-40 GB (which is still a lot).
In the end, you're still going to need about 25 GB for English Wikipedia. If you want something smaller, you can try one of the other wikis (for example, Wiktionary, Wikiquote, Wikisource, etc.). Each of these is generally about 5 GB.