Wiki Blogger
The basic idea is simply to create a tool that will allow me to post my EmacsWiki to Blogger.
The purest way to do this would be to add to emacs-wiki-mode to allow publication directly to Blogger. Good idea except that I am a complete novice at writing Blogger API code and my elisp skills are very rusty (not that I was ever an expert). So to make the project more practical I'll use Python. In fact this has an advantage in that it will not be restricted to posting EmacsWiki pages but will be able to post arbitrary web pages.
I give up: Blogspot's API is unnecessarily complicated, Blogspot doesn't allow binary attachments, and Blogspot doesn't honour my browser's language preference. So I'm going to try Posterous instead. The general requirements are the same, but the API is significantly simpler, being just HTTP POSTs instead of Atom syndication.
So here are some notes on the Posterous API: WikiPosterous.
High level requirements
- Command line driven, no UI. This is to run as a CronJob,
- Accept a directory as the source of files to be posted,
- Only post files that are newer than the corresponding post on Blogger,
- Rewrite local links so that they still work.
- Set the dates on the blog pages so that newly changed entries appear at the front of the list. This means that the permalinks will change, which in turn means that pages that link to a changed page will need their links updated; such pages are not themselves to be regarded as changed, although they will have to be reposted. But will the permalinks really change? Perhaps the post ID is permanent. No: the permalinks are built from just the posting date and title, so if the date is changed then so is the URL. Idiotic; the post has a unique ID, so why can't that be used? Perhaps redirects can be used? Probably not, because we cannot control the headers, at least not for individual posts. What about tinyurl-type services? Are there any that let you create your own? Yes: http://purl.oclc.org/docs/help.html.
Questions
- Should the tool operate on only files in the given directory or should it also act on those found in subdirectories?
- Should it also post local files that are outside the specified directory tree but which are linked from the files that are posted? If it does, how do we avoid privacy risks? If not, should it alter the links?
- If the blog settings specify No Archive, does this mean that the URL of an entry depends only on the title? Does it also mean that changing the posting date will change the order? That is, can we put an updated post at the top of the list without changing the URL?
- How do we know what the address of a page will be? Is it returned as part of the entry? It should be stored as part of the status along with the post id so that we can use it to rewrite the links.
Release 1
- Hard code wiki html source directory,
- Hard code blog name and password,
- Use status file,
- Do not attempt to put newer posts at the front.
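A minimal sketch of how the status file could drive the "only post files that are newer" requirement. It assumes Python 3 and a JSON status file keyed by file name. The function names and the `post_id`, `url` and `posted_mtime` fields are hypothetical and are not taken from wikiblogger.py.

```python
import json
import os


def load_status(status_path):
    """Return the saved status dict, or an empty one if no file exists yet."""
    if not os.path.exists(status_path):
        return {}
    with open(status_path, encoding="utf-8") as handle:
        return json.load(handle)


def save_status(status_path, status):
    """Write the status dict back to disk as JSON."""
    with open(status_path, "w", encoding="utf-8") as handle:
        json.dump(status, handle, indent=2, sort_keys=True)


def files_to_post(source_dir, status):
    """List the .html files that are new or modified since they were last posted."""
    pending = []
    for name in sorted(os.listdir(source_dir)):
        path = os.path.join(source_dir, name)
        if not (name.endswith(".html") and os.path.isfile(path)):
            continue
        entry = status.get(name)  # hypothetical layout: {"post_id", "url", "posted_mtime"}
        if entry is None or os.path.getmtime(path) > entry["posted_mtime"]:
            pending.append(name)
    return pending
```

After a successful upload the script would record the post ID, the URL and the file's modification time under the file name and call `save_status`, so the next cron run skips unchanged pages.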
Results
Some files cannot be uploaded because of problems with encoding utf8 characters. Perhaps I should run tidy on the files first.
    File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
      return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 4187: unexpected code byte
    Uncaught exception. Entering post mortem debugging
    Running 'cont' or 'step' will restart the program
    > /usr/lib/python2.5/encodings/utf_8.py(16)decode()
    -> return codecs.utf_8_decode(input, errors, True)
Broadly speaking this now works. It uploads new pages and updates existing ones. So we can move on to release 2.
Release 2
- Handle exception thrown when user deletes an entry that the status file thinks should be there,
- Set dates on updated pages to the modification time of the page,
- Deal with encoding errors by trapping the exception and ignoring it,
- Save the URL of the post with the ID in the status file,
- Save the URL of a new page in the status file.
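Trapping the encoding exception might look roughly like this. It is a sketch in Python 3, with `read_page` as a hypothetical helper name; it skips the file and reports it rather than guessing at the encoding.

```python
def read_page(path, encoding="utf-8"):
    """Return the decoded text of a page, or None if it cannot be decoded."""
    with open(path, "rb") as handle:
        raw = handle.read()
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as error:
        # Ignore the file, but say so, so that the problem is not silently lost.
        print("Skipping %s: %s" % (path, error))
        return None
```

The byte 0xa9 reported in the Release 1 results is the copyright sign in Latin-1, so the failing files may simply be Latin-1 rather than UTF-8. That is a guess, and running Tidy first is probably the more reliable fix.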
Release 3
- Rewrite links.
- Strip html header and style information, send only content of body element.
- Check for errors reported by Tidy, report to user.
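The first two items could be sketched with a couple of regular expressions. This assumes Tidy has already normalised the markup so that attributes use double quotes, and that `url_map` (a hypothetical name) maps local file names to the URLs saved in the status file. A real HTML parser would be more robust.

```python
import re

BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)
HREF_RE = re.compile(r'href="([^"]*)"', re.IGNORECASE)


def body_content(html):
    """Return the markup inside the body element, or the input if there is none."""
    match = BODY_RE.search(html)
    return match.group(1).strip() if match else html


def rewrite_links(html, url_map):
    """Replace href targets found in url_map with their posted URLs."""
    def replace(match):
        target = match.group(1)
        # Links that are not in the map (external URLs, say) are left alone.
        return 'href="%s"' % url_map.get(target, target)
    return HREF_RE.sub(replace, html)
```

For example, `rewrite_links(body_content(page), {"Foo.html": "http://example.org/foo"})` would send only the body and point the local link to Foo.html at its posted address.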
Results
Now uploads Tidy-ed code, emacs-wiki header removed. Command line used to provide user name, password and source directory; this means that the script can be published.
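For illustration, command-line handling along these lines can be done with the standard `getopt` module. The option letters and the `parse_args` name are assumptions and are not necessarily what wikiblogger.py uses.

```python
import getopt


def parse_args(argv):
    """Parse -u USER -p PASSWORD -d DIRECTORY into a dict; raise on bad input."""
    opts, extra = getopt.getopt(argv, "u:p:d:", ["user=", "password=", "dir="])
    if extra:
        raise getopt.GetoptError("unexpected arguments: %s" % " ".join(extra))
    names = {"-u": "user", "--user": "user",
             "-p": "password", "--password": "password",
             "-d": "dir", "--dir": "dir"}
    settings = {names[flag]: value for flag, value in opts}
    missing = [key for key in ("user", "password", "dir") if key not in settings]
    if missing:
        raise getopt.GetoptError("missing options: %s" % ", ".join(missing))
    return settings
```

A caller would pass `sys.argv[1:]` and print a usage message if `getopt.GetoptError` is raised. Keeping the credentials out of the source is what makes the script safe to publish.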
Release 4
- Handle files that are referred to by the original files and upload them, rewriting links as necessary. How do I upload resources other than posts?
Unsorted future release features
Add entries here as we discover things to be done. Move entries from here to the appropriate release section as we go.
- Add labels, tags, categories. At least one label should be attached to each wiki entry as it is posted to identify it as an automatically posted entry so that such entries can be deleted en masse. This is already implemented but it appears not to work.
- Add a config file to remove user names and passwords from the script. Implemented getopts instead; launch from a Bash script.
- Add log file so that cron can be used.
- Use pygmentize to highlight non-html files found linked to the wiki. Use style=emacs.
- Include binary files like software distributions, Zero Install for instance. Unfortunately Blogspot will not allow upload of binaries other than some image formats, at least not interactively. Presumably the same rule will apply to programmatic uploads. Perhaps I need to investigate Google's web site hosting. Google Sites is a very restricted service, more like a wiki than a website. However, it does allow attachments, and these attachments get simple URLs. So the solution to the upload problem could be to create a Google Site and add one or more empty pages that have attachments. Google Sites provides a page template called File Cabinet which looks like it is made for this purpose.
Source
Here is the script: wikiblogger.py
Blogger API links, etc.
http://code.google.com/apis/blogger/docs/2.0/developers_guide_protocol.html
http://code.google.com/apis/blogger/docs/1.0/developers_guide_python.html