Wednesday, 10 March 2010

Wiki Blogger


Wiki Blogger
High level requirements
Questions
Release 1
Release 2
Release 3
Unsorted future release features
Source
Blogger API links, etc.

Wiki Blogger

The basic idea is simply to create a tool that will allow me to post my EmacsWiki to Blogger.

The purest way to do this would be to extend emacs-wiki-mode so that it can publish directly to Blogger. That is a good idea, except that I am a complete novice at writing Blogger API code and my elisp skills are very rusty (not that I was ever an expert). So, to make the project more practical, I'll use Python. This has an added advantage: the tool will not be restricted to posting EmacsWiki pages but will be able to post arbitrary web pages.

High level requirements

  • Command line driven, no UI. It is to run as a cron job.
  • Accept a directory as the source of the files to be posted.
  • Only post files that are newer than the corresponding post on Blogger.
  • Rewrite local links so that they still work.
  • Set the dates on the blog pages so that newly changed entries appear at the front of the list. This means that the permalinks will change. That in turn means that pages referring to a changed page will need their links updated. Such pages are not themselves to be regarded as changed, although they will have to be reposted. Or will the permalinks change? Perhaps the post ID is permanent. No: permalinks are built from the posting date and the title, so if the date changes, so does the URL. Idiotic: the post has a unique ID, so why can't that be used? Perhaps redirects could be used? Probably not, because we cannot control the headers, at least not for individual posts. What about TinyURL-type services? Are there any that let you create your own? Yes: http://purl.oclc.org/docs/help.html.
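Here is a minimal sketch of the "only post files that are newer" rule. It assumes the status file records, for each posted file, the time (in seconds since the epoch) at which it was last posted. Comparing against that local record avoids asking Blogger about every post on each cron run. The function name and the `"posted"` key are hypothetical.

```python
import os


def files_to_post(source_dir, status):
    """Return paths of .html files in source_dir that are new or changed.

    status is assumed to map file name -> {"post_id": ..., "posted": seconds},
    where "posted" is when the file was last sent to Blogger. Comparing with
    this local record stands in for asking Blogger when each post was updated.
    """
    to_post = []
    for name in sorted(os.listdir(source_dir)):
        path = os.path.join(source_dir, name)
        if not name.endswith(".html") or not os.path.isfile(path):
            continue
        entry = status.get(name)
        # Never posted, or modified since it was last posted.
        if entry is None or os.path.getmtime(path) > entry["posted"]:
            to_post.append(path)
    return to_post
```

A file with no status entry is always posted, so the first run posts everything.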

Questions

  • Should the tool operate only on files in the given directory, or should it also act on those found in subdirectories?
  • Should it also post local files that are outside the specified directory tree but are linked from the files that are posted? If it does, how do we avoid privacy risks? If not, should it alter the links?
  • If the blog settings specify No Archive, does this mean that the URL of an entry depends only on the title? Does it also mean that changing the posting date will change the order? That is, can we put an updated post at the top of the list without changing the URL?
  • How do we know what the address of a page will be? Is it returned as part of the entry? It should be stored in the status file along with the post ID so that we can use it to rewrite the links.
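The first two questions can at least be made concrete. The sketch below uses hypothetical names. It collects pages either from the top-level directory only or from the whole tree. It also checks that a linked file actually lies inside the tree, which is one way to stop the tool from posting private files by following links out of it. For simplicity it treats link targets as relative to the top of the tree.

```python
import os


def collect_pages(source_dir, recursive=False):
    """List .html files, relative to source_dir, optionally including subdirectories."""
    if not recursive:
        return [name for name in sorted(os.listdir(source_dir))
                if name.endswith(".html")
                and os.path.isfile(os.path.join(source_dir, name))]
    found = []
    for root, dirs, files in os.walk(source_dir):
        dirs.sort()  # visit subdirectories in a stable order
        for name in sorted(files):
            if name.endswith(".html"):
                found.append(os.path.relpath(os.path.join(root, name), source_dir))
    return found


def inside_tree(source_dir, link_target):
    """True if a link, taken relative to the top of the tree, stays inside source_dir.

    Symlinks are resolved, so a link that escapes through a symlink is caught too.
    """
    root = os.path.realpath(source_dir)
    target = os.path.realpath(os.path.join(source_dir, link_target))
    return os.path.commonpath([root, target]) == root
```

Links that fail `inside_tree` could then either be left alone or stripped, depending on how the second question is answered.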

Release 1

  • Hard-code the wiki HTML source directory,
  • Hard-code the blog name and password,
  • Use a status file,
  • Do not attempt to put newer posts at the front.
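A status file only needs to map each local file to what we know about its post. One way to do this in current Python is a JSON file keyed by file name. The format and function names here are assumptions, not necessarily what the script does. Writing to a temporary file and then renaming it means that a cron run which dies halfway through leaves the previous status intact.

```python
import json
import os


def load_status(path):
    """Read the status file; a missing file just means nothing is posted yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_status(path, status):
    """Write the status file via a temporary file and an atomic rename."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(status, f, indent=2, sort_keys=True)
    os.replace(tmp_path, path)  # atomic on POSIX when on the same filesystem
```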

Results

Some files cannot be uploaded because they contain bytes that are not valid UTF-8. Perhaps I should run tidy on the files first.

  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 4187: unexpected code byte
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /usr/lib/python2.5/encodings/utf_8.py(16)decode()
-> return codecs.utf_8_decode(input, errors, True)

Broadly speaking, this now works: it uploads new pages and updates existing ones. So we can move on to release 2.
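The byte in the traceback, 0xa9, is the copyright sign © in Latin-1 and Windows-1252. In UTF-8 it can only appear as a continuation byte, which is why the decoder rejects it. So the failing pages are probably not UTF-8 at all. Instead of skipping them, a sketch like the following tries UTF-8 first and then falls back. The `read_page` name is hypothetical, and it assumes Windows-1252 is the likely alternative encoding.

```python
def read_page(path):
    """Return a page's text, trying UTF-8 first and falling back to Windows-1252."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # In cp1252, 0xA9 is the copyright sign. The few bytes cp1252 leaves
        # undefined become U+FFFD instead of raising another exception.
        return data.decode("cp1252", errors="replace")
```

Unlike trapping the exception and ignoring it, this approach still posts the page. The cost is that a file in some third encoding may come out with a few wrong characters.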

Release 2

  • Handle the exception thrown when the user deletes an entry that the status file thinks should be there,
  • Set the dates on updated pages to the modification time of the page,
  • Deal with encoding errors by trapping the exception and ignoring it,
  • Save the URL of each post, new or updated, with its ID in the status file.
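The second and last items can be sketched as follows. Blogger's API was built on Atom, whose date elements use RFC 3339 timestamps, so the file's modification time needs converting. The sketch assumes a UTC timestamp is acceptable and that each status entry is a small dict. The function and key names are hypothetical.

```python
import os
from datetime import datetime, timezone


def atom_date(path):
    """Format a file's modification time as an RFC 3339 timestamp in UTC."""
    mtime = os.path.getmtime(path)
    return datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def record_post(status, name, post_id, url, posted):
    """Keep a post's ID and URL together so later runs can rewrite links."""
    status[name] = {"post_id": post_id, "url": url, "posted": posted}
    return status
```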

Release 3

  • Rewrite links.
  • Strip the HTML head and style information; send only the content of the body element.
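Both Release 3 items can be sketched with regular expressions. This assumes the wiki's HTML is regular enough for that to work, with double-quoted `href` attributes and a single body element. A real HTML parser would be more robust. `url_for` is a hypothetical mapping from local file name to post URL, built from the URLs saved in the status file.

```python
import re

BODY_RE = re.compile(r"<body\b[^>]*>(.*)</body\s*>", re.IGNORECASE | re.DOTALL)
STYLE_RE = re.compile(r"<style\b.*?</style\s*>", re.IGNORECASE | re.DOTALL)
# Double-quoted href values only; any "#fragment" is captured separately.
HREF_RE = re.compile(r'href="([^"#]+)(#[^"]*)?"', re.IGNORECASE)


def body_content(html):
    """Keep only what is inside <body>, minus any <style> blocks."""
    match = BODY_RE.search(html)
    inner = match.group(1) if match else html
    return STYLE_RE.sub("", inner)


def rewrite_links(html, url_for):
    """Point links to local pages at their blog posts; leave other links alone."""
    def replace(match):
        target, fragment = match.group(1), match.group(2) or ""
        if target in url_for:
            return 'href="%s%s"' % (url_for[target], fragment)
        return match.group(0)
    return HREF_RE.sub(replace, html)
```

External links, in-page anchors such as `#top`, and local pages that have not been posted yet pass through unchanged.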

Unsorted future release features

Add entries here as we discover things to be done. Move entries from here to the appropriate release section as we go.

  • Add labels (tags, categories). At least one label should be attached to each wiki entry as it is posted, identifying it as an automatically posted entry so that such entries can be deleted en masse.
  • Add a config file so that user names and passwords can be removed from the script.
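Here is a sketch of the config file item, using the standard library's `configparser`. The file layout (a `[blogger]` section with `user`, `password` and `blog` keys) is an assumption for illustration. Interpolation is switched off so that a `%` in a password is read literally.

```python
import configparser

# Expected layout of the (hypothetical) settings file, e.g. ~/.wikiblogger.ini:
#
#   [blogger]
#   user = someone@example.com
#   password = not-in-the-script-any-more
#   blog = 1234567890
#   source_dir = /home/someone/WebWiki


def load_settings(path):
    """Read credentials and paths from an INI-style file instead of the script."""
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):  # read() returns the list of files it managed to parse
        raise FileNotFoundError(path)
    section = parser["blogger"]
    return {
        "user": section["user"],
        "password": section["password"],
        "blog": section["blog"],
        "source_dir": section.get("source_dir", fallback="."),
    }
```

Keeping the file readable only by its owner (for example with `chmod 600`) stops other users on the machine from reading the password.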

Source

Here is the script: WikiBloggerPy.

Blogger API links, etc.
