Monday, 15 March 2010

Proposed Programs

Web crawler in Javascript

Javascript should be an excellent language to write a web crawler in. So why doesn't anyone do it? Is it too easy, too difficult or just too dull?

Yet the Javascript web sites are full of scripts that solve problems that surely fall under at least one of those headings. Leaving such mysteries aside, I decided to try to create one. I should point out that I do have a sort of good reason besides intellectual curiosity: I want to make a search engine for my web site, and I would like to ensure that it really does search the site and does not just look up pages by keyword in a handmade index.

After some experimentation I have finally made a script that can crawl; here are the notes: JsCrawler.
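
As a rough illustration of the idea (not the JsCrawler script itself), here is a minimal breadth-first crawl that stays on one site. The names `extractLinks`, `crawl` and the `loadPage` callback are my own assumptions; `loadPage` stands in for whatever actually fetches a page, and a real version would probably be asynchronous.

```javascript
// Collect the href targets of anchor tags, resolved against the page URL.
function extractLinks(html, baseUrl) {
  const links = [];
  const hrefPattern = /<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']/gi;
  let match;
  while ((match = hrefPattern.exec(html)) !== null) {
    try {
      const url = new URL(match[1], baseUrl);
      url.hash = '';               // the same page with a fragment is the same page
      links.push(url.href);
    } catch (e) {
      // skip hrefs that cannot be resolved to a URL
    }
  }
  return links;
}

// Breadth-first crawl. loadPage(url) is a hypothetical callback that
// returns the HTML of a page, or null if it cannot be read.
function crawl(startUrl, loadPage) {
  const start = new URL(startUrl);
  const visited = new Set();
  const queue = [start.href];
  while (queue.length > 0) {
    const url = queue.shift();
    if (visited.has(url)) continue;
    visited.add(url);
    const html = loadPage(url);
    if (html === null || html === undefined) continue;
    for (const link of extractLinks(html, url)) {
      // follow only links that stay on the starting site
      if (new URL(link).origin === start.origin && !visited.has(link)) {
        queue.push(link);
      }
    }
  }
  return [...visited];
}
```

Because every page reached this way was found by following links, an index built from the result really does reflect the site rather than a handmade list.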

VB to HTML

This isn't quite the same idea as most of the existing programs of this kind. First of all, it is to be a command line program so that I can automate it. It also has to take a VBP or VBG and generate a web of pages that will sit beside the originals.

In fact it is only the VBP files that need to be processed; they can refer directly to the modules.

The visible text of the page generated for a VBP should be exactly the same as the real VBP, so that a simple select all and copy to clipboard will copy the content of the VBP.

Well, it didn't turn out quite like that, as you can see by browsing the code.
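
For what it is worth, the original plan can be sketched in a few lines. This is only an illustration, not the program that was eventually written: the function names are invented, it handles only the `Form=`, `Module=` and `Class=` lines of a VBP, and it assumes that the page for a source file is named by adding `.html` to the file name.

```javascript
// Escape text so that the browser displays it literally.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;')
             .replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

// Turn one VBP line into HTML. Lines such as
//   Form=frmMain.frm
//   Module=modUtil; modUtil.bas
// get a link on the file name; every other line is passed through.
function vbpLineToHtml(line) {
  const match = /^(Form|Module|Class)=(.*?;\s*)?(.+)$/.exec(line);
  if (!match) return escapeHtml(line);
  const [, key, namePart = '', file] = match;
  // Assumption: the generated page sits beside the source file.
  const href = encodeURI(file.replace(/\\/g, '/')) + '.html';
  return escapeHtml(key + '=' + namePart) +
         '<a href="' + escapeHtml(href) + '">' + escapeHtml(file) + '</a>';
}

// Wrap the whole project file in <pre> so the visible text is unchanged.
function vbpToHtml(vbpText) {
  return '<pre>' + vbpText.split(/\r?\n/).map(vbpLineToHtml).join('\n') + '</pre>';
}
```

Only tags are added around the file names, so selecting the page text and copying it gives back the lines of the VBP.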

Media Partitioning

Burning CDs as a means of freeing space is a pain. Burn to The Brim isn't quite as easy as it ought to be and the only competitor I know of is not free.

So what should such a program do?

  • run from the command line
  • copy or move files as desired
  • allow user to specify media size
  • allow for not splitting top level directories, so that albums of music files can be kept together
  • allow for limits on run time

What should it not do?

  • delete files
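
The heart of such a program is deciding which top level directory goes on which disc. A minimal sketch, assuming the directory sizes have already been measured and using the simple "first fit decreasing" rule (my choice for illustration, not necessarily what Burn to The Brim does), might look like this. The function name and the shape of the input are invented.

```javascript
// items: [{ name, size }] where each item is a top level directory that
// must not be split. mediaSize is the capacity of one disc, same units.
function partition(items, mediaSize) {
  // Largest first, so the big directories get placed while there is room.
  const sorted = [...items].sort((a, b) => b.size - a.size);
  const volumes = [];     // each volume: { free, items: [names] }
  const oversized = [];   // directories too big for any single disc
  for (const item of sorted) {
    if (item.size > mediaSize) {
      oversized.push(item.name);
      continue;
    }
    // First fit: use the first disc that still has room, else start a new one.
    let volume = volumes.find((v) => v.free >= item.size);
    if (!volume) {
      volume = { free: mediaSize, items: [] };
      volumes.push(volume);
    }
    volume.items.push(item.name);
    volume.free -= item.size;
  }
  return { volumes, oversized };
}
```

The function only plans the layout; copying or moving the files is a separate step, and nothing here deletes anything.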

Website Partitioning

Free web hosts generally limit the size of the site that they will host. It occurs to me that some of this could be circumvented, without violating the spirit of the restriction, by partitioning a web site into several disjoint pieces and having links between them. Then you could sign up with a number of different hosts and put part of the site on each.

The problem is the management of the site source. It would obviously be most convenient to create the site as a single tree and not worry about which files should reside on which host; then all internal links can be relative to the current file, or absolute but with the same root.

What is needed is a program that can take such a site and distribute it automatically. This would mean that many references would have to be rewritten. Or is there another way to do it?

One way would be to always use ECMAScript links instead of static href links. Such links would have to have some way of discovering which host the file was on and would replace the current file with the appropriate file.

Possible methods:

Rewrite Links
Run a process that takes the existing local static web site and rewrites some of the href anchors so that they become absolute references to another web site.
Intelligent Links
Use ECMAScript instead of simple href links and have the script look up the correct address when the user clicks the link.
htaccess
Use .htaccess to redirect requests for certain pages to another server.
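
The first two methods both need the same lookup: given a path in the single source tree, which host serves it? The sketch below is only an illustration. The host names and the table are made up, and it assumes that a file keeps the same path on whichever host it ends up on and that internal links are written as absolute paths from the common root.

```javascript
// Hypothetical table: which host serves which part of the tree.
const hostTable = [
  { prefix: '/photos/', host: 'http://photos.example.org' },
  { prefix: '/code/',   host: 'http://code.example.net' },
  { prefix: '/',        host: 'http://main.example.com' },
];

// The longest matching prefix decides the host.
function hostFor(path, table) {
  let best = null;
  for (const entry of table) {
    if (path.startsWith(entry.prefix) &&
        (best === null || entry.prefix.length > best.prefix.length)) {
      best = entry;
    }
  }
  return best === null ? null : best.host;
}

// "Intelligent link": work out the address when it is needed.
// Links that stay on the same host are left as they are.
function resolveLink(targetPath, currentPath, table) {
  const targetHost = hostFor(targetPath, table);
  if (targetHost === null || targetHost === hostFor(currentPath, table)) {
    return targetPath;
  }
  return targetHost + targetPath;
}

// "Rewrite links": apply the same lookup to every root-relative href
// in a page before it is uploaded.
function rewriteLinks(html, currentPath, table) {
  return html.replace(/href="(\/(?!\/)[^"]*)"/g,
    (whole, path) => 'href="' + resolveLink(path, currentPath, table) + '"');
}
```

With the rewriting approach the pages on the server are plain static HTML; with the intelligent link approach `resolveLink` would be called from a click handler instead, at the cost of links that do not work without scripting.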

Layout routines for C#

I heartily dislike drag and drop form designers. They seem so easy to use but as soon as you need any fine control they let you down. I always liked the simple containers that the Java user interface libraries have that automatically size their contents and arrange them in columns or rows.

So I have decided to create something like it in C#: CsharpLayout. I originally thought of creating new classes derived from Panel but I think that it is simpler to create some functions that take panels and other controls as arguments.

Row
Add a list of controls to a panel so that they are in left to right order on the screen. Specify the panel first, then the leftmost control, then the rest in order. The function searches the list of controls to discover which has dock=Fill. This must be either the first or the last. The controls are added to the Controls list of the panel either from left to right or from right to left depending on the location of the Fill control. Of course the other controls must have the matching DockStyle, Left or Right.
Column
As Row, except that where Row speaks of Left, Column speaks of Top, and so on.

Both functions return the panel as their value. This lets us write code that shows the hierarchical structure of the layout. You can then build up the layout piece by piece, replacing simple elements with Rows or Columns as you go:

  Column(panelTop, splitterH, panelText);

You can compile it and check that it works. Now expand it by adding controls to the top panel:

   Column(Row(panelTop, picDotPlot, panelTopRight), 
          splitterH, 
          panelText);

The example is intended as preparation for the use of this idea in DotPlot.

   Column(Row(panelTop, picDotPlot, 
              Column(panelTopRight, 
                     buttonGo,
                     frameDotScaling,
                     frameParsing)), 
          splitterH, 
          Row(panelText, 
              Column(panelText1, comboFile1, textFile1), 
              Column(panelText2, comboFile2, textFile2)));

All the examples assume that the various controls already exist and that all the relevant properties have been set.

FTP Sync

  • I have a web site.
  • It is always out of date.

Therefore I need a tool to synchronize the site with the local directory.

Write a VB program to scan the local directory tree and write a script for FTP. The script just includes:

  • log in
  • cd to root
  • dir
  • cd to each other directory that was found on the local disk
  • dir

The script will produce a list of all the files and directories on the server. Files in server directories that have no counterpart on the local disk will not appear, but those directories (or their ancestors) will.
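
The real project does this in VB, but the shape of the listing script is easy to show. The sketch below assumes the script is fed to the console FTP program with its `-s:` option, where the lines after `open` supply the user name and password; the function name and parameters are invented for the example.

```javascript
// Quote a remote path only if it contains white space.
function ftpArg(path) {
  return /\s/.test(path) ? '"' + path + '"' : path;
}

// localDirs: directories found on the local disk, relative to the site
// root, using forward slashes (for example 'docs/img').
function buildListingScript(host, user, password, localDirs) {
  const lines = [
    'open ' + host,   // log in
    user,
    password,
    'cd /',           // cd to root
    'dir',
  ];
  for (const dir of localDirs) {
    lines.push('cd ' + ftpArg('/' + dir));   // each directory known locally
    lines.push('dir');
  }
  lines.push('bye');
  return lines.join('\r\n') + '\r\n';
}
```

Captured output from running this script is the server side list that gets compared with the local tree.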

Compare this list with the list of files and directories on the local disk and write new scripts to:

  • delete files and directories that do not belong,
  • add missing files and directories,
  • update out of date files.
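
The comparison step might be sketched as follows. This is an illustration rather than the code in the VB project: the function name and the input format are assumptions, paths are assumed to contain no spaces, and comparing time stamps presumes that the local and server clocks can be compared directly, which is often not true.

```javascript
// local, remote: Map of path -> { isDir: boolean, mtime: number }.
// Paths are relative to the site root and use forward slashes.
function buildSyncCommands(local, remote) {
  const depth = (path) => path.split('/').length;
  const commands = [];

  // 1. Delete what does not belong: files first, then directories,
  //    deepest first so that children go before their parents.
  const extra = [...remote.keys()].filter((p) => !local.has(p));
  for (const p of extra.filter((q) => !remote.get(q).isDir)) {
    commands.push('delete ' + p);
  }
  const extraDirs = extra.filter((q) => remote.get(q).isDir)
                         .sort((a, b) => depth(b) - depth(a));
  for (const p of extraDirs) commands.push('rmdir ' + p);

  // 2. Add missing directories, parents before children.
  const localPaths = [...local.keys()];
  const newDirs = localPaths.filter((p) => local.get(p).isDir && !remote.has(p))
                            .sort((a, b) => depth(a) - depth(b));
  for (const p of newDirs) commands.push('mkdir ' + p);

  // 3. Upload files that are missing or older on the server.
  for (const p of localPaths.filter((q) => !local.get(q).isDir)) {
    const there = remote.get(p);
    if (!there || there.mtime < local.get(p).mtime) {
      commands.push('put ' + p + ' ' + p);
    }
  }
  return commands;
}
```

Note that `rmdir` will fail for a server directory that still contains files the listing never saw, so a directory that exists only on the server needs a second listing pass before it can be removed.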

This project actually exists; look at prjFTPSyncScript.vbp. It uses the Microsoft console based FTP program to do the actual work. This is the only example I can think of that involves this kind of interaction between windows based and console based programs in MS Windows; of course such things are perfectly normal in Unix style operating systems.

To do:

  • command line version for unattended use

Dotplot

I have been familiar with the basic dotplot idea for quite a while, but for some reason that I can no longer recall I must have thought it too difficult to implement. In fact it is easy to create a small but usable dotplot application. Look at DotPlot for a discussion or dive straight in to dotplot.vbp for the first version. That one actually works; the latest and potentially greatest is current, but it might not even compile.
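
To show just how small the core is: a dotplot compares two sequences and puts a mark wherever the item at one position in the first equals the item at some position in the second, so repeated passages show up as diagonal runs. The sketch below is a text-only toy written for this note, not the code in dotplot.vbp; the function name and the choice of `*` and `.` are arbitrary.

```javascript
// a, b: arrays of tokens (words, lines, characters...).
// Returns one string per element of b; column i of row j is '*'
// when a[i] equals b[j], otherwise '.'.
function dotplot(a, b) {
  return b.map((y) => a.map((x) => (x === y ? '*' : '.')).join(''));
}

// Comparing a sequence with itself gives the main diagonal plus
// shorter diagonals wherever a run of tokens is repeated.
const words = ['to', 'be', 'or', 'not', 'to', 'be'];
console.log(dotplot(words, words).join('\n'));
// *...*.
// .*...*
// ..*...
// ...*..
// *...*.
// .*...*
```

A real application draws pixels instead of characters and needs a parsing step to decide what counts as a token, which is presumably where the scaling and parsing controls come in.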
