How to Build a Web Page Programmatically
by Fred Brack
Updated
If you are a webmaster who periodically creates or receives a file of data
that has to be integrated into a webpage, this is one person's experience of
how to do that. Here are the assumptions:
- You know how to write a computer program in some language (any
language).
- You have a web page which presents data in tabular or list format (in
other words, consistent from line to line).
- You receive, or generate yourself, a file of data whose contents completely
determine what will be listed on your page, apart from the introductory text.
In my case, I receive files of data each week in the form of an Excel
spreadsheet where each row represents a single video title (for example, a
movie) along with its characteristics (like genre and rating). From this
data, I build a webpage programmatically, alphabetizing all the titles and adding
index letters (A, B, C, etc.). Everything I need is in the file; I
just need to extract and manipulate the data, turning it into HTML.
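For readers who don't use REXX, here is a minimal Python sketch of the alphabetize-and-index step. The function name `build_listing`, the `<h3>`/`<p>` markup, and the choice to group titles that start with a digit or punctuation under "#" are all assumptions for illustration, not the format of any actual page.

```python
from html import escape


def build_listing(titles):
    """Return HTML lines for the titles, sorted case-insensitively, with an
    index-letter heading inserted wherever the first letter changes.

    Hypothetical output format: <h3> for index letters, <p> for titles."""
    lines = []
    current_letter = None
    for title in sorted(titles, key=str.casefold):
        first = title[:1].upper()
        # Titles starting with a digit or punctuation are grouped under "#".
        letter = first if first.isalpha() else "#"
        if letter != current_letter:
            lines.append(f'<h3 id="{letter}">{letter}</h3>')
            current_letter = letter
        # escape() turns characters such as "&" into HTML entities.
        lines.append(f"<p>{escape(title)}</p>")
    return lines
```

The `id` on each heading lets a row of A, B, C links at the top of the page jump to the matching section (for example, `href="#A"`).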
An overview of how this is done follows:
- Build your web page with a sample line of data.
- Include the complete header, navigation, footer, and any introductory or
closing text you want.
- Insert "control" statements at each place where you want your program to
insert data. For example, one might mark where the update date and number of
titles listed go, and a second where you want the title listing itself to begin.
- If the data comes to you in a spreadsheet, SAVE the data in ".csv"
format (Comma Separated Values). Excel will write each row of the sheet as
one line of text, with the values from each column separated by commas. If
a value itself contains a comma, the entire field is enclosed in quotes. Your
program will extract the data from this file, since the native Excel file is
not plain text that a simple program can read line by line.
- Write a program to do any preprocessing of your CSV (or other) file data
(for instance, validate it, sort it, and count valid entries); load the
template (the sample page you wrote) up to the point of your control
statement; insert whatever is required at that point; then continue
loading the rest of the template.
- The end result is a complete HTML file with your data inserted.
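The preprocessing step (parse the CSV, validate, sort, count) can be sketched in Python as follows. This is a minimal sketch assuming a hypothetical layout: the first row holds column headings and the columns are title, genre, rating. Python's standard `csv` module takes care of fields that Excel has wrapped in quotes because they contain commas.

```python
import csv
import io


def load_rows(csv_text):
    """Parse spreadsheet rows saved as CSV, drop invalid ones, and sort by title.

    Assumes (hypothetically) that the first row holds column headings and the
    columns are title, genre, rating. Returns (rows, count_of_valid_rows).
    """
    reader = csv.reader(io.StringIO(csv_text))  # handles quoted fields containing commas
    next(reader, None)  # skip the heading row
    rows = []
    for row in reader:
        # Validation: need at least three columns and a non-blank title.
        if len(row) < 3 or not row[0].strip():
            continue
        rows.append([field.strip() for field in row[:3]])
    # Case-insensitive sort so the output listing is alphabetic.
    rows.sort(key=lambda r: r[0].casefold())
    return rows, len(rows)
```

To read a real export, something like `open("titles.csv", newline="", encoding="utf-8-sig").read()` (the filename is hypothetical) would supply `csv_text`; `utf-8-sig` discards the byte-order mark that Excel's "CSV UTF-8" option writes, while Excel's other CSV options may use a different encoding.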
Here is an example of a programmatically generated web page which started out
as an Excel worksheet: iTunes Audio Described Movie Titles. Most of the other
title listings on that website are generated similarly.
Considerations:
- I create the base HTML for a file using the same name as the online
file, but with "base" added to the end of the name (before the extension).
Thus for my "itunesad.html" file above, I call my base or template file
"itunesadbase.html." It is read every time I rebuild my online file. (I mark
it "exclude from publishing" since it is not a real online file.)
- I use Microsoft Expression Web (an old but free product) to maintain my
websites. One of its features is the Dynamic Web Template (DWT), a way to
define a common format for every page in one place (a .dwt file). When any
change is made to that common format, Expression Web propagates it to all
"connected" files. So after creating my base file, if there are any sitewide
changes to the header or footer, I simply use the option to "update all
pages" that use the DWT, and my base file gets updated. Subsequent builds
incorporate these changes, and at the same time the current online file gets
updated too, since it carries over the DWT from the base.
- In at least one case, I do not use an Excel file as the source of my
data. I created my own "database" of information as a flat file and
extract the fields from it in a manner similar to using a CSV file.
- While you can use any convention you like, I use the following
convention to mark where my program should interrupt the loading of the base
file to insert data:
<!-- CONTROL: name -->
where "name" is something like "DATE" or "DATA" or whatever you want.
My program searches for this complete line to determine when to interrupt
the read-and-copy of the base file.
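The read-and-copy loop with control statements might look like this in Python. It is a sketch under assumptions: the function names `fill_template` and `base_name_for` are hypothetical, the generated HTML for each control name is supplied as a list of lines in a dictionary, and the control line itself is replaced rather than copied to the output (keeping it as a harmless comment would work equally well).

```python
MARKER_START = "<!-- CONTROL: "
MARKER_END = " -->"


def fill_template(template_lines, inserts):
    """Copy the base file line by line; when a whole line is a control
    statement such as '<!-- CONTROL: DATA -->', emit inserts[name] instead."""
    out = []
    for line in template_lines:
        stripped = line.strip()
        if stripped.startswith(MARKER_START) and stripped.endswith(MARKER_END):
            name = stripped[len(MARKER_START):-len(MARKER_END)]
            out.extend(inserts[name])  # raises KeyError if no data was supplied
        else:
            out.append(line)
    return out


def base_name_for(online_name):
    """Apply the 'base' naming convention: itunesad.html -> itunesadbase.html."""
    stem, dot, ext = online_name.rpartition(".")
    if not dot:
        return online_name + "base"  # no extension: just append "base"
    return f"{stem}base.{ext}"
```

In use, the template could be read with `open(base_name_for("itunesad.html"), encoding="utf-8").read().splitlines()` (the encoding is an assumption), passed to `fill_template` along with the generated lines, and the result written out with `"\n".join(...)`.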
Your Program:
As a former IBMer, I choose to write in a programming language you are
unlikely to use: REXX (available free as Regina Rexx). A typical program to
load data and format a web page is about 1100 lines. In my case, a lot of
that has to do with processing the genre of the movies, handling exceptions
such as correcting titles, and determining changes since the last run. Your
program could definitely be shorter! The flow is something like this:
- Validate the input files
- Read in all the data, processing each line for validity, and writing it
to a work file in a slightly different format
- Sort the work file by title so my output will be alphabetic
- Find additions since the last run so I can mark them in the new listing
(a bunch of code ...)
- Get the template (the base file)
- Insert information in two places; in my case this includes inserting
alphabetic breaks for A, B, C, etc.
- Complete the loading of the template
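The "find additions since the last run" step amounts to comparing this week's titles with last week's. A minimal Python sketch, assuming the previous run's titles have been saved somewhere and read back as a list; the function names and the "Recent Additions" markup are hypothetical.

```python
from html import escape


def find_additions(previous_titles, current_titles):
    """Titles in the current run that were not in the previous run,
    compared case-insensitively and returned in alphabetical order."""
    seen = {title.casefold() for title in previous_titles}
    new = [title for title in current_titles if title.casefold() not in seen]
    return sorted(new, key=str.casefold)


def recent_additions_html(additions):
    """Hypothetical markup for a 'Recent Additions' block at the top of the page."""
    if not additions:
        return []  # nothing new: omit the section entirely
    lines = ["<h2>Recent Additions</h2>", "<ul>"]
    lines += [f"<li>{escape(title)}</li>" for title in additions]
    lines.append("</ul>")
    return lines
```

Using a set for the previous titles keeps each lookup fast even for long listings; the case-insensitive comparison means a title whose capitalization was corrected is not reported as new.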
The resultant HTML file is ready for republishing to my website with a new
date noted, updated title count, and a section at top for "Recent Additions."
I hope this brief summary has given you ideas on how you can programmatically
create your own web pages. Drop me an email if this is helpful or you have
questions or suggestions for improvement.
Fred