At some point in the life of a static site, you will wish that you had used a content management system (CMS). The reasons could be varied.
It is surprising how often the implications of the last point are underestimated. The department complaining that the content is incorrect and the department responsible for providing the correct information are not necessarily the same. It is very hard to delegate responsibility for modifying static content to the users concerned. You, as the person accountable for maintaining the site, are stuck in the middle, and it is a very frustrating experience.
As you explore the option of switching to a CMS, you will be estimating the effort of transferring the existing content into the new site. It is not an easy decision.
Should you modify the content during the migration? Obviously, the advantages of a CMS will be minimal unless you reorganise the content to fit the organisation's needs and the capabilities of the CMS. This takes a long time; often most of it goes into getting agreement on a suitable design, and multiple models may need to be explored. Meanwhile, the backlog on the existing site keeps growing.
It would be very nice if you could transfer the existing content painlessly into the CMS. You could then move pages into the ideal or appropriate structure as and when required, and the period during which you must maintain the existing site while building the new one would be minimised.
The stakeholders would also get a better feel of what the site would be like with their own content. The theme and colour scheme can be decided while you build the new site. Switching a theme is not hard, at least until you have customised it substantially.
Drupal has a contributed Import HTML module which allows you to import an existing HTML site; see http://drupal.org/project/import_html. It provides a couple of sample templates to get you started and, best of all, it is now available for Drupal 7.
Try it with a few pages first. Chances are that your imported content will look such a mess that you will think it is hopeless and want to give up.
The templates for transferring the content are written in XSLT. It is bad enough to write XML by hand, and very few people I know are comfortable writing XSLT scripts. But persist. It is not as hard as it looks. The samples are enough to get started, and you will probably need to make only a few common changes.
Typically, a page will consist of a header, a footer, sidebars and the content. You will want to manage the header, footer and sidebars using Drupal 7, so you only want to transfer the content of the static pages into Drupal 7 pages. Import_html will create reasonable menu links and preserve the existing page names. It will also import files such as PDFs and images into a suitable folder.
Drupal 7's menu editing features are very nice, and you can easily reorganise the menus once the content has been imported sensibly.
Unless your site is absolutely chaotic, you will find recurring patterns in how its pages are laid out. At worst, there may be a couple of sub-sites, and you may need to create a separate template for each of them.
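To make this concrete, a page on such a site might look like the following well-formed XHTML skeleton. It is entirely hypothetical: the ids and class names (header, breadcrumb, navigation, content, sidebar, footer) are invented for illustration. The point is that only the content div holds material you want imported as the body of a Drupal page; everything else is site furniture that Drupal will provide.

```xml
<!-- Hypothetical page skeleton; all ids and class names are illustrative -->
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Admissions - Example College</title>
  </head>
  <body>
    <!-- Site furniture: to be managed by Drupal, not imported -->
    <div id="header"><img src="logo.png" alt="Example College"/></div>
    <p id="breadcrumb"><a href="index.html">Home</a> &gt; Admissions</p>
    <ul class="navigation">
      <li><a href="index.html">Home</a></li>
      <li><a href="admissions.html">Admissions</a></li>
    </ul>
    <!-- The only part you want to keep as the page body -->
    <div id="content">
      <h1>Admissions</h1>
      <p>Page text goes here.</p>
    </div>
    <div id="sidebar">Sidebar links</div>
    <div id="footer">Copyright notice</div>
  </body>
</html>
```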
Start with simplehtml2simplehtml.xsl. In it, you will find a template 'get-content'. Just below it, notice the rule for the h1 element.
This rule matches the HTML <h1> tag and ignores it in the conversion. You will soon see why the h1 content is excluded.
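The trick behind such a rule is that, in XSLT, a template with an empty body produces no output for the nodes it matches, so the element and everything inside it disappear, provided the element is reached through xsl:apply-templates rather than copied wholesale with xsl:copy-of. A rule of this kind looks roughly like the sketch below; it assumes the stylesheet binds the xhtml prefix to the XHTML namespace, as the match patterns in this article suggest, and the exact lines in the module's file may differ.

```xml
<!-- Empty template: any xhtml:h1 processed by xsl:apply-templates
     is replaced by nothing, i.e. dropped from the imported body -->
<xsl:template match="xhtml:h1"></xsl:template>
```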
So, the idea is simple: identify the elements to be ignored, so that only the desired content is retained. For example:
<xsl:template match="xhtml:div[@id = 'header']"></xsl:template>
<xsl:template match="xhtml:p[@id = 'breadcrumb']"></xsl:template>
<xsl:template match="xhtml:ul[@class = 'navigation']"></xsl:template>
You can select an HTML element, e.g. a div, a paragraph or a list, and narrow the match by, say, its id or class. As you can see, it is pretty simple to skip any blocks in the original which are not part of the core content. The patterns have to be tailored to your specific site. Understandably, if you include all the blocks, the resulting page will show them all as part of the main content, and it will look a mess.
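Before editing the module's template, it can help to try your skip rules on a single saved page. The following is a hypothetical standalone XSLT 1.0 test harness, not part of import_html: it copies everything inside the body unchanged (the identity rule) except the elements matched by the empty skip templates. It assumes the page is well-formed XHTML that declares the XHTML namespace; the ids and class are the examples from above and should be replaced with your own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- skip-test.xsl: hypothetical harness for checking skip rules -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xhtml">

  <xsl:output method="xml" indent="yes"/>

  <!-- Start from the children of body; the head is not copied -->
  <xsl:template match="/">
    <xsl:apply-templates select="//xhtml:body/node()"/>
  </xsl:template>

  <!-- Identity rule: copy elements, attributes and text unchanged.
       Its default priority (-0.5) is lower than the skip rules below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Skip rules: empty templates emit nothing for the matched element -->
  <xsl:template match="xhtml:h1"/>
  <xsl:template match="xhtml:div[@id = 'header']"/>
  <xsl:template match="xhtml:p[@id = 'breadcrumb']"/>
  <xsl:template match="xhtml:ul[@class = 'navigation']"/>
</xsl:stylesheet>
```

You could run it with, for example, xsltproc --novalid skip-test.xsl page.xhtml (--novalid stops xsltproc from fetching the DTD) and check that only the core content is printed. Note that @class = 'navigation' only matches when the class attribute is exactly that string; if an element carries several classes, the usual XPath 1.0 idiom is xhtml:ul[contains(concat(' ', normalize-space(@class), ' '), ' navigation ')].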
Finally, why did the default template skip the h1 element, which you will probably want to do as well? The reason is that the h1 is likely to be the title of your page, and Drupal displays a page's title separately from its body, so leaving the h1 in the content would show the title twice. You may now look at the 'get-title' template in the xsl file. It is pretty readable.
If an h1 element is present, the title of the page is set to its text content. If not, the title is set to the text of the title element in the page's head.
Your site may instead use the h1 element for the site name in the page header. In that case, you will need to modify the get-title template accordingly. Once you have read the template, it should not be hard to do.
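For reference, the title logic just described can be sketched in XSLT 1.0 as follows. This illustrates the behaviour rather than reproducing the module's template: it assumes get-title is a named template and that the xhtml prefix is bound to the XHTML namespace, so check how the real file declares it before copying anything.

```xml
<!-- Sketch of the get-title logic; the module's version may differ -->
<xsl:template name="get-title">
  <xsl:choose>
    <!-- Prefer the first h1 in document order, if the page has one -->
    <xsl:when test="//xhtml:h1">
      <xsl:value-of select="normalize-space((//xhtml:h1)[1])"/>
    </xsl:when>
    <!-- Otherwise fall back to the title element in the head -->
    <xsl:otherwise>
      <xsl:value-of select="normalize-space(//xhtml:head/xhtml:title)"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```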
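One possible modification is sketched below, under a hypothetical layout: the site name sits in an h1 inside div id='header', while each page's own h1 appears elsewhere in the body. The template ignores any h1 inside the header and otherwise behaves as before. The header id is an assumption; use whatever identifies the header on your site.

```xml
<!-- Variant for a site whose header also uses h1 (hypothetical layout) -->
<xsl:template name="get-title">
  <!-- First h1 that is not inside the site header -->
  <xsl:variable name="page-h1"
      select="(//xhtml:h1[not(ancestor::xhtml:div[@id = 'header'])])[1]"/>
  <xsl:choose>
    <xsl:when test="$page-h1">
      <xsl:value-of select="normalize-space($page-h1)"/>
    </xsl:when>
    <!-- No page-level h1: fall back to the head's title element -->
    <xsl:otherwise>
      <xsl:value-of select="normalize-space(//xhtml:head/xhtml:title)"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```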
Ideally, you should design the site afresh. However, websites are now a critical resource for an organisation, and time is a luxury. 'Good enough', not perfection, has to be your goal. There is a good chance that, using import_html, you can obtain a 'good enough' site in a very short time, which makes the module well worth exploring.