In this article, Part 2 of "Using Perl and Regular Expressions to Process HTML Files", we will look at how to change the contents of an HTML file by running a Perl script on it.
The file we are going to process is called file1.htm:
Note: To make sure that the code is displayed correctly, the example code shown in this article uses square brackets '[..]' in HTML tags instead of angle brackets '<..>'.
[head][title]Sample HTML File[/title]
[p]Welcome to the world of Perl and regular expressions[/p]
[table border="1" width="400"]
[tr][th colspan="2"]Programming Languages[/th][/tr]
[tr][td]Perl[/td][td]Processing HTML files[/td][/tr]
Imagine that we need to change both occurrences of [h1]heading[/h1] to [h1 class="large"]heading[/h1]. This is not a big change, and it could easily be done manually or with a simple search and replace. But we are just getting started here.
To do this, we could use the following Perl script (script1.pl):
1 open (IN, "file1.htm");
2 open (OUT, ">new_file1.htm");
3 while ($line = <IN>) {
4 $line =~ s/[h1]/[h1 class="large"]/;
5 print OUT $line;
6 }
7 close (IN);
8 close (OUT);
Note: You don't need to enter the line numbers. I've included them simply so that I can refer to individual lines in the script.
Let's look at each line of the script.
Line 1
In this line file1.htm is opened so that it can be processed by the script. To process the file, Perl uses something called a filehandle, which provides a kind of link between the script and the operating system and holds information about the file that is being processed. I've called this "opening" filehandle 'IN', but I could have used anything within reason. Filehandles are usually written in capitals.
Line 2
This line creates a new file called 'new_file1.htm', which is written to via another filehandle, OUT. The '>' just before the filename means that the file is opened for writing.
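In more recent Perl code, the two open calls on Lines 1 and 2 are often written with the three-argument form of open, lexical filehandles, and an explicit error check. The following is a minimal sketch of that style, not the script used in this article; the helper name open_pair and its variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: open one file for reading and another for writing.
sub open_pair {
    my ($in_path, $out_path) = @_;

    # '<' opens for reading; $! holds the system error message on failure.
    open(my $in, '<', $in_path)
        or die "Cannot open $in_path for reading: $!";

    # '>' creates the file, or empties it if it already exists.
    open(my $out, '>', $out_path)
        or die "Cannot open $out_path for writing: $!";

    return ($in, $out);
}
```

Calling open_pair('file1.htm', 'new_file1.htm') would return two handles that play the same roles as IN and OUT in the script.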
Line 3
This line sets up a loop in which each line of file1.htm is read and examined in turn.
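To see the loop on its own, the sketch below reads lines from a string held in memory rather than from file1.htm. The sample text and the variable names are assumptions made for this example only.

```perl
use strict;
use warnings;

# Sample text standing in for file1.htm (made up for this example).
my $sample = "first line\nsecond line\nthird line\n";

# Opening a reference to a string gives a filehandle that reads from memory.
open(my $in, '<', \$sample) or die "Cannot open in-memory file: $!";

my $count = 0;
while (my $line = <$in>) {
    # Each pass through the loop sees exactly one line, newline included.
    $count++;
    print "$count: $line";
}
close($in);
```

The loop ends by itself when there are no more lines to read, which is the same behavior the script relies on.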
Line 4
This is the regular expression. It searches for one occurrence of [h1] on each line of file1.htm and, if it finds it, changes it to [h1 class="large"].
Looking at Line 4 in more detail:
- $line – This is a variable that contains a line of text. It is modified if the substitution is successful.
- =~ is the binding operator. It applies the substitution to the contents of $line.
- s is the substitution operator.
- [h1] is what needs to be substituted (replaced).
- [h1 class="large"] is what [h1] has to be changed to.
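The substitution can be tried on a single string, as in the sketch below. It is written with real angle brackets because, in an actual Perl pattern, square brackets denote a character class, so the pattern must match the angle brackets found in the file. The sample string and the /g comparison are additions for illustration and are not part of script1.pl.

```perl
use strict;
use warnings;

# Real angle brackets are used here, as they would be in an actual HTML file.
my $line = '<h1>First</h1><h1>Second</h1>';

# Without /g, only the first match on the line is replaced.
(my $once = $line) =~ s/<h1>/<h1 class="large">/;

# With /g, every match on the line is replaced.
(my $all = $line) =~ s/<h1>/<h1 class="large">/g;

print "$once\n";
print "$all\n";
```

In the first result only the first heading gains the class attribute; in the second, both do. The script in this article does not need /g as long as each line holds at most one [h1] tag.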
Line 5
This line takes the contents of the $line variable and, via the OUT filehandle, writes the line to new_file1.htm.
Line 6
This line closes the 'while' loop. The loop is repeated until all the lines in file1.htm have been examined.
Lines 7 and 8
These two lines close the two filehandles that have been used in the script. If you left these two lines out the script would still work, but it is good programming practice to close filehandles, which frees up the filehandle names so that they can be used, for example, for another file.
Running the Script
As the aim of this article is to explain how to use regular expressions to process HTML files, and not necessarily how to use Perl, I don't want to spend too long describing how to run Perl scripts. Suffice it to say that you can run them in several ways, for example, from within a text editor such as TextPad, by double-clicking the Perl script (script1.pl), or by running the script from an MS-DOS window.
(The location of the Perl interpreter will need to be in your PATH statement so that you can run Perl scripts from any location on your computer and not just from within the directory where the interpreter (perl.exe) itself is installed.)
So, to run our script we could open an MS-DOS window and navigate to the location where the script and the HTML file are found. To keep life simple I've assumed that these two files are in the same folder (or directory). The command to run the script is:
perl script1.pl
If the script does work (and hopefully it will), a new file (new_file1.htm) is created in the same folder as file1.htm. If you open the file you can see that the two lines that contained [h1] tags have been modified so that they now read [h1 class="large"].
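For reference, here is one way the whole script might be written so that it can be run as it stands, with real angle brackets, lexical filehandles, and error checks. It is a sketch under those assumptions rather than the exact script discussed above, and the function name add_class_to_h1 is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the steps of script1.pl.
sub add_class_to_h1 {
    my ($in_path, $out_path) = @_;

    open(my $in,  '<', $in_path)  or die "Cannot read $in_path: $!";
    open(my $out, '>', $out_path) or die "Cannot write $out_path: $!";

    my $changed = 0;
    while (my $line = <$in>) {
        # s/// returns a true value when it replaced something on this line.
        $changed++ if $line =~ s/<h1>/<h1 class="large">/;
        print {$out} $line;
    }

    close($in);
    close($out) or die "Cannot close $out_path: $!";
    return $changed;    # number of lines that were modified
}

# Run against the files named in the article only if file1.htm is present.
if (-e 'file1.htm') {
    my $n = add_class_to_h1('file1.htm', 'new_file1.htm');
    print "Modified $n line(s)\n";
}
```

Returning the number of modified lines is a small design choice that makes it easy to confirm that both headings were found.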
In Part 3 we will look at how to handle multiple files.