Using Perl and Regular Expressions to Process HTML Files – Part 2

In this article we will discuss how to change the contents of an HTML file by running a Perl script on it.

The file we are going to process is called file1.htm:

Note: To ensure that the code is displayed correctly, in the example code shown in this article, square brackets ‘[..]’ are used in HTML tags instead of angle brackets ‘<..>’.

[html]
[head]
[title]Sample HTML File[/title]
[link rel="stylesheet" type="text/css" href="style.css"]
[/head]
[body]
[h1]Introduction[/h1]
[p]Welcome to the world of Perl and regular expressions[/p]
[h2]Programming Languages[/h2]
[table border="1" width="400"]
[tr][th colspan="2"]Programming Languages[/th][/tr]
[tr][td]Language[/td][td]Typical use[/td][/tr]
[tr][td]JavaScript[/td][td]Client-side scripts[/td][/tr]
[tr][td]Perl[/td][td]Processing HTML files[/td][/tr]
[tr][td]PHP[/td][td]Server-side scripts[/td][/tr]
[/table]
[h1]Summary[/h1]
[p]JavaScript, Perl, and PHP are all interpreted programming languages.[/p]
[/body]
[/html]

Imagine that we need to change both occurrences of [h1]heading[/h1] to [h1 class="big"]heading[/h1]. It is not a big change, and it could easily be done manually or with a simple search and replace. But we’re just getting started here.

To do this, we could use the following Perl script (script1.pl):

1 open (IN, "file1.htm");
2 open (OUT, ">new_file1.htm");
3 while ($line = <IN>) {
4     $line =~ s/[h1]/[h1 class="big"]/;
5     print OUT $line;
6 }
7 close (IN);
8 close (OUT);

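Note: You don’t need to enter the line numbers. I’ve included them simply so that I can reference individual lines in the script. Also remember that the square brackets around h1 in line 4 stand in for angle brackets, as in the HTML listing; typed literally, [h1] would be read by Perl as a character class matching a single ‘h’ or ‘1’.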

Let’s look at each line of the script.

Line 1
In this line file1.htm is opened so that it can be processed by the script. In order to process the file, Perl uses something called a filehandle, which provides a kind of link between the script and the operating system, containing information about the file that is being processed. I’ve called this input filehandle ‘IN’, but I could have used anything within reason. Filehandles are normally written in capitals.

Line 2
This line creates a new file called ‘new_file1.htm’, which is written to by using another filehandle, OUT. The ‘>’ just before the filename indicates that the file will be written to (and that any existing file of that name will be overwritten).
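The two open calls can also be written with error checking, so that the script stops with a message if a file cannot be opened. The following is only a minimal sketch: the helper names open_for_reading and open_for_writing and the scratch filename filehandle_demo.txt are assumptions made for illustration, and it uses the three-argument form of open with lexical filehandles rather than the IN and OUT names used in script1.pl.

```perl
use strict;
use warnings;

# Hypothetical helpers, named here for illustration only.
sub open_for_reading {
    my ($name) = @_;
    # '<' opens an existing file for reading.
    open(my $fh, '<', $name) or die "Cannot open $name for reading: $!\n";
    return $fh;
}

sub open_for_writing {
    my ($name) = @_;
    # '>' creates the file, or empties it if it already exists.
    open(my $fh, '>', $name) or die "Cannot open $name for writing: $!\n";
    return $fh;
}

# Write one line to a scratch file, then read it back.
my $demo = 'filehandle_demo.txt';    # assumed scratch filename
my $out  = open_for_writing($demo);
print {$out} "Welcome to the world of Perl\n";
close($out) or die "Cannot close $demo: $!\n";

my $in         = open_for_reading($demo);
my $first_line = <$in>;              # read a single line
close($in);
print $first_line;

unlink $demo;                        # remove the scratch file
```

If open fails, the special variable $! holds the operating system’s reason, which is why it is included in each message.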

Line 3
This line sets up a loop in which each line in file1.htm will be examined individually.

Line 4
This is the regular expression. It searches for one occurrence of [h1] on each line of file1.htm and, if it finds it, changes it to [h1 class="big"].

Looking at Line 4 in more detail:

    • $line – This is a variable that contains a line of text. It gets modified if the substitution is successful.
    • =~ is the binding operator. It applies the substitution on its right to the variable on its left.
    • s is the substitution operator.
    • [h1] is what needs to be substituted (replaced).
    • [h1 class="big"] is what [h1] has to be changed to.
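The pieces listed above can be tried on a single string before running them over a whole file. This is a small sketch rather than part of script1.pl: the helper name add_big_class is made up for illustration, and the tags are written with real angle brackets instead of the square-bracket stand-ins used in the listings.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the substitution from line 4 to one line of text.
sub add_big_class {
    my ($line) = @_;
    # s/// returns a true value if it replaced something, a false value if not.
    my $changed = ($line =~ s/<h1>/<h1 class="big">/);
    return ($line, $changed ? 1 : 0);
}

my ($new_line, $changed) = add_big_class("<h1>Introduction</h1>\n");
print $new_line;                 # the opening tag now carries class="big"
print "changed: $changed\n";

# Without the /g modifier, only the first match on a line is replaced.
my ($twice) = add_big_class("<h1>One</h1><h1>Two</h1>\n");
print $twice;
```

The second call shows why the description says “one occurrence”: a second [h1] on the same line is left alone unless the g modifier is added after the final slash.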

Line 5
This line takes the contents of the $line variable and, via the OUT filehandle, writes the line to new_file1.htm.

Line 6
This line closes the ‘while’ loop. The loop is repeated until all the lines in file1.htm have been examined.

Lines 7 and 8
These two lines close the two filehandles that have been used in the script. If you left out these two lines the script would still work, but it’s good programming practice to close filehandles, thus freeing up the filehandle names so they can be used, for example, by another file.
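Putting the eight lines together, one way the whole script might look when typed with real angle brackets is sketched below. It is not the only way to write it: the subroutine name add_class_to_h1, the error messages, and the returned count of changed lines are additions made for illustration, and lexical filehandles are used in place of IN and OUT.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the steps of script1.pl.
# Copies $in_name to $out_name, adding class="big" to the first <h1> on each line.
# Returns the number of lines that were changed.
sub add_class_to_h1 {
    my ($in_name, $out_name) = @_;

    open(my $in,  '<', $in_name)  or die "Cannot open $in_name: $!\n";
    open(my $out, '>', $out_name) or die "Cannot create $out_name: $!\n";

    my $changed = 0;
    while (my $line = <$in>) {                    # read one line at a time
        $changed++ if $line =~ s/<h1>/<h1 class="big">/;
        print {$out} $line;                       # write every line, changed or not
    }

    close($in);
    close($out) or die "Cannot close $out_name: $!\n";
    return $changed;
}

# Run on the article's file only if it is present in the current folder.
if (-e 'file1.htm') {
    my $count = add_class_to_h1('file1.htm', 'new_file1.htm');
    print "Lines changed: $count\n";
}
```

Wrapping the steps in a subroutine is a design choice made here so that the same code can be pointed at other filenames; the behaviour on file1.htm is intended to match the numbered listing.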

Running the Script

As the purpose of this article is to explain how to use regular expressions to process HTML files, and not necessarily how to use Perl, I don’t want to spend too long describing how to run Perl scripts. Suffice to say that you can run them in various ways, for example, from within a text editor such as TextPad, by double-clicking the Perl script (script1.pl), or by running the script from an MS-DOS window.

(The location of the Perl interpreter will need to be in your PATH statement so that you can run Perl scripts from any location on your computer and not just from within the directory where the interpreter (perl.exe) itself is installed.)

So, to run our script we could open an MS-DOS window and navigate to the location where the script and the HTML file are located. To keep life simple I’ve assumed that these two files are in the same folder (or directory). The command to run the script is:

C:\>perl script1.pl

If the script does work (and hopefully it will), a new file (new_file1.htm) is created in the same folder as file1.htm. If you open the file you’ll see that the two lines that contained [h1] tags have been changed so that they now read [h1 class="big"].

In Part 3 we will look at how to handle multiple files.