Get remote HTML with PHP

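This snippet retrieves the HTML source of a remote page, which you can then process and manipulate as you see fit. In the example below, the source of my Last.fm page is retrieved. Note that $content ends up holding the raw HTTP response, so the response headers come before the HTML itself.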

<?php
// This variable will hold the code
$content = "";

// You have to split the URL into the host and the
// rest of the URL (the path). In this example, the
// full URL is http://www.last.fm/user/CaseuS_
// As you can see, the protocol (http://) is not
// part of either value.
$host = "www.last.fm";
$url = "/user/CaseuS_";

// Open the connection on port 80
// $errno and $errstr can be used to
// get the error number and error message
// in case something went wrong.
// 30 stands for the timeout, in seconds,
// that we use for the connection attempt.
$fp = fsockopen($host, 80, $errno, $errstr, 30);
if (!$fp)
{
    echo "$errstr ($errno)<br />\n";
}
else
{
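    // An HTTP/1.0 request asks for an unchunked response. With
    // HTTP/1.1 the server may use chunked transfer encoding,
    // which would mix chunk sizes into $content.
    $out = "GET $url HTTP/1.0\r\n";
    $out .= "Host: $host\r\n";
    $out .= "Connection: Close\r\n\r\n";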
    fwrite($fp, $out);
    while (!feof($fp))
    {
        $content .= fgets($fp, 128);
    }
    fclose($fp);
}

// Do something with $content, e.g. output the whole code:
echo $content;
?>
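
Since $content holds the raw response, you will usually want to separate the headers from the HTML before processing it. Here is a minimal sketch of how that could look. The helper name split_http_response() and the canned $sample string are my own inventions for illustration, not part of the snippet above, and the sketch assumes the body is not sent with chunked transfer encoding.

```php
<?php
// Hypothetical helper (not part of the original snippet): splits a raw
// HTTP response, as collected in $content, into status, headers and body.
function split_http_response(string $raw): array
{
    // Headers and body are separated by the first blank line (CRLF CRLF).
    $parts = explode("\r\n\r\n", $raw, 2);
    $headerLines = explode("\r\n", $parts[0]);
    $body = $parts[1] ?? "";

    // The first line is the status line, e.g. "HTTP/1.1 200 OK".
    $statusLine = array_shift($headerLines);
    $statusParts = explode(" ", $statusLine, 3);
    $status = isset($statusParts[1]) ? (int) $statusParts[1] : 0;

    // The remaining lines are "Name: value" pairs. Header names are
    // case-insensitive, so they are stored in lowercase here.
    $headers = [];
    foreach ($headerLines as $line) {
        $pair = explode(":", $line, 2);
        if (count($pair) === 2) {
            $headers[strtolower(trim($pair[0]))] = trim($pair[1]);
        }
    }

    return ["status" => $status, "headers" => $headers, "body" => $body];
}

// Canned response standing in for $content, so the sketch runs offline.
$sample = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>Hello</html>";
$response = split_http_response($sample);

echo $response["status"], "\n";                  // 200
echo $response["headers"]["content-type"], "\n"; // text/html
echo $response["body"], "\n";                    // <html>Hello</html>
```

In the snippet above you would pass $content instead of $sample, and check the status before using the body, since a redirect or an error page also arrives as a perfectly valid response.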
