HTTP Fundamentals and Using cURL in PHP

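  To lay the groundwork for the tutorials that follow, this article explains how to issue HTTP requests from PHP to simulate a login, and covers a little of how HTTP requests work along the way. Simulating a login just means imitating what a browser does when it logs in. A request is nothing more than sending some text to a website and getting some text back, normally over HTTP or HTTPS. Usually the browser does this work for us: it packages the data, sends it to the target site, receives the response, and finally renders it as a web page. In a simulated login we have to do all of this ourselves, except for the rendering at the end; we only need to extract the data we want.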

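  There are three ways to simulate a login in PHP. The first is to call file_get_contents(url) directly; it is simple to use, so I won't cover it. The second is to use a socket, writing out the request text character by character according to the protocol and sending it; I haven't studied this much, so I won't cover it either. The last, naturally, is PHP's bundled cURL extension. It lets you set request headers, send a request body, and so on as needed, and it is convenient to use. The format of an HTTP message is basic HTTP material, so I won't go into it. Below, cURL is used to simulate a request to Baidu: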

    $curl = curl_init();        // initialize a cURL handle
    curl_setopt($curl, CURLOPT_URL, 'http://www.baidu.com');        // set the URL
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 5);        // 5-second connection timeout
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);        // 1: return the HTTP response instead of printing it
    // Spoof the client; worth setting, since some sites block requests based on the user agent
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)');
    $response = curl_exec($curl);        // execute the request; the response is stored in $response
    $state = curl_getinfo($curl, CURLINFO_HTTP_CODE);        // retrieves the response status code
    curl_close($curl);        // release the cURL handle
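
  The request above assumes everything goes well. As a minimal sketch of the same steps with failure handling, the helper below checks the return value of curl_exec, which is false when the transfer fails. The function name fetchUrl and the shape of the array it returns are hypothetical, introduced only for this illustration.

```php
<?php
// Hypothetical helper (not part of the original code): performs a GET request
// and reports a failure explicitly instead of passing false along.
function fetchUrl($url, $timeout = 5) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

    $body = curl_exec($curl);    // false when the transfer fails
    if ($body === false) {
        return array('ok' => false, 'status' => 0, 'body' => null, 'error' => curl_error($curl));
    }
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    return array('ok' => true, 'status' => $status, 'body' => $body, 'error' => '');
}
```

  A caller would test $result['ok'] first and read $result['error'] when it is false, rather than treating a failed transfer as an empty page.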

 

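  That completes one request. $response holds the response, i.e. the HTML source of the Baidu page as a string. The two most commonly used HTTP request methods are GET and POST (see the HTTP specification for details). The Baidu request above uses GET. POST differs in that the parameters are sent as the message body, formatted as p1=v1&p2=v2&p3=v3…, where p is a parameter name and v its value; if that is unclear, read up on HTTP before continuing. With the parameters stored as a string in $param, set: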

         curl_setopt($curl, CURLOPT_POSTFIELDS, $param);
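
         The p1=v1&p2=v2 string does not have to be assembled by hand. As a small sketch, PHP's built-in http_build_query produces it from an array and URL-encodes the names and values. The field names and values below are made up for illustration.

```php
<?php
// Illustrative field names only; a real login form defines its own.
$fields = array('username' => 'alice', 'password' => 'p@ss word');

// Third argument fixes the separator to "&" regardless of ini settings.
$param = http_build_query($fields, '', '&');

echo $param, "\n";
```

         The resulting $param can be passed to CURLOPT_POSTFIELDS as shown above.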

         To set request headers:

         curl_setopt($curl, CURLOPT_HTTPHEADER, $header);

         Here $header is an array of strings, one "Name: value" line per header. For example, to send the CLIENT-IP and X-FORWARDED-FOR headers, use $header = array('CLIENT-IP: value', 'X-FORWARDED-FOR: value').
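
         CURLOPT_HTTPHEADER takes a plain list of "Name: value" strings. If the headers are easier to keep as a name => value map, a small conversion helper along these lines can bridge the two. The name toHeaderLines is hypothetical, and the IP address is a placeholder from the documentation range.

```php
<?php
// Hypothetical helper: converts array('Name' => 'value') into the
// array('Name: value') list that CURLOPT_HTTPHEADER expects.
function toHeaderLines(array $headers) {
    $lines = array();
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

$header = toHeaderLines(array(
    'CLIENT-IP'       => '203.0.113.7',   // placeholder address
    'X-FORWARDED-FOR' => '203.0.113.7',
));
print_r($header);
```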

         cURL has many more CURLOPT_* options for curl_setopt; I won't list them here, so look them up in the PHP manual.

         Now that we know how to use cURL, can we write a utility class for simulated logins with it?

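         I call this class RequestClient. A request generally involves three things: the URL, the request method, and the request parameters. Request headers are optional, but they are included too. From the response we keep the response data and the status code. Putting this together, the class has these member variables: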

    private $response = null;            
    private $url;                    
    private $header = null;            
    private $parameter = null;        
    private $method = 'GET';        // GET is the default request method
    private $state = null;

   

  The URL is given at instantiation and can also be changed through a setter:

    public function __construct($url) {
        $this->url = $url;
    }
    public function setUrl($url) {
        $this->url = $url;
    }

       

   The setter for the headers ($header uses the format described above):

  public function setHeader($header) {
      $this->header = $header;
  }

        

   And the getters:

    public function getUrl() { return $this->url; }
    public function getParameter() { return $this->parameter; }
    public function getHeader() { return $this->header; }
    public function getMethod() { return $this->method; }
    public function getState() { return $this->state; }
    public function getResponse() { return $this->response; }

   

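  Next come the parameters. They can be set in two ways: pass an array, which is converted to a parameter string, or pass the string directly. The array format is array("p1"=>"value1", "p2"=>"value2"…). $encode controls whether the parameter values are URL-encoded (the default is yes).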

    public function setParameter($parameter = null, $encode = true) {
        if (is_array($parameter)) {    // convert an array to a parameter string
            $temp = '';
            if ($encode) {
                foreach ($parameter as $key => $value) {
                    $temp .= "$key=".urlencode($value)."&";
                }
            } else {
                foreach ($parameter as $key => $value) {
                    $temp .= "$key=$value&";
                }
            }
            $this->parameter = substr($temp, 0, -1);    // drop the trailing "&"
        } elseif (is_string($parameter)) {
            $this->parameter = $parameter;
        }
    }

 

  Below are the get and post methods, which send GET and POST requests. The response goes into $response and the status code into $state.

    public function get($timeout=5) {
        $this->method = 'GET';
        $url = $this->url;
        if ($this->parameter != null) {        // for GET, parameters are appended to the URL
            $url .= '?'.$this->parameter;      // local copy, so repeated calls don't append twice
        }
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
        if ($this->header!=null) {            // set headers only when there are any
            curl_setopt($curl, CURLOPT_HTTPHEADER, $this->header);
        }
        curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)');
        $this->response = curl_exec($curl);
        $this->state = curl_getinfo($curl, CURLINFO_HTTP_CODE);
        curl_close($curl);
        return $this->response;
    }

    public function post($timeout=5) {
        $this->method = 'POST';
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $this->url);
        curl_setopt($curl, CURLOPT_HEADER, 1);        // include response headers in $response
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_setopt($curl, CURLOPT_POSTFIELDS, $this->parameter);    // setting a body makes cURL send a POST
        if ($this->header!=null) {
            curl_setopt($curl, CURLOPT_HTTPHEADER, $this->header);
        }
        curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)');
        $this->response = curl_exec($curl);
        $this->state = curl_getinfo($curl, CURLINFO_HTTP_CODE);
        curl_close($curl);
        return $this->response;
    }
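
  The two methods share most of their options. One possible way to reduce the duplication, sketched here under assumptions of my own, is to build the option array in one place and apply it with curl_setopt_array. The helper name buildCurlOptions and the example.com URLs are hypothetical; the choice between "?" and "&" is an addition that the methods above do not make.

```php
<?php
// Hypothetical helper: returns the cURL option array for one request.
function buildCurlOptions($url, $method, $parameter = null, $header = null, $timeout = 5) {
    if ($method === 'GET' && $parameter !== null && $parameter !== '') {
        // Use "&" when the URL already carries a query string.
        $url .= (strpos($url, '?') === false ? '?' : '&') . $parameter;
    }
    $options = array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => $timeout,
    );
    if ($method === 'POST') {
        // Setting a request body makes cURL issue a POST.
        $options[CURLOPT_POSTFIELDS] = (string) $parameter;
    }
    if (!empty($header)) {
        $options[CURLOPT_HTTPHEADER] = $header;
    }
    return $options;
}

$options = buildCurlOptions('http://example.com/search', 'GET', 'wd=php');
$curl = curl_init();
curl_setopt_array($curl, $options);    // apply all options in one call
// curl_exec($curl) would send the request at this point.
```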

 

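  With that, a class for HTTP requests is basically done. Depending on your needs you can add further features, for example fetching the page title (the contents of <title>):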

    public function getTitle() {
        $source = (string) $this->response;
        $start = stripos($source, '<title');
        if ($start === false) {        // no <title> tag in the response
            return '';
        }
        $source = substr($source, $start);
        $start = stripos($source, '>') + 1;
        $end = stripos($source, '<', $start);
        return substr($source, $start, $end-$start);
    }
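
  The same extraction can also be written with a regular expression. The version below is only a sketch: the function name extractTitle is hypothetical, and a pattern like this copes with simple pages but is not a substitute for a real HTML parser.

```php
<?php
// Hypothetical helper: returns the <title> text, or null when none is found.
function extractTitle($html) {
    // i: case-insensitive tag name, s: let "." match newlines inside the title
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return trim($m[1]);
    }
    return null;
}

echo extractTitle('<html><head><TITLE> Demo page </TITLE></head></html>'), "\n";
```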

 

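  Getting the cookies as a string. (cURL offers a quick and convenient way to handle cookies: set CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE with curl_setopt. The cookies are written to the file you specify, and later requests read that file and send them. Out of personal habit, though, I prefer to extract the cookies as a string and put them in the Cookie field of $header, which is a bit more flexible. The function below extracts the cookies and returns them as a string in the form [cookie1=value1; cookie2=value2; …]):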

    public function getCookie() {
        $content = (string) $this->response;        // $response includes the response headers
        $start = 0;
        $rt = '';
        while (($start = stripos($content, 'Set-Cookie: ', $start)) !== false) {    // keep searching for "Set-Cookie" fields
            $start += 12;        // skip the 12 characters of "Set-Cookie: "
            $end = $start + strcspn($content, ";\r\n", $start);    // the pair ends at ";" or at the end of the line
            $rt .= substr($content, $start, $end-$start).'; ';
        }
        return substr($rt, 0, -2);        // drop the trailing "; "
    }
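
  To see the idea in isolation, here is a standalone sketch that pulls the name=value pairs out of a raw response and joins them into a Cookie string. The function name extractCookies and the sample response are made up for illustration. It reads only the first header block, so a response preceded by redirects or a "100 Continue" would need more care.

```php
<?php
// Hypothetical helper: collects "name=value" pairs from the Set-Cookie headers
// of a raw response (headers included) and joins them for a Cookie header.
function extractCookies($rawResponse) {
    // The headers end at the first blank line; the body is ignored.
    $parts = preg_split('/\r?\n\r?\n/', $rawResponse, 2);
    // Capture everything after "Set-Cookie:" up to the first ";" or line end.
    preg_match_all('/^Set-Cookie:\s*([^;\r\n]+)/mi', $parts[0], $matches);
    return implode('; ', array_map('trim', $matches[1]));
}

// Made-up response, used only for illustration.
$raw = "HTTP/1.1 200 OK\r\n"
     . "Set-Cookie: sid=abc123; Path=/; HttpOnly\r\n"
     . "Set-Cookie: lang=zh\r\n"
     . "Content-Type: text/html\r\n"
     . "\r\n"
     . "<html></html>";

echo extractCookies($raw), "\n";
```

  The returned string can then be sent back on the next request by adding 'Cookie: ' followed by that string as one element of $header.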

  Usage looks like this:

           $client = new RequestClient("the URL goes here");

           $client->setHeader($header);        // header lines

           $client->setParameter($parameter);  // parameters

           $client->get();   // or $client->post();

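  That completes the class. One last point: the design and code here are distilled from my own experience, so they will inevitably have some mistakes or gaps. In practice, adapt them to your own needs and add features so they fit your program better.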

 

  The complete code:

<?php
    class RequestClient {
        private $response = null;
        private $url;
        private $header = null;            // type: array of "Name: value" strings
        private $parameter = null;        // type: string
        private $proxy = null;            // proxy
        private $method = 'GET';        // GET is the default method
        private $state = null;


        // static factory: creates a new object from a URL, parameters, and headers
        public static function newClient($url, $parameter=null, $header=null) {
            $client = new RequestClient($url);
            $client->setParameter($parameter);
            $client->setHeader($header);
            return $client;
        }

        // constructor, takes only the URL
        public function __construct($url) {
            $this->url = $url;
        }

        public function __destruct() {
            $this->clear();
        }

        // setters
        public function setUrl($url) {
            $this->url = $url;
        }

        public function setHeader($header) {
            $this->header = $header;
        }

        public function setProxy($proxy) {
            $this->proxy = $proxy;
        }

        // returns the cookies from the response headers as "c1=v1; c2=v2"
        public function getCookie() {
            $content = (string) $this->response;
            $start = 0;
            $rt = '';
            while (($start = stripos($content, 'Set-Cookie: ', $start)) !== false) {
                $start += 12;        // skip "Set-Cookie: "
                $end = $start + strcspn($content, ";\r\n", $start);    // ends at ";" or end of line
                $rt .= substr($content, $start, $end-$start).'; ';
            }
            return substr($rt, 0, -2);
        }


        public function setParameter($parameter = null, $encode = true) {
            if (is_array($parameter)) {    // convert to a string if an array was passed
                $temp = '';
                if ($encode) {
                    foreach ($parameter as $key => $value) {
                        $temp .= "$key=".urlencode($value)."&";
                    }
                } else {
                    foreach ($parameter as $key => $value) {
                        $temp .= "$key=$value&";
                    }
                }
                $this->parameter = substr($temp, 0, -1);
            } elseif (is_string($parameter)) {
                $this->parameter = $parameter;
            }
        }

        // send a GET request, store the response in $this->response and return it
        public function get($timeout=5) {
            $this->method = 'GET';
            $url = $this->handleParameter();
            $curl = curl_init();
            curl_setopt($curl, CURLOPT_URL, $url);
            curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
            if ($this->header!=null) {
                curl_setopt($curl, CURLOPT_HTTPHEADER, $this->header);
            }
            if ($this->proxy!=null) {
                curl_setopt($curl, CURLOPT_PROXY, $this->proxy);
            }
            curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)');
            $this->response = curl_exec($curl);
            $this->state = curl_getinfo($curl, CURLINFO_HTTP_CODE);
            curl_close($curl);
            return $this->response;
        }

        // send a POST request, store the response in $this->response and return it
        public function post($timeout=5) {
            $this->method = 'POST';
            $curl = curl_init();
            curl_setopt($curl, CURLOPT_URL, $this->url);
            curl_setopt($curl, CURLOPT_HEADER, 1);        // include response headers in the response
            curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
            curl_setopt($curl, CURLOPT_POSTFIELDS, $this->parameter);
            if ($this->header!=null) {
                curl_setopt($curl, CURLOPT_HTTPHEADER, $this->header);
            }
            if ($this->proxy!=null) {
                curl_setopt($curl, CURLOPT_PROXY, $this->proxy);
            }
            curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)');
            $this->response = curl_exec($curl);
            $this->state = curl_getinfo($curl, CURLINFO_HTTP_CODE);
            curl_close($curl);
            return $this->response;
        }

        // get the page title
        public function getTitle() {
            $source = (string) $this->response;
            $start = stripos($source, '<title');
            if ($start === false) {        // no <title> tag in the response
                return '';
            }
            $source = substr($source, $start);
            $start = stripos($source, '>') + 1;
            $end = stripos($source, '<', $start);
            return substr($source, $start, $end-$start);
        }

        // reset the object's state; only the URL remains
        public function clear() {
            $this->parameter = null;
            $this->header = null;
            $this->response = null;
            $this->state = null;
            $this->proxy = null;
            $this->method = 'GET';
        }

        // getters
        public function getUrl() { return $this->url; }
        public function getParameter() { return $this->parameter; }
        public function getHeader() { return $this->header; }
        public function getProxy() { return $this->proxy; }
        public function getMethod() { return $this->method; }
        public function getState() { return $this->state; }
        public function getResponse() { return $this->response; }

        // private helper: for GET, returns the URL with the parameters appended
        private function handleParameter() {
            if ($this->parameter != null && $this->method == 'GET') {
                return $this->url.'?'.$this->parameter;
            }
            return $this->url;
        }
    }
?>

