Crawl

Based on "Mining the Web" — Chakrabarti and Ramakrishnan
  • 1. Crawling the Web
    • Web pages are a few thousand characters long, served over the Internet using the HyperText Transfer Protocol (HTTP), and viewed at the client end using browsers.
    • A crawler fetches the pages to a local computer, where automatic programs can then analyze the hypertext documents.
  • 2. HTML (HyperText Markup Language)
    • Lets the author specify layout and typeface, embed diagrams, and create hyperlinks.
    • A hyperlink is expressed as an anchor tag with an HREF attribute; HREF names another page using a Uniform Resource Locator (URL).
    • URL = protocol field ("http") + server hostname ("www.cse.iitb.ac.in") + file path ("/", the root of the published file system).
  • 3. HTTP (HyperText Transfer Protocol)
    • Built on top of the Transmission Control Protocol (TCP).
    • Steps, from the client end (a minimal sketch follows this slide):
      • Resolve the server host name to an Internet address (IP) using the Domain Name System (DNS), a distributed database of name-to-IP mappings maintained at a set of known servers.
      • Contact the server using TCP: connect to the default HTTP port (80) on the server.
      • Send the HTTP request header (e.g., GET).
      • Fetch the response header, whose meta-data follows MIME (Multipurpose Internet Mail Extensions), a standard for email and Web content transfer.
      • Fetch the HTML page.
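A short Python sketch of these client-side steps, assuming a hypothetical target host and path; a real crawler would add redirect handling, HTTPS support, and error recovery:

```python
# Minimal sketch of the client-side HTTP steps: DNS lookup, TCP connect,
# GET request, then read the MIME-style headers and the HTML body.
import socket

host, path = "example.com", "/"          # illustrative assumptions

# Step 1: resolve the host name to an IP address via DNS.
ip = socket.gethostbyname(host)

# Step 2: open a TCP connection to the default HTTP port (80).
sock = socket.create_connection((ip, 80), timeout=10)

# Step 3: send the HTTP request header (a GET request).
request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
sock.sendall(request.encode("ascii"))

# Steps 4-5: read the response: headers first, then the HTML page.
response = b""
while chunk := sock.recv(4096):
    response += chunk
sock.close()

headers, _, body = response.partition(b"\r\n\r\n")
print(headers.decode("iso-8859-1"))
print(len(body), "bytes of HTML")
```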
  • 4. Crawl "all" Web pages?
    • Problem: there is no catalog of all accessible URLs on the Web.
    • Solution: start from a given set of URLs; progressively fetch and scan them for new outlinking URLs; fetch these pages in turn; submit the text of each page to a text indexing system; and so on.
  • 5. Crawling procedure
    • The procedure is simple, but a great deal of engineering goes into industry-strength crawlers.
    • Industry crawlers crawl a substantial fraction of the Web (e.g., AltaVista, Northern Light, Inktomi).
    • There is no guarantee that all accessible Web pages will be located in this fashion.
    • The crawler may never halt, since pages are added continually even as it is running.
  • 6. Crawling overheads
    • Delays are involved in resolving the host name in the URL to an IP address using DNS, connecting a socket to the server and sending the request, and receiving the requested page in response.
    • Solution: overlap these delays by fetching many pages at the same time.
  • 7. Anatomy of a crawler
    • Page fetching threads: each starts with DNS resolution and finishes when the entire page has been fetched.
    • Each page is stored in compressed form to disk/tape and scanned for outlinks.
    • A work pool of outlinks maintains network utilization without overloading it; this is dealt with by a load manager.
    • The process continues until the crawler has collected a sufficient number of pages.
  • 8. Typical anatomy of a large-scale crawler (figure).
  • 9. Large-scale crawlers: performance and reliability considerations
    • Need to fetch many pages at the same time to utilize the network bandwidth, since a single page fetch may involve several seconds of network latency.
    • Highly concurrent and parallelized DNS lookups.
    • Use of asynchronous sockets: the state of a fetch context is encoded explicitly in a data structure, and sockets are polled to check for completion of network transfers (multi-processing or multi-threading at this scale is impractical).
    • Care in URL extraction: eliminating duplicates to reduce redundant fetches, and avoiding "spider traps".
  • 10. DNS caching, prefetching and resolution
    • A customized DNS component with: (1) a custom client for address resolution, (2) a caching server, (3) a prefetching client.
  • 11. Custom client for address resolution
    • Tailored for concurrent handling of multiple outstanding requests.
    • Allows many resolution requests to be issued together, polling at a later time for completion of individual requests.
    • Facilitates load distribution among many DNS servers.
  • 12. Caching server
    • Has a large cache, persistent across DNS restarts and residing largely in memory if possible.
  • 13. Prefetching client (a sketch follows this slide)
    • Steps: (1) parse a page that has just been fetched, (2) extract host names from HREF targets, (3) make DNS resolution requests to the caching server.
    • Usually implemented using UDP (User Datagram Protocol), a connectionless, packet-based communication protocol that does not guarantee packet delivery.
    • Does not wait for the resolution to be completed.
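A rough sketch of the prefetching idea, assuming the fetched page's HTML is already in hand; background threads and an in-memory dictionary stand in for the fire-and-forget UDP requests and the caching DNS server described above:

```python
# DNS prefetching sketch: extract host names from HREF targets and issue
# resolution requests without waiting for the answers.
import re
import socket
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit

_resolver_pool = ThreadPoolExecutor(max_workers=10)
_dns_cache = {}  # stand-in for the caching DNS server

def prefetch_dns(html: str) -> None:
    # Steps 1-2: crude extraction of host names from HREF targets.
    hosts = {urlsplit(href).hostname
             for href in re.findall(r'href=["\'](http[^"\']+)', html, re.I)}
    # Step 3: issue resolution requests, but do not wait for completion.
    for host in filter(None, hosts):
        if host not in _dns_cache:
            _resolver_pool.submit(_resolve, host)

def _resolve(host: str) -> None:
    try:
        _dns_cache[host] = socket.gethostbyname(host)
    except socket.gaierror:
        _dns_cache[host] = None  # remember failures too
```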
  • 14. Multiple concurrent fetches
    • Managing multiple concurrent connections: a single download may take several seconds, so the crawler opens many socket connections to different HTTP servers simultaneously.
    • Multi-CPU machines are not of much use, since crawling performance is limited by the network and disk.
    • Two approaches: (1) multi-threading, (2) non-blocking sockets with event handlers.
  • 15. Multi-threading
    • Logical threads: either physical threads of control provided by the operating system (e.g., pthreads) or concurrent processes; a fixed number of threads is allocated in advance.
    • Programming paradigm (sketched below): create a client socket, connect the socket to the HTTP service on a server, send the HTTP request header, read the socket (recv) until no more characters are available, then close the socket.
    • Uses blocking system calls.
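The blocking, multi-threaded paradigm might look like the following sketch; the pool size, the shared work queue, and the example URL are illustrative assumptions:

```python
# A fixed pool of worker threads, each fetching one URL at a time with
# blocking socket calls, as in the paradigm on slide 15.
import socket
import threading
import queue
from urllib.parse import urlsplit

work_pool: "queue.Queue[str]" = queue.Queue()

def worker() -> None:
    while True:
        url = work_pool.get()
        parts = urlsplit(url)
        try:
            # Create a client socket and connect to the HTTP service (blocking).
            sock = socket.create_connection((parts.hostname, parts.port or 80),
                                            timeout=10)
            # Send the HTTP request header.
            sock.sendall(f"GET {parts.path or '/'} HTTP/1.0\r\n"
                         f"Host: {parts.hostname}\r\n\r\n".encode("ascii"))
            # Read the socket until no more characters are available.
            page = b""
            while chunk := sock.recv(4096):
                page += chunk
            sock.close()
            print(url, len(page), "bytes")
        except OSError as exc:
            print(url, "failed:", exc)
        finally:
            work_pool.task_done()

# Fixed number of threads allocated in advance.
for _ in range(8):
    threading.Thread(target=worker, daemon=True).start()

work_pool.put("http://example.com/")
work_pool.join()
```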
  • 16. Multi-threading: problems
    • Performance penalty: mutual exclusion is needed for concurrent access to shared data structures.
    • Slow disk seeks: a great deal of interleaved, random input/output on disk, due to concurrent modification of the document repository by multiple threads.
  • 17. Non-blocking sockets and event handlers (a sketch follows this slide)
    • Non-blocking sockets: a connect, send or recv call returns immediately without waiting for the network operation to complete; the status of the network operation is polled separately.
    • The "select" system call lets the application suspend until more data can be read from or written to a socket, timing out after a pre-specified deadline; one monitor polls several sockets at the same time.
    • Benefits: more efficient memory management; code that completes the processing of one page is not interrupted by other completions; no need for locks and semaphores on the work pool.
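A compact, single-threaded sketch of the select-based event loop; the target hosts are assumptions, and a production crawler would also resolve DNS asynchronously and handle partial writes:

```python
# Fetch a few pages over non-blocking sockets in one thread, using select()
# to wait until some socket is ready for writing (send) or reading (recv).
import select
import socket

targets = ["example.com", "example.org"]          # illustrative hosts
pending, responses = {}, {}

for host in targets:
    sock = socket.socket()
    sock.setblocking(False)                       # non-blocking socket
    sock.connect_ex((socket.gethostbyname(host), 80))   # returns immediately
    pending[sock] = (host, b"GET / HTTP/1.0\r\nHost: %b\r\n\r\n" % host.encode())
    responses[sock] = b""

while pending:
    writable = [s for s, (h, req) in pending.items() if req]      # request not yet sent
    readable = [s for s, (h, req) in pending.items() if not req]  # waiting for data
    r, w, _ = select.select(readable, writable, [], 5.0)  # suspend with a deadline
    for sock in w:                                # connection ready: send the request
        host, req = pending[sock]
        sock.sendall(req)
        pending[sock] = (host, b"")
    for sock in r:                                # data ready: read without blocking
        host, _ = pending[sock]
        chunk = sock.recv(4096)
        if chunk:
            responses[sock] += chunk
        else:                                     # server closed: page is complete
            print(host, len(responses[sock]), "bytes")
            sock.close()
            del pending[sock]
    if not (r or w):                              # timed out with nothing ready
        break
```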
  • 18. Link extraction and normalization
    • Goal: obtain a canonical form of each URL.
    • URL processing and filtering avoid multiple fetches of pages known by different URLs.
    • Complications: large sites use many IP addresses for load balancing; mirrored contents, or contents on the same file system; "proxy pass" mapping of different host names to a single IP address when many logical sites must be published; relative URLs, which need to be interpreted with respect to a base URL.
  • 19. Canonical URL (a sketch follows this slide)
    • Formed by using a standard string for the protocol, canonicalizing the host name, adding an explicit port number, and normalizing and cleaning up the path.
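One reasonable interpretation of these canonicalization steps in Python; the exact rules differ between crawlers, so treat this as a sketch rather than a specification:

```python
# Canonicalize a URL: standard protocol string, lower-case host, explicit
# port, and a cleaned-up path.
from urllib.parse import urlsplit, urlunsplit
import posixpath

DEFAULT_PORTS = {"http": 80, "https": 443}

def canonical_url(url: str) -> str:
    parts = urlsplit(url)
    scheme = parts.scheme.lower() or "http"               # standard protocol string
    host = (parts.hostname or "").lower()                 # canonical host name
    port = parts.port or DEFAULT_PORTS.get(scheme, 80)    # explicit port number
    path = posixpath.normpath(parts.path or "/")          # clean up the path
    if parts.path.endswith("/") and path != "/":
        path += "/"
    return urlunsplit((scheme, f"{host}:{port}", path, parts.query, ""))

# Example: these two spellings map to the same canonical form.
print(canonical_url("HTTP://www.CSE.iitb.ac.in/a/./b/../index.html"))
print(canonical_url("http://www.cse.iitb.ac.in:80/a/index.html"))
```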
  • 20. Robot exclusion (a sketch follows this slide)
    • Check whether the server prohibits crawling a normalized URL: the robots.txt file in the HTTP root directory of the server specifies a list of path prefixes that crawlers should not attempt to fetch.
    • Meant for crawlers only.
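A brief sketch of the robots.txt check using Python's standard-library parser; the user-agent string is an assumption, and in practice the parsed file would be cached per host rather than re-fetched for every URL:

```python
# Check robot exclusion rules for a normalized URL.
from urllib.robotparser import RobotFileParser
from urllib.parse import urlsplit, urlunsplit

def allowed_to_fetch(url: str, user_agent: str = "MyCrawler") -> bool:
    parts = urlsplit(url)
    robots_url = urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
    rp = RobotFileParser()
    rp.set_url(robots_url)
    rp.read()                          # fetches and parses robots.txt
    return rp.can_fetch(user_agent, url)

print(allowed_to_fetch("http://example.com/some/path"))
```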
  • 21. Eliminating already-visited URLs (a sketch follows this slide)
    • Check whether a URL has already been fetched before adding it to the work pool; the check needs to be very quick.
    • Achieved by computing an MD5 hash of the URL, exploiting the spatio-temporal locality of access with a two-level hash function: the most significant bits (say, 24) are derived by hashing the host name plus port, the lower-order bits (say, 40) by hashing the path.
    • The concatenated bits are used as a key in a B-tree; qualifying URLs are added to the frontier of the crawl and their hash values are added to the B-tree.
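A toy version of the two-level fingerprint, with an in-memory set standing in for the persistent B-tree a real crawler would use:

```python
# Two-level URL fingerprint: 24 bits from host+port and 40 bits from the
# path, concatenated into a 64-bit key so URLs on one host share a prefix.
import hashlib
from urllib.parse import urlsplit

seen_keys: set[int] = set()     # stand-in for the B-tree of visited URLs

def url_key(url: str) -> int:
    parts = urlsplit(url)
    host = f"{parts.hostname}:{parts.port or 80}".encode()
    path = (parts.path or "/").encode()
    hi = int.from_bytes(hashlib.md5(host).digest()[:3], "big")   # 24 bits
    lo = int.from_bytes(hashlib.md5(path).digest()[:5], "big")   # 40 bits
    return (hi << 40) | lo

def already_visited(url: str) -> bool:
    key = url_key(url)
    if key in seen_keys:
        return True
    seen_keys.add(key)
    return False

print(already_visited("http://example.com/a"))   # False: new URL
print(already_visited("http://example.com/a"))   # True: duplicate
```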
  • 22. Spider traps
    • The crawler must be protected from crashing on ill-formed HTML, e.g., a page with 68 kB of null characters.
    • It must also handle misleading sites: an indefinite number of pages dynamically generated by CGI scripts, or paths of arbitrary depth created using soft directory links and path remapping features in the HTTP server.
  • 23. Spider traps: solutions
    • No automatic technique can be foolproof.
    • Check for URL length.
    • Guards: prepare regular crawl statistics; add dominating sites to a guard module; disable crawling of active content such as CGI form queries; eliminate URLs with non-textual data types.
  • 24. Avoiding repeated expansion of links on duplicate pages (a sketch follows this slide)
    • Goal: reduce redundancy in crawls by detecting duplicates, such as mirrored Web pages and sites.
    • Detecting exact duplicates: check against MD5 digests of stored URLs; represent a relative link v (relative to aliases u1 and u2) as the tuples (h(u1), v) and (h(u2), v).
    • Detecting near-duplicates: even a single altered character (e.g., the date of update, or the name and email of the site administrator) completely changes the digest; the solution is shingling.
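A small sketch of shingling for near-duplicate detection: compare the sets of overlapping word k-grams of two pages with the Jaccard measure. The shingle size and example pages are illustrative assumptions:

```python
# Shingling: two pages that differ only in a few characters still share most
# of their word 4-grams, so their resemblance stays high.
import re

def shingles(text: str, k: int = 4) -> set[tuple[str, ...]]:
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def resemblance(a: str, b: str, k: int = 4) -> float:
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

page1 = "Welcome to the site. Last updated 2003-01-01 by alice@example.com"
page2 = "Welcome to the site. Last updated 2003-02-15 by bob@example.com"
print(resemblance(page1, page2))   # high despite the changed characters
```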
  • 25. Load monitor
    • Keeps track of various system statistics: recent performance of the wide-area network (WAN) connection, e.g., latency and bandwidth estimates; the operator-provided/estimated upper bound on open sockets for the crawler; the current number of active sockets.
  • 26. Thread manager
    • Responsible for choosing units of work from the frontier, scheduling the issue of network resources, and distributing these requests over multiple ISPs if appropriate.
    • Uses statistics from the load monitor.
  • 27. Per-server work queues (a sketch follows this slide)
    • Servers protect themselves from denial-of-service (DoS) attacks by limiting the speed or frequency of responses to any fixed client IP address.
    • To avoid looking like a DoS attack, limit the number of active requests to a given server IP address at any time, maintain a queue of requests for each server, and use the HTTP/1.1 persistent socket capability.
    • Distribute attention relatively evenly between a large number of sites: the access-locality vs. politeness dilemma.
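A sketch of per-server work queues with a politeness delay; the delay value and the one-URL-at-a-time-per-host policy are assumptions, not prescriptions from the slides:

```python
# One FIFO queue per server, plus a heap of hosts ordered by the earliest
# time they may politely be contacted again.
import time
import heapq
from collections import defaultdict, deque
from urllib.parse import urlsplit

POLITENESS_DELAY = 2.0           # seconds between requests to the same host

host_queues: dict[str, deque] = defaultdict(deque)   # per-server request queues
ready_heap: list[tuple[float, str]] = []             # (next allowed time, host)

def enqueue(url: str) -> None:
    host = urlsplit(url).hostname or ""
    if not host_queues[host]:
        heapq.heappush(ready_heap, (time.monotonic(), host))
    host_queues[host].append(url)

def next_url() -> str | None:
    """Return the next URL whose host may be contacted again, if any."""
    while ready_heap:
        ready_at, host = heapq.heappop(ready_heap)
        if time.monotonic() < ready_at:
            heapq.heappush(ready_heap, (ready_at, host))
            return None          # nothing polite to fetch right now
        url = host_queues[host].popleft()
        if host_queues[host]:    # reschedule the host after the delay
            heapq.heappush(ready_heap, (time.monotonic() + POLITENESS_DELAY, host))
        return url
    return None
```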
  • 28. Text repository
    • The crawler's last task is dumping fetched pages into a repository.
    • Decoupling the crawler from other functions is preferred for efficiency and reliability.
    • Page-related information is stored in two parts: meta-data and page contents.
  • 29. Storage of page-related information
    • Meta-data is relational in nature but is usually managed by custom software to avoid relational database system overheads; the text index involves bulk updates.
    • Includes fields like content-type, last-modified date, content-length, HTTP status code, etc.
  • 30. Page contents storage
    • A typical HTML Web page compresses to 2-4 kB (using zlib), while file systems have a 4-8 kB file block size, which is too large.
    • Page storage is therefore managed by a custom storage manager that provides simple access methods for the crawler to add pages and for subsequent programs (the indexer, etc.) to retrieve documents.
  • 31. Page storage: small-scale systems (a sketch follows this slide)
    • The repository fits within the disks of a single machine.
    • Use a storage manager (e.g., Berkeley DB) to manage a disk-based database within a single file: configured as a hash table or B-tree if pages must be accessed with the URL as key, or as a sequential log of page records for ordered access, since the indexer can handle pages in any order.
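A toy page store in the spirit of slides 30 and 31: pages are compressed with zlib and kept in a single-file key-value database keyed by URL. The standard-library dbm module merely stands in for Berkeley DB here, and the class and file names are assumptions:

```python
# Compressed, single-file page repository keyed by canonical URL.
import dbm
import zlib

class PageStore:
    def __init__(self, path: str = "pages.db"):
        self.db = dbm.open(path, "c")       # single database file on disk

    def add(self, url: str, html: bytes) -> None:
        # A typical HTML page compresses to a few kB.
        self.db[url.encode()] = zlib.compress(html)

    def get(self, url: str) -> bytes:
        return zlib.decompress(self.db[url.encode()])

    def close(self) -> None:
        self.db.close()

store = PageStore()
store.add("http://example.com/", b"<html>" + b"x" * 10000 + b"</html>")
print(len(store.get("http://example.com/")), "bytes after decompression")
store.close()
```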
  • 32. Page storage: large-scale systems
    • The repository is distributed over a number of storage servers.
    • Storage servers are connected to the crawler through a fast local network (e.g., Ethernet), and pages are hashed to servers by URL.
    • To handle 10 million pages (40 GB) per hour, T3-grade leased lines are needed.
  • 33. Large-scale crawlers often use multiple ISPs and a bank of local storage servers to store the crawled pages (figure).
  • 34. Refreshing crawled pages
    • A search engine's index should be fresh, but a Web-scale crawler never "completes" its job, and the rate of page changes has high variance.
    • The "If-Modified-Since" request header of the HTTP protocol is impractical for a crawler.
    • Solution: at the commencement of a new crawling round, estimate which pages have changed.
  • 35. Determining page changes (a sketch follows this slide)
    • The "Expires" HTTP response header helps for pages that come with an expiry date.
    • Otherwise the crawler needs to guess whether revisiting the page will yield a modified version: maintain a score reflecting the probability that each page has been modified, and fetch URLs in decreasing order of score.
    • Assumption: the recent past predicts the future.
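A small sketch of score-ordered refreshing: each URL carries an estimate of its change probability based on how often past revisits found it modified, and the crawler refreshes URLs in decreasing order of that score. The Laplace-style smoothing is an illustrative assumption, not the specific estimator from the slides:

```python
# Score each URL by the (smoothed) fraction of past visits on which the page
# had changed, and refresh in decreasing order of that score.
from dataclasses import dataclass

@dataclass
class ChangeEstimate:
    visits: int = 0
    changes: int = 0

    @property
    def score(self) -> float:
        # Smoothed probability that the page will be modified by the next visit.
        return (self.changes + 1) / (self.visits + 2)

history: dict[str, ChangeEstimate] = {}

def record_visit(url: str, was_modified: bool) -> None:
    est = history.setdefault(url, ChangeEstimate())
    est.visits += 1
    est.changes += int(was_modified)

def refresh_order() -> list[str]:
    return sorted(history, key=lambda u: history[u].score, reverse=True)

record_visit("http://news.example.com/", True)
record_visit("http://static.example.com/about", False)
print(refresh_order())   # fast-changing pages come first
```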
  • 36. Estimating page change rates
    • Brewington and Cybenko, and Cho, give algorithms for maintaining a crawl in which most pages are fresher than a specified epoch.
    • Prerequisite: the average interval at which the crawler checks for changes is smaller than the inter-modification time of a page.
    • Small-scale intermediate crawler runs monitor fast-changing sites (e.g., current news, weather, etc.); the intermediate indices are patched into the master index.
  • 37. Putting together a crawler
    • A reference implementation of the HTTP client protocol is the World Wide Web Consortium's (http://www.w3c.org/) w3c-libwww package.
  • 38. Design of the core components: the Crawler class (a sketch follows this slide)
    • Its job is to copy bytes from network sockets to storage media.
    • Three methods express the Crawler's contract with the user: pushing a URL to be fetched to the Crawler (fetchPush), a termination callback handler (fetchDone) called with the same URL, and a method (start) which starts the Crawler's event loop.
    • The implementation of the Crawler class needs two helper classes called DNS and Fetch.
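A minimal single-threaded sketch of this contract, keeping the method names from the slide; the internals (urllib and a plain FIFO frontier) are stand-ins for the asynchronous DNS and Fetch helper classes, not their actual design:

```python
# Crawler contract: fetchPush() queues a URL, start() runs the event loop,
# and fetchDone() is invoked with the same URL as each fetch completes.
from collections import deque
from urllib.request import urlopen

class Crawler:
    def __init__(self):
        self.frontier: deque[str] = deque()

    def fetchPush(self, url: str) -> None:
        """Push a URL to be fetched to the Crawler."""
        self.frontier.append(url)

    def fetchDone(self, url: str, page: bytes, error: Exception | None) -> None:
        """Termination callback, called with the same URL; override as needed."""
        status = "error: %s" % error if error else "%d bytes" % len(page)
        print(url, "->", status)

    def start(self) -> None:
        """Run the Crawler's event loop until the frontier is empty."""
        while self.frontier:
            url = self.frontier.popleft()
            try:
                with urlopen(url, timeout=10) as resp:
                    self.fetchDone(url, resp.read(), None)
            except OSError as exc:
                self.fetchDone(url, b"", exc)

c = Crawler()
c.fetchPush("http://example.com/")
c.start()
```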