I'm working on a web crawling project to analyse various crowdfunding sites' projects via text mining in RapidMiner 5. I have already built a working text analyser, but I'm stuck at the web crawling part.

The problem is that the web crawler does crawl through the requested sites, but doesn't store them. I have tried experimenting with page size, depth and the like, but the program still just skips those sites. The problem is probably with my storing rules. When trying to crawl through Kickstarter's sites, they look like the following:

(?i)http.*://

An example URL that would need to be stored is:

11:50:38 AM INFO: Discarded page " " because url does not match filter rules.

As you can see, the crawler follows through with the process and simply skips these links; for most of them it doesn't even log that the page was discarded for not matching the filter rules, so I'm not sure the program compares those links to the rules at all. I see a lot of links in the log preceded with "Following link." but very few preceded with "Discarded page.". Does this mean that it only checks a few pages, or just that it doesn't notify me of every discarded page?

Output directory: C:\Program Files\Rapid-I\myfiles\webcrawl

For all of my runs, it only shows the first page, although the log shows that it obeys the follow-link rule. This is contrary to the simafore instruction and example, which states that the last file is stored. I also tried to follow vancouver blog spot and duplicate the result.

Any help would be greatly appreciated! I am getting really frustrated with this.
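One thing worth checking, assuming RapidMiner 5 evaluates the storing rule with Java's anchored Pattern.matches() semantics (an assumption on my part, not something confirmed in the docs): an anchored pattern must cover the whole URL, so a rule like (?i)http.*:// can only match a URL that literally ends in ://, and every real project page would then be discarded. A minimal sketch of that behaviour, using a made-up Kickstarter URL since the example URL above was lost:

```java
import java.util.regex.Pattern;

public class StoreRuleCheck {
    public static void main(String[] args) {
        // Hypothetical URL; the example URL from the post was lost.
        String url = "http://www.kickstarter.com/projects/example/some-project";

        // The storing rule from the post. With anchored matching the
        // pattern must cover the WHOLE string, so the URL would have
        // to end with "://" for this to succeed.
        System.out.println(Pattern.matches("(?i)http.*://", url));   // false

        // A trailing .* lets the anchored pattern cover the rest of
        // the URL, so a full project URL passes the rule.
        System.out.println(Pattern.matches("(?i)http.*://.*", url)); // true
    }
}
```

If that is what happens, rewriting the storing rule as (?i)http.*://.* (or something tighter, such as .*kickstarter\.com/projects/.*) would be the first thing to try.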
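Separately, to see whether it is really the first page rather than the last one that ends up on disk, it can help to list the output directory directly after each run. A small diagnostic sketch using the path from the post (adjust it to your setup; note that writing under C:\Program Files usually requires administrator rights on newer Windows versions, which on its own could stop pages from being stored):

```java
import java.io.File;

public class ListCrawlOutput {
    public static void main(String[] args) {
        // Output directory from the post; adjust to your own setup.
        File dir = new File("C:\\Program Files\\Rapid-I\\myfiles\\webcrawl");

        File[] files = dir.listFiles();
        if (files == null || files.length == 0) {
            System.out.println("No stored pages found in " + dir);
            return;
        }
        for (File f : files) {
            // Name, size and timestamp show which pages were actually
            // written and in what order.
            System.out.printf("%s  %d bytes  modified %tc%n",
                    f.getName(), f.length(), f.lastModified());
        }
    }
}
```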