Samurize Forums: Pagescraper randomly quits? Anyone experience this?
uziq
Posted: Feb 25 2009, 01:32 PM
The Pluginator
Group: Admin
Joined: Sep 25 2003
Posts: 2,599



You should be able to use a text editor to insert this line for each PageScraper meter:
QUOTE
PageScraper_SaveLogFile=1


But with so many meters, I don't know what will happen. I guess it's worth a shot though.
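If the config has dozens of PageScraper meters, adding the line by hand is tedious. Here's a minimal Python sketch that inserts it into the config text. It assumes (not confirmed in this thread) that the Samurize config is INI-style text with one [section] per meter, and that a PageScraper meter can be recognized because its section already contains at least one key starting with PageScraper_. The function name add_setting is hypothetical.

```python
import re

# Assumption: a section header is "[Something]" on its own line.
SECTION_RE = re.compile(r"^\s*\[[^\]]+\]\s*$")


def add_setting(ini_text: str, key: str = "PageScraper_SaveLogFile", value: str = "1") -> str:
    """Set key=value in every INI section that already has a PageScraper_ key.

    Hypothetical helper: assumes one [section] per meter, and that PageScraper
    meters are the sections containing keys prefixed with "PageScraper_".
    """
    key_re = re.compile(rf"^\s*{re.escape(key)}\s*=")

    # Group lines into sections; sections[0] holds anything before the first header.
    sections: list[list[str]] = [[]]
    for line in ini_text.splitlines():
        if SECTION_RE.match(line):
            sections.append([line])
        else:
            sections[-1].append(line)

    out: list[str] = []
    for sec in sections:
        has_header = bool(sec) and SECTION_RE.match(sec[0]) is not None
        body = sec[1:] if has_header else sec
        if has_header and any(l.strip().startswith("PageScraper_") for l in body):
            # Drop any existing copy of the key, then put the new one right after the header.
            body = [l for l in body if not key_re.match(l)]
            sec = [sec[0], f"{key}={value}"] + body
        out.extend(sec)
    return "\n".join(out) + "\n"
```

For example, read stisev_debug.ini, pass its text to add_setting, and write the result to a copy, keeping the original as a backup. Working on the raw text (rather than a full INI parser) leaves key case, ordering and comments in the rest of the file untouched.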
stisev
Posted: Mar 29 2009, 12:01 PM
Shadow
Group: Member
Joined: Jan 24 2005
Posts: 45



Hi uziq,

I did what you suggested and added the line to all the entries. The log file became really large, really fast, so I kept restarting Samurize to keep the file size down. The one I got is 80 MB or so. Here's the bottom of the file, just before Samurize quit.

CODE


09:22:59.150       0.01ms [15]  SCRAPEPAGE - entry
09:22:59.166       0.02ms [58]   READFILEFROMHTTP - getting an Internet handle via InternetOpen..
09:22:59.166       0.01ms [16] GETWEBPAGETHREAD - CALL DownloadPage..

09:22:59.166   60000.89ms [12] GETWEBPAGETHREAD - waiting done
09:22:59.166 ------------ [46] GetWebpage - entry
09:22:59.166       0.01ms [15]  SCRAPEPAGE - getting start/end positions on the page..
09:22:59.166 ------------ [46] GetWebpage - exit
09:22:59.166       0.00ms [12] GETWEBPAGETHREAD - CALL DownloadPage..

09:22:59.166       0.01ms [15]  SCRAPEPAGE - got page range: 1 to 33425
09:22:59.166       0.07ms [16]  DOWNLOADPAGE - CALL ReadFileFromHttp: http://quotes.ino.com/chart/?s=FOREX_USDJPY
09:22:59.166       0.01ms [12]  DOWNLOADPAGE - CALL ReadFileFromHttp: http://quotes.ino.com/chart/?s=FOREX_USDPHP
09:22:59.166       0.01ms [15]  SCRAPEPAGE - >>>>> finding Match 1 [Match1] ...
09:22:59.166       0.02ms [12]   READFILEFROMHTTP - getting an Internet handle via InternetOpen..
09:22:59.166       0.04ms [12]   READFILEFROMHTTP - InternetOpen SUCCESS

09:22:59.166       0.01ms [12]   READFILEFROMHTTP - getting a URL handle via InternetOpenUrl..
09:22:59.166       0.38ms [15]  SCRAPEPAGE - [Match1] - MatchResultAfterFilters(1, 1) LEN = 6
09:22:59.166       0.08ms [15]  SCRAPEPAGE - >>> DONE FINDING MATCHES. MostOccurrencesFound = 1

09:22:59.166 ------------ [47] GetWebpage - entry
09:22:59.166       0.00ms [15]  SCRAPEPAGE - combining each match's occurrences into a single string..
09:22:59.166 ------------ [47] GetWebpage - exit
09:22:59.166       0.02ms [16]   READFILEFROMHTTP - getting an Internet handle via InternetOpen..
09:22:59.166       0.02ms [16]   READFILEFROMHTTP - InternetOpen SUCCESS

09:22:59.166       0.01ms [16]   READFILEFROMHTTP - getting a URL handle via InternetOpenUrl..
09:22:59.166 ------------ [48] GetWebpage - entry
09:22:59.166 ------------ [48] GetWebpage - exit
09:22:59.166 ------------ [49] GetWebpage - entry
09:22:59.166 ------------ [49] GetWebpage - exit
09:22:59.166       0.03ms [58]   READFILEFROMHTTP - InternetOpen SUCCESS

09:22:59.166       0.01ms [58]   READFILEFROMHTTP - getting a URL handle via InternetOpenUrl..
09:22:59.166      81.87ms [26]    INTERNETSTATUSCALLBACK - [InternetOpenUrl] INTERNET_STATUS_RESPONSE_RECEIVED. LEN=3406
09:22:59.166   60000.26ms [3] GETWEBPAGETHREAD - waiting done
09:22:59.166       0.01ms [3] GETWEBPAGETHREAD - CALL DownloadPage..

09:22:59.166       0.02ms [3]  DOWNLOADPAGE - CALL ReadFileFromHttp: http://quotes.ino.com/chart/?s=FOREX_XPDUSDO
09:22:59.166       0.02ms [3]   READFILEFROMHTTP - getting an Internet handle via InternetOpen..
09:22:59.166       0.13ms [3]   READFILEFROMHTTP - InternetOpen SUCCESS

09:22:59.166       0.01ms [3]   READFILEFROMHTTP - getting a URL handle via InternetOpenUrl..
09:22:59.166       0.15ms [3]    INTERNETSTATUSCALLBACK - [InternetOpenUrl] INTERNET_STATUS_HANDLE_CREATED
09:22:59.166       2.91ms [26]   READFILEFROMHTTP - getting url contents via InternetReadFile..
09:22:59.181       9.28ms [3]    INTERNETSTATUSCALLBACK - [InternetOpenUrl] INTERNET_STATUS_CONNECTING_TO_SERVER. Socket Address=825112118
09:22:59.181      17.34ms [12]    INTERNETSTATUSCALLBACK - [InternetOpenUrl] INTERNET_STATUS_HANDLE_CREATED
09:22:59.181      21.09ms [15]  SCRAPEPAGE - combining done
09:22:59.181       0.01ms [15]  SCRAPEPAGE - CALL BuildOutputString..
09:22:59.181 ------------ [15] BUILDOUTPUTSTRING - entry
09:22:59.181 ------------ [15] BUILDOUTPUTSTRING - looping thru occurrences 1 to 1 ...
09:22:59.181 ------------ [15] BUILDOUTPUTSTRING - occurrence 1...
09:22:59.181 ------------ [15] BUILDOUTPUTSTRING - exit
09:22:59.181       1.12ms [15]  SCRAPEPAGE - returned from BuildOutputString. 6 bytes
09:22:59.181       0.05ms [15]  SCRAPEPAGE - final output string length is 6 bytes
09:22:59.181       0.01ms [15]  SCRAPEPAGE - exit

09:22:59.181       0.05ms [15] GETWEBPAGETHREAD - returned from ScrapePage.
09:22:59.181       0.01ms [15] GETWEBPAGETHREAD - waiting for 60 seconds..



I don't see anything wrong here. Samurize just quit out of nowhere again like usual and at random.
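To grab the end of a log this size without opening all 80 MB in an editor, a short Python sketch like the one below works on any plain-text file. Reading it as latin-1 by default is an assumption (that the log is single-byte text); tail is a hypothetical helper name.

```python
from collections import deque


def tail(path: str, n: int = 60, encoding: str = "latin-1") -> list[str]:
    """Return the last n lines of a text file, newest last.

    The file is still read from start to end, but deque(maxlen=n) keeps only
    the most recent n lines, so memory use stays small even for an 80 MB log.
    latin-1 maps every byte to a character, so decoding never fails.
    """
    with open(path, encoding=encoding, errors="replace") as f:
        return list(deque(f, maxlen=n))
```

For example, tail("PageScraper_Log_crash2_log.txt", 100) would return the last 100 lines, ready to paste into a post.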
uziq
Posted: Mar 29 2009, 08:59 PM



I may have spotted a potential problem. Disable the log file and try this update. If it still crashes, enable the log and post it again as you did before.

This post has been edited by uziq on Jul 5 2009, 06:14 PM
stisev
Posted: Mar 29 2009, 10:48 PM



Hi uziq,

I just saw your post and am trying out your new plugin now. I made two configs:

stisev_debug.ini
stisev.ini

I went back to stisev.ini and am testing out your updated plugin.

Two questions:

1) May I ask where in the log the problem you found was?
2) Why does PageScraper use so much CPU time? I realize this may be a silly question, but it uses about 3-5% of the CPU non-stop (every second). My thinking is: if all of my entries are configured to scrape every 15 minutes, why does it use CPU continuously? It should scrape (using a lot of CPU at once) and then do nothing for a long time, right?
stisev
Posted: Mar 29 2009, 10:50 PM



UPDATE: lol, a couple seconds after I posted this, it crashed.

I'm back to debug config now.
uziq
Posted: Mar 30 2009, 12:22 PM



I guess I'll have to mess with the code a bit more.

1. A few meters called InternetOpen or InternetOpenUrl just before the crash.
2. How do you know it's PageScraper and not Samurize? Once PageScraper has downloaded/scraped/filtered/output data, it enters a wait state and uses 0% CPU until the next refresh (determined in Adv Options).
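Point 1 can be checked mechanically. The Python sketch below scans a log and reports which bracketed ids were last seen inside an InternetOpen/InternetOpenUrl step. It assumes the line layout visible in the excerpts in this thread (time, duration or dashes, [id], message) and that the bracketed number identifies the meter. It skips GetWebpage lines, because those appear to come from the per-refresh polling rather than the download/scrape work and would otherwise hide a meter's last real step. The function names are hypothetical.

```python
import re

# Line layout inferred from the excerpts in this thread (an assumption):
#   "09:22:59.166       0.01ms [12]   READFILEFROMHTTP - getting a URL handle via InternetOpenUrl.."
#   "09:22:59.166 ------------ [47] GetWebpage - entry"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3})\s+(?:[\d.]+ms|-+)\s+\[(\d+)\]\s+(.*?)\s*$")


def last_events(log_lines):
    """Map each bracketed id to the (time, message) of its last download/scrape step."""
    last = {}
    for line in log_lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # blank lines, raw HTTP header dumps, etc.
        ts, ident, msg = m.group(1), int(m.group(2)), m.group(3)
        if msg.startswith("GetWebpage"):
            continue  # display polling; would otherwise hide the worker's last step
        last[ident] = (ts, msg)
    return last


def in_flight(log_lines):
    """Ids whose last logged step is an InternetOpen/InternetOpenUrl call or callback."""
    return sorted(
        ident
        for ident, (_, msg) in last_events(log_lines).items()
        if "via InternetOpen" in msg or msg.startswith("INTERNETSTATUSCALLBACK")
    )
```

Running in_flight over the tail of a crash log lists the meters that were still mid-request when logging stopped.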

stisev
Posted: Mar 30 2009, 03:43 PM



uziq,
It crashed again. The log is 50 MB this time (pretty small, IMO). I'm going to upload it now and provide you a link (see below).

1. Ah, thank you.
2. I think you're referring to the CPU % question I posted earlier, and not asking me why I think it's PageScraper crashing rather than Samurize. I've answered both, in a) and b).
a) When I remove the PageScraper part of the config and leave everything else (time and the system temperature stuff), it doesn't use nearly as much CPU, only about 1.5% every 7 seconds or so! OR, when I remove the PageScraper plugin but keep my current config, the CPU usage disappears.
b) I'm fairly certain it's PageScraper that's crashing: I ran multiple tests to confirm what was crashing, and it always crashed when PageScraper ran but never without it.



New question
4)



Here's the end of the NEW log, Pagescrapper_crash2_log.txt:

CODE

08:46:23.056   59994.53ms [30] GETWEBPAGETHREAD - waiting done
08:46:23.056       0.01ms [30] GETWEBPAGETHREAD - CALL DownloadPage..

08:46:23.056       0.02ms [30] DOWNLOADPAGE - CALL ReadFileFromHttp: http://www.bloomberg.com/apps/quote?ticker=SHCOMP%3AIND
08:46:23.056       0.02ms [30] READFILEFROMHTTP - getting an Internet handle via InternetOpen..
08:46:23.056       0.04ms [30] READFILEFROMHTTP - InternetOpen SUCCESS

08:46:23.056       0.01ms [30] READFILEFROMHTTP - getting a URL handle via InternetOpenUrl..
08:46:23.056       0.01ms [30]  INTERNETSTATUS_TURNSTRINTOINDEX - UBOUND(gInternetStatusCallbackStr) = 2670
08:46:23.056       0.13ms [30]  INTERNETSTATUSCALLBACK - [InternetOpenUrl] INTERNET_STATUS_HANDLE_CREATED
08:46:23.056       0.21ms [30]  INTERNETSTATUSCALLBACK - [InternetOpenUrl] INTERNET_STATUS_COOKIE_SENT. Amount sent=2
08:46:23.056       0.14ms [30]  INTERNETSTATUSCALLBACK - [InternetOpenUrl] INTERNET_STATUS_CONNECTING_TO_SERVER. Socket Address=775172146
08:46:23.150      90.56ms [30]  INTERNETSTATUSCALLBACK - [InternetOpenUrl] INTERNET_STATUS_CONNECTED_TO_SERVER. Socket Address=775172146
08:46:23.150       0.04ms [30]  INTERNETSTATUSCALLBACK - [InternetOpenUrl] INTERNET_STATUS_SENDING_REQUEST
08:46:23.150       0.04ms [30]  INTERNETSTATUSCALLBACK - [InternetOpenUrl] INTERNET_STATUS_REQUEST_SENT
08:46:23.150       0.02ms [30]  INTERNETSTATUSCALLBACK - [InternetOpenUrl] INTERNET_STATUS_RECEIVING_RESPONSE
08:46:23.228      92.32ms [30]  INTERNETSTATUSCALLBACK - [InternetOpenUrl] INTERNET_STATUS_RESPONSE_RECEIVED. LEN=468
08:46:23.244       0.06ms [30] READFILEFROMHTTP - InternetOpenUrl SUCCESS

08:46:23.244       0.01ms [30] READFILEFROMHTTP - getting headers via HTTPQueryInfo..
08:46:23.244       0.02ms [30] READFILEFROMHTTP - HTTPQueryInfo SUCCESS. Header = HTTP/1.1 500 Server Error%bServer: Sun-ONE-Web-Server/6.1%bDate: Mon, 30 Mar 2009 15:46:33 GMT%bContent-length: 305%bContent-type: text/html%bConnection: close%b%b
08:46:23.244       0.02ms [30] READFILEFROMHTTP - getting url contents via InternetReadFile..
08:46:23.244       0.03ms [30]  INTERNETSTATUSCALLBACK - [InternetOpenUrl] INTERNET_STATUS_CLOSING_CONNECTION
08:46:23.244       0.10ms [30]  INTERNETSTATUSCALLBACK - [InternetOpenUrl] INTERNET_STATUS_CONNECTION_CLOSED
08:46:23.244       1.48ms [30] READFILEFROMHTTP - getting url contents via InternetReadFile..
08:46:23.244       0.02ms [30] READFILEFROMHTTP - InternetReadFile SUCCESS (in 1 passes)

08:46:23.244       0.00ms [30] READFILEFROMHTTP - Going to close open internet handles..
08:46:23.244       0.09ms [30]  INTERNETSTATUSCALLBACK - [InternetOpenUrl] INTERNET_STATUS_HANDLE_CLOSING
08:46:23.244       0.03ms [30] READFILEFROMHTTP - Done closing handles.

08:46:23.244       0.01ms [30] READFILEFROMHTTP - exit
08:46:23.244       0.14ms [30] DOWNLOADPAGE - returned from ReadFileFromHttp. First line of header: HTTP/1.1 500 Server Error
08:46:23.244       0.04ms [30] DOWNLOADPAGE - 468 bytes read
08:46:23.244       0.00ms [30] DOWNLOADPAGE - CALL DecodeUTF8...
08:46:23.244       0.00ms [30] DECODEUTF8 - entry
08:46:23.244       0.01ms [30] DECODEUTF8 - Header length: 159
HTTP/1.1 500 Server Error
Server: Sun-ONE-Web-Server/6.1
Date: Mon, 30 Mar 2009 15:46:33 GMT
Content-length: 305
Content-type: text/html
Connection: close
08:46:23.244       0.02ms [30] DECODEUTF8 - Content-Type header found.. text/html
08:46:23.244       0.01ms [30] DECODEUTF8 - content is NOT UTF-8 encoded.
08:46:23.244       0.01ms [30] DECODEUTF8 - exit
08:46:23.244       0.01ms [30] DOWNLOADPAGE - returned from DecodeUTF8.
08:46:23.244       0.03ms [30] DOWNLOADPAGE - RefreshRateSeconds adjusted from 60 to 60
08:46:23.244       0.07ms [30] GETWEBPAGETHREAD - returned from DownloadPage
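This excerpt ends with bloomberg answering HTTP/1.1 500 Server Error, yet the log shows DownloadPage going on into DecodeUTF8 with the 468-byte reply. Whether that has anything to do with the crash is not established in this thread. As a minimal sketch, here's how the logged header string could be parsed so that a scraper skips non-2xx replies. It assumes %b in the logged header marks a line break (the DECODEUTF8 lines print the same header split that way); parse_logged_header and should_scrape are hypothetical names.

```python
def parse_logged_header(raw: str):
    """Split a header string as PageScraper logs it, assuming "%b" marks each line break.

    Returns (status_code, reason, headers) with header names lower-cased.
    """
    lines = [line for line in raw.split("%b") if line.strip()]
    # Status line, e.g. "HTTP/1.1 500 Server Error"
    parts = lines[0].split(" ", 2)
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so values like "15:46:33" stay intact.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, reason, headers


def should_scrape(code: int) -> bool:
    """Hypothetical policy: only scrape 2xx responses."""
    return 200 <= code < 300
```

With the header above, this yields status 500, so should_scrape returns False and the previous value could be kept until the next refresh.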
stisev
Posted: Mar 30 2009, 03:51 PM



uziq:
Here's a link to the file PageScraper_Log_crash2_log.txt:

LINK: http://rapidshare.com/files/215485582/Page...h2_log.rar.html

LINK TO MY ENTIRE SAMURIZE FOLDER (confirmed virus/trojan-free), RARed and password-protected (same password as the other file):

http://rapidshare.com/files/215489705/Samurize.rar.html

MD5: F279FB6DECD6333C05901E2C8BAEF082

Password has been PMed to you.

.TXT log is 49.1 MB (51,485,735 bytes)
.RAR-ed file is 2.32 MB (2,433,024 bytes)

Only 2.32 MB to download. See your PM inbox for the password.

This post has been edited by stisev on Mar 30 2009, 04:06 PM
stisev
Posted: Mar 30 2009, 04:05 PM



uziq,
I was wrong about one thing.

I removed ALL of the plugins but kept the config as it is, and Samurize still uses 3-5% CPU. Strange. Very strange. I'm going to tinker with the CPU issue before reporting back to you, but my main concern is the crashing.

stisev
Posted: Mar 30 2009, 04:06 PM



Regarding the CPU usage, I changed ALL the entries to scrape either every 10 minutes or every 100+ minutes, and I still get 3-5% CPU.

The PageScraper log shows a TON of the lines below; please see the log for details. The log is flooded with them, and I believe this is what's using the CPU. I don't understand why they're there, though. Just as you said, once PageScraper has scraped and returned the value, it should just wait like a good little plugin (hehe) for whatever interval it's been set to (in this case 10 minutes).

CODE

15:06:30.041 ------------ [62] GetWebpage - entry
15:06:30.041 ------------ [62] GetWebpage - exit
15:06:30.041 ------------ [63] GetWebpage - entry
15:06:30.041 ------------ [63] GetWebpage - exit
15:06:30.041 ------------ [64] GetWebpage - entry
15:06:30.041 ------------ [64] GetWebpage - exit
15:06:30.041 ------------ [65] GetWebpage - entry
15:06:30.041 ------------ [65] GetWebpage - exit
15:06:30.041 ------------ [66] GetWebpage - entry
15:06:30.041 ------------ [66] GetWebpage - exit
15:06:30.056 ------------ [67] GetWebpage - entry
15:06:30.056 ------------ [67] GetWebpage - exit
15:06:30.056 ------------ [68] GetWebpage - entry
15:06:30.056 ------------ [68] GetWebpage - exit
15:06:30.056 ------------ [69] GetWebpage - entry
15:06:30.056 ------------ [69] GetWebpage - exit


This post has been edited by stisev on Mar 30 2009, 05:08 PM
stisev
Posted: Mar 30 2009, 05:38 PM



Hi uziq,
Sorry to take up so much of your time, but here's a log of my Samurize config for 9 minutes.

Technically, it should do everything it needs to at the beginning (scraping on load) and then wait for 10 minutes, correct?

Well, here's my Samurize log for exactly 9 minutes. As you can see, PageScraper is doing a bunch of stuff after the initial scrape.

CODE
15:23:27.744 ------------ [9] BUILDOUTPUTSTRING - looping thru occurrences 1 to 1 ...
15:23:27.744 ------------ [9] BUILDOUTPUTSTRING - occurrence 1...
15:23:27.744 ------------ [9] BUILDOUTPUTSTRING - exit
15:23:27.744       0.92ms [9]  SCRAPEPAGE - returned from BuildOutputString. 7 bytes
15:23:27.744       0.04ms [9]  SCRAPEPAGE - final output string length is 7 bytes
15:23:27.744       0.00ms [9]  SCRAPEPAGE - exit

15:23:27.759       0.04ms [9] GETWEBPAGETHREAD - returned from ScrapePage.
15:23:27.759       0.07ms [9] GETWEBPAGETHREAD - waiting for 6110 seconds..
15:23:28.072 ------------ [1] GetWebpage - entry
15:23:28.072 ------------ [1] GetWebpage - exit
15:23:28.072 ------------ [2] GetWebpage - entry
15:23:28.072 ------------ [2] GetWebpage - exit
15:23:28.072 ------------ [3] GetWebpage - entry
15:23:28.072 ------------ [3] GetWebpage - exit
15:23:28.072 ------------ [4] GetWebpage - entry
15:23:28.072 ------------ [4] GetWebpage - exit
15:23:28.072 ------------ [5] GetWebpage - entry
15:23:28.072 ------------ [5] GetWebpage - exit
15:23:28.072 ------------ [6] GetWebpage - entry
15:23:28.072 ------------ [6] GetWebpage - exit
15:23:28.072 ------------ [7] GetWebpage - entry
15:23:28.072 ------------ [7] GetWebpage - exit
15:23:28.088 ------------ [8] GetWebpage - entry
15:23:28.088 ------------ [8] GetWebpage - exit
15:23:28.088 ------------ [9] GetWebpage - entry
15:23:28.088 ------------ [9] GetWebpage - exit
15:23:28.088 ------------ [10] GetWebpage - entry
15:23:28.088 ------------ [10] GetWebpage - exit
15:23:28.088 ------------ [11] GetWebpage - entry
15:23:28.088 ------------ [11] GetWebpage - exit
15:23:28.088 ------------ [12] GetWebpage - entry
15:23:28.088 ------------ [12] GetWebpage - exit



After 15:23:27.759, it goes into the 600-second (10 min) wait, BUT PageScraper keeps logging these "GetWebpage - entry" and "GetWebpage - exit" lines until 15:33:25.056, when one entry starts before the others; the rest start at 15:33:34.181.

****This is the cause for the 3-6% CPU use that I'm associating with Pagescraper.****
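One way to put numbers on this flood is to count the GetWebpage entries per second. The Python sketch below does that, assuming the dashed-duration line layout shown in the excerpt above; entries_per_second and polled_ids are hypothetical names. If each PageScraper meter is polled once per display refresh (an assumption), the per-second count roughly equals the number of meters being polled, no matter how long each meter waits between actual scrapes.

```python
import re
from collections import Counter

# Assumed layout of the polling lines, as in the excerpt above:
#   "15:23:28.072 ------------ [1] GetWebpage - entry"
ENTRY_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\.\d{3}\s+-+\s+\[(\d+)\]\s+GetWebpage - entry")


def entries_per_second(log_lines):
    """Count 'GetWebpage - entry' lines per wall-clock second (HH:MM:SS)."""
    counts = Counter()
    for line in log_lines:
        m = ENTRY_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def polled_ids(log_lines):
    """Distinct bracketed ids that show up in GetWebpage entry lines."""
    return sorted({int(m.group(2)) for m in map(ENTRY_RE.match, log_lines) if m})
```

A steady count every second, with the same set of ids, would point to regular polling rather than repeated scraping.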

Here is the FULL log, from the beginning of one 10-minute cycle to its end and the beginning of the next.

The file is named "Pagescraper_TIME_TEST_log.txt" (RARed)
There is no password protection on this file.
http://rapidshare.com/files/215518425/Page...ST_log.rar.html
uziq
Posted: Mar 30 2009, 05:57 PM



I'm guessing the 'high' CPU is due to so many meters saving to the log file. I'd expect the 'non-debug' config to use less CPU.

I don't see any obvious problem in those log files (except for INTERNETSTATUS_TURNSTRINTOINDEX, which I'll need to work on).

The GetWebpage entry..exit pairs are expected since the refresh rate of your meters is 1000ms. GetWebpage simply returns the most recently scraped data that PageScraper's worker thread produced. This thread downloads/scrapes/etc and then sleeps for the time specified in Adv Options.
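The design described above can be sketched in Python as follows. This illustrates the pattern only and is not PageScraper's actual code: ScraperMeter, fetch, interval_s and get_webpage are hypothetical names. The worker thread blocks in Event.wait between scrapes, so it uses no CPU while waiting, while get_webpage only hands back the cached string. Each per-refresh call is cheap, but not free, and with many meters they add up.

```python
import threading


class ScraperMeter:
    """Hypothetical sketch of the pattern (not PageScraper's code): a worker thread
    downloads/scrapes, stores the result, then sleeps; the host calls
    get_webpage() on every meter refresh and just gets the cached string."""

    def __init__(self, fetch, interval_s: float):
        self._fetch = fetch              # stand-in for download + scrape + filter
        self._interval_s = interval_s    # stand-in for the wait set in Adv Options
        self._value = ""
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        while not self._stop.is_set():
            result = self._fetch()       # the expensive part, once per interval
            with self._lock:
                self._value = result
            # Event.wait blocks without busy-looping, so the thread uses no CPU
            # here; it also wakes early if stop() is called.
            self._stop.wait(self._interval_s)

    def get_webpage(self) -> str:
        """Called at the meter refresh rate: only returns the latest cached result."""
        with self._lock:
            return self._value
```

With 1000 ms meter refreshes, the host would call get_webpage once per second per meter, which is consistent with the steady stream of GetWebpage entry/exit pairs in the logs, while fetch runs only once per interval.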
stisev
Posted: Mar 30 2009, 06:31 PM



QUOTE
I'm guessing the 'high' CPU is due to so many meters saving to the log file. I'd expect the 'non-debug' config to use less CPU.


uziq,
No, the debug version uses only slightly more. The regular non-debug config uses about the same amount of CPU, 3-5% every second it's open. That is what's so baffling to me. I don't understand what PageScraper is doing. Note that this CPU usage is NOT there when PageScraper is not enabled, but a curious thing is that it does still happen even if pagescraper config is present, but the plugin is not. :-\

uziq
Posted: Mar 30 2009, 06:39 PM



QUOTE (stisev @ Mar 30 2009, 06:31 PM)
but a curious thing is that it does still happen even if pagescraper config is present, but the plugin is not. :-\

Then Samurize itself must be responsible for the CPU usage.

Don't forget that updating and drawing the config on screen always costs something. The more meters there are, or the larger the config, the higher the CPU usage will be.
stisev
Posted: Mar 30 2009, 06:54 PM



Hang on... lol this is bizarre

This post has been edited by stisev on Mar 30 2009, 06:56 PM