There is a website, https://www.guidgenerator.com/online-guid-generator.aspx, which generates globally unique identifiers (GUIDs). I'm trying to use Perl's WWW::Mechanize to post to the site and extract a GUID. I realize the page uses JavaScript, but I was wondering whether I could craft the right POST to pull the values. I traced the request in the browser and copied all of its headers, but the HTML returned to my script does not contain the GUID.
This is the result of a successful run:
<textarea name="txtResults" rows="2" cols="20" id="txtResults" style="font-family:Courier New,Courier,monospace;font-size:Larger;font-weight: bold;Height: 152px; Width: 421px;">qk5DF22bhkm4C2AwZ5OcZw==</textarea>
This is what my script gets back:
<textarea name="txtResults" rows="2" cols="20" id="txtResults" style="font-family:Courier New,Courier,monospace;font-size:Larger;font-weight: bold;Height: 152px; Width: 421px;"></textarea>
This is the form within the page:
In my script I dumped the names of the form's input fields:
my @forms = $mech->forms;
foreach my $form (@forms) {
    my @inputfields = $form->param;
    print Dumper \@inputfields;
}
Result:
$VAR1 = [
    '__EVENTTARGET',
    '__EVENTARGUMENT',
    '__LASTFOCUS',
    '__VIEWSTATE',
    '__VIEWSTATEGENERATOR',
    '__EVENTVALIDATION',
    'txtCount',
    'chkUppercase',
    'chkBrackets',
    'chkHypens',
    'chkBase64',
    'chkRFC7515',
    'chkURL',
    'LocalTimestampValue',
    'btnGenerate',
    'txtResults'
];
This is the POST:
my $mainpage = "https://www.guidgenerator.com/online-guid-generator.aspx";
$mech->post(
    "$mainpage",
    fields => {
        'txtCount'             => "1",
        'chkBase64'            => "on",
        'LocalTimestampValue'  => "Date%28%29.getTime%28%29",
        'btnGenerate'          => "Generate+some+GUIDs%21",
        'txtResults'           => "",
        '__EVENTTARGET'        => 'on',
        '__EVENTARGUMENT',     => 'on',
        '__LASTFOCUS',         => 'on',
        '__VIEWSTATEGENERATOR' => "247C709F",
        '__VIEWSTATE'          => 'on',
        '__EVENTVALIDATION'    => 'on',
        'chkUppercase'         => 'off',
        'chkBrackets'          => 'off',
        'chkHypens'            => 'off',
        'chkRFC7515'           => 'off',
        'chkURL'               => 'off',
    },
);
When I trace the request in the browser, I see the headers, but there is also a tab called "Payload", which contains most of the fields listed above. I tried sending those fields in a POST, but I'm not sure whether I should be doing this differently, or whether it is pointless because the page relies on JavaScript.
I know this is a lot of information. I'm not even sure WWW::Mechanize can extract this data. Any help would be greatly appreciated. Please let me know if there is any other data you'd like me to post here.
You can use Mech's built-in functionality to do this. No need to submit any additional fields or headers.
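The approach described in this answer can be sketched roughly as follows. This is a minimal reconstruction based on the explanation here, not a verified listing: the field names txtCount and txtResults come from the dump in the question, the count of 3 is arbitrary, and the assumption that the textarea holds one GUID per line is untested.

```perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new;

# Fetch the page. Mech parses the form, including __VIEWSTATE and the
# other hidden ASP.NET fields, and sends them back on its own.
$mech->get('https://www.guidgenerator.com/online-guid-generator.aspx');

# Field name taken from the dumped input list; 3 is an arbitrary count.
$mech->field( txtCount => 3 );

# Use click rather than submit so the button's value is part of the POST.
$mech->click;

# Assumption: the result textarea holds one GUID per line.
my @guids = split /\r?\n/, $mech->value('txtResults');
print "$_\n" for @guids;
```

Because the hidden fields are taken from the fetched page, none of them need to be hard-coded the way the question's POST attempts to.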
This will output the following:
The key here is that you cannot use $mech->submit, as this will not submit the value of the submit button. This is a bit annoying. Instead you have to use $mech->click, which pretends that the default submit button of the default form was clicked, so the button's value is submitted as well. That is how buttons work in forms, and in this case the backend checks for that value to see which button was clicked.
You can then use $mech->value to get the field's value. You may want to split it to process it further.
The JavaScript in this page is actually completely unrelated to the functionality. All it does is save the settings you selected in a cookie and restore them, so that the same checkboxes are ticked when you come back. That is fine, although nowadays local storage on the frontend would probably be a better fit. Either way, you don't need to deal with JS at all to scrape this page. The main functionality lives in the backend.
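The difference between submit and click can be seen offline with HTML::Form, the module Mech uses for forms. The sketch below is only an illustration: the HTML is a made-up, stripped-down form that borrows just the names txtCount and btnGenerate from the real page, and make_request and click are used as rough stand-ins for what $mech->submit and $mech->click send.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical minimal form; the real page has many more fields.
my $html = <<'HTML';
<form method="post" action="/online-guid-generator.aspx">
  <input type="text" name="txtCount" value="1">
  <input type="submit" name="btnGenerate" value="Generate some GUIDs!">
</form>
HTML

# In scalar context, parse returns the first form in the document.
my $form = HTML::Form->parse( $html, 'https://www.guidgenerator.com/' );

# Roughly what $mech->submit sends: the button is left out.
my $submit_body = $form->make_request->content;

# Roughly what $mech->click sends: the button's name and value are included.
my $click_body = $form->click->content;

print "submit: $submit_body\n";
print "click:  $click_body\n";
```

Only the second request body carries btnGenerate, which is the field the backend looks for.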
You may also be interested in $mech->dump_forms, which is a great debugging aid that prints out all forms with their fields and values. Another great debugging aid when using Mech (or any LWP-based class) is LWP::ConsoleLogger::Everywhere. That is what I used to compare the program's requests with the browser's and find the missing button form field.
Disclaimer: I am the maintainer of WWW::Mechanize and I wrote LWP::ConsoleLogger::Everywhere.
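For reference, a minimal sketch of how the two debugging aids mentioned above might be wired into a script. It assumes LWP::ConsoleLogger is installed and uses the URL from the question. The logging output is verbose, so this is meant for debugging runs only.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Loading this module turns on request/response logging for every
# LWP::UserAgent-based object in the program, including Mech.
use LWP::ConsoleLogger::Everywhere ();

my $mech = WWW::Mechanize->new;
$mech->get('https://www.guidgenerator.com/online-guid-generator.aspx');

# Print every form on the current page with its fields and values.
$mech->dump_forms;
```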