01-28-2022, 02:47 AM
(This post was last modified: 01-28-2022, 02:57 AM by Leatherneck.)
Hello: I'm having difficulty getting a custom engine for Google Scholar to work. My query string works in the browser, but when I test the engine I get a page full of HTML. I have reviewed the page source and identified the before and after strings, but the engine still doesn't seem to work.
Here is the query string:
https://scholar.google.com/scholar?start...s_sdt=0,44
If I replace {PAGENUM} and {KEYWORD} with values and paste the string into the browser, I get good results.
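For reference, this is roughly the manual substitution I do before pasting into the browser, as a small Python sketch. The helper name `build_query` and the sample keyword are made up for illustration, and the URL-encoding of the keyword is my assumption about what the tool does with {KEYWORD}.

```python
from urllib.parse import quote_plus

# Template copied from the QueryString field of the engine definition.
QUERY_TEMPLATE = (
    "https://scholar.google.com/scholar"
    "?start={PAGENUM}&q={KEYWORD}&hl=en&as_sdt=0,44"
)

def build_query(template: str, keyword: str, pagenum: int) -> str:
    """Fill the two placeholders by hand (hypothetical helper, not a ScrapeBox API)."""
    return (
        template
        .replace("{PAGENUM}", str(pagenum))
        # Assumption: the keyword is URL-encoded, with spaces becoming '+'.
        .replace("{KEYWORD}", quote_plus(keyword))
    )

url = build_query(QUERY_TEMPLATE, "machine learning", 0)
print(url)
```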
Here is the rest of the information for the engine:
[Engine1]
Displayname=Google Scholar
QueryString=https://scholar.google.com/scholar?start={PAGENUM}&q={KEYWORD}&hl=en&as_sdt=0,44
MustBeInLink=
MustNotBeInLink=scholar?q=related:_
JustBeforeLink=" href="
RightAfterLink=" data-clk=
PageStart=0
PageInc=1
NextPageMarker= <b style="display:block;margin-left:53px">Next</b>
Translation=%3f=?|%3d==|%26=&|%3a=:|&=&|%2F=/
headerData=
Referer=0
SSLVersion=1
Favicon=
GracePeriod=0
UserAgent=
Selected=1
DetailedSelected=0
FollowRelocation=0
AddRelative=
AddFieldValue=
ClearCookies=0
ListMode=0
ReadOnly=0
One thing I see but do not know how to deal with is the JustBeforeLink. The source HTML looks like this: <a id="_Om2lsUOWQEJ" href="http......
I think that for ScrapeBox to recognize the URL, the part of the string containing id="_Om2lsUOWQEJ" needs to be replaced with a wildcard, because each found URL has a unique id.
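To make the question concrete, here is a minimal Python sketch of how I picture the JustBeforeLink / RightAfterLink / MustNotBeInLink matching. It assumes plain substring matching, which is my guess and not confirmed ScrapeBox behavior. The function `extract_links`, the example.org URLs, and the second and third anchor ids are invented for the example. Under that assumption, a before-marker that starts at the closing quote of the id attribute (`" href="`) never includes the id value itself, so the unique id would not need a wildcard.

```python
def extract_links(html, before, after, must_not_contain=""):
    """Collect the text between each `before` marker and the next `after` marker."""
    links = []
    pos = 0
    while True:
        start = html.find(before, pos)
        if start == -1:
            break
        start += len(before)
        end = html.find(after, start)
        if end == -1:
            break
        candidate = html[start:end]
        if '"' in candidate:
            # The span crossed into another attribute; retry from the next marker.
            pos = start
            continue
        if not must_not_contain or must_not_contain not in candidate:
            links.append(candidate)
        pos = end + len(after)
    return links

# Hypothetical sample markup; only the first id comes from the real page source.
sample_html = (
    '<a id="_Om2lsUOWQEJ" href="https://example.org/paper1" data-clk="x">One</a>'
    '<a id="AbCdEfGhIjKL" href="https://example.org/paper2" data-clk="y">Two</a>'
    '<a id="MnOpQrStUvWx" href="/scholar?q=related:_abc" data-clk="z">Related</a>'
)

links = extract_links(
    sample_html,
    before='" href="',                       # JustBeforeLink
    after='" data-clk=',                     # RightAfterLink
    must_not_contain="scholar?q=related:_",  # MustNotBeInLink
)
print(links)
```

If the tool really works this way, the markers themselves may be fine and the problem lies elsewhere, for example in the HTML that the engine test actually receives compared with what the browser shows.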
I would appreciate it if someone could help me finish this. I think I'm close; however, ignorance is bliss!
Leatherneck