I am planning to build a search engine which works like Google. Can anybody
send me a sample database?
Regards, Lara
|||Hi
Have a look at http://www.databaseanswers.org/data_models/index.htm
or ask Google for their trade secrets.
Regards
--
Mike Epprecht, Microsoft SQL Server MVP
Zurich, Switzerland
IM: mike@.epprecht.net
MVP Program: http://www.microsoft.com/mvp
Blog: http://www.msmvps.com/epprecht/
>|||g*d d*mn it why don't you search google
|||I have never seen Google's infrastructure, but I doubt it resembles:
http://www.databaseanswers.org/data...rch_engine.htm.
Either way, to the original question: if any of us had designed a database
to rival Google's, or even to work like theirs, don't you think we would be
out driving our shiny new Jaguars? Seriously, this is quite a question
:)
----
Louis Davidson - http://spaces.msn.com/members/drsql/
SQL Server MVP
>|||You might be surprised. All their spiders do is scan web pages and extract
keywords, which they slap into a database structure. The complicated piece
is their spidering logic on the application side, which has to parse HTML
tags, ignore certain words, calculate "weights" for various pages, follow
links from one page to the next, count references to a page, etc. I
wouldn't be surprised at all if their database structure was based on
something very simple like this, with some additional columns and/or tables
to store word occurrence counts, link counts and other page ranking
information.
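To make the idea concrete, here is a minimal sketch of such a "very simple" structure: pages, words, per-page word occurrence counts, and a link table. Everything in it is an assumption for illustration only: the table and function names, the tiny stop-word list, the use of an in-memory SQLite database as a stand-in for a real server, and the naive ordering by occurrence count and then inbound link count. It takes already-extracted text rather than parsing HTML, and it says nothing about how Google actually stores data.

```python
import re
import sqlite3

# Hypothetical stop-word list; a real crawler would use a much longer one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}

# Hypothetical schema: one row per page, per word, per (word, page) pair,
# and per link between two pages.
SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL);
CREATE TABLE word (word_id INTEGER PRIMARY KEY, term TEXT UNIQUE NOT NULL);
CREATE TABLE word_occurrence (
    word_id INTEGER NOT NULL REFERENCES word(word_id),
    page_id INTEGER NOT NULL REFERENCES page(page_id),
    occurrences INTEGER NOT NULL,
    PRIMARY KEY (word_id, page_id)
);
CREATE TABLE link (
    from_page_id INTEGER NOT NULL REFERENCES page(page_id),
    to_page_id INTEGER NOT NULL REFERENCES page(page_id),
    PRIMARY KEY (from_page_id, to_page_id)
);
"""


def open_index():
    """Create an empty in-memory index."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def page_id_for(conn, url):
    """Return the id for a URL, inserting the page row if it is new."""
    conn.execute("INSERT OR IGNORE INTO page (url) VALUES (?)", (url,))
    return conn.execute(
        "SELECT page_id FROM page WHERE url = ?", (url,)
    ).fetchone()[0]


def index_page(conn, url, text, links=()):
    """Store word occurrence counts and outgoing links for one page."""
    pid = page_id_for(conn, url)
    counts = {}
    for term in re.findall(r"[a-z0-9]+", text.lower()):
        if term not in STOP_WORDS:
            counts[term] = counts.get(term, 0) + 1
    for term, n in counts.items():
        conn.execute("INSERT OR IGNORE INTO word (term) VALUES (?)", (term,))
        wid = conn.execute(
            "SELECT word_id FROM word WHERE term = ?", (term,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR REPLACE INTO word_occurrence VALUES (?, ?, ?)",
            (wid, pid, n),
        )
    for target in links:
        conn.execute(
            "INSERT OR IGNORE INTO link VALUES (?, ?)",
            (pid, page_id_for(conn, target)),
        )


def search(conn, term):
    """Return (url, occurrences, inbound_links) rows, best match first."""
    return conn.execute(
        """
        SELECT p.url, o.occurrences,
               (SELECT COUNT(*) FROM link l
                 WHERE l.to_page_id = p.page_id) AS inbound
        FROM word w
        JOIN word_occurrence o ON o.word_id = w.word_id
        JOIN page p ON p.page_id = o.page_id
        WHERE w.term = ?
        ORDER BY o.occurrences DESC, inbound DESC
        """,
        (term.lower(),),
    ).fetchall()
```

Calling index_page() once per crawled page and then search(conn, "sql") returns the matching URLs with their counts. The interesting work (crawling, HTML parsing, real ranking) would all sit outside these four tables, which is the point being made above.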
Like you said though, too bad we didn't think of it first! :)
>|||Michael is on the right track, but Google's architecture (see
http://www.computer.org/micro/mi2003/m2022.pdf) and search algorithm
(PageRank - see http://www.voelspriet2.nl/PageRank.pdf) are far more complex.
They do not use an RDBMS (except some MySQL for minor internal support apps)
and they do not use Windows servers, and the 15,000 clustered servers in the
first article are now well over 100,000 and growing... Interestingly, many
people had thought of and built search engines, just not with their
patented PageRank algorithm, and of course having your own OS and file system,
100,000 servers and many of the top PhDs doesn't hurt either :-).
However, even with the above said, there are still opportunities in search
that they do not handle well, i.e., Enterprise Search (or intranet search).
Combining structured search (RDBMS) and unstructured search (spiders &
HTML) along with local search (or desktop search) seems to me to be an
interesting possibility at this time...
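For readers who have not seen PageRank written out, the following is a rough sketch of the basic published idea: a page's rank is fed by the ranks of the pages linking to it, computed by repeated iteration. It is a simplified textbook formulation and not Google's production algorithm. The function name, the dictionary input format, the fixed iteration count and the handling of pages without outgoing links are all assumptions made for this example; 0.85 is the commonly cited damping factor.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly
        # over all pages (an assumption; other treatments exist).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        for page, targets in links.items():
            if targets:
                # Each page passes an equal share of its rank to every link.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

For a tiny graph such as {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}, page "c" ends up with the highest rank because three pages link to it, and "d" with the lowest because nothing links to it.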
Regards,
John
--
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
>|||Finally an intelligent answer to an unanswerable question... just kidding, all.
Great response(s) though!
"Candor Feg" wrote:
> g*d d*mn it why don't you search google
>
>|||Thanks for the link - now THAT is interesting!
>|||Michael,
Just curious, which link did you find interesting? (I have many others, from
my book research on this subject.)
What do you think of "Enterprise Search" or Intranet search?
Thanks,
John
--
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
"Michael C#" <xyz@.abcdef.com> wrote in message
news:S0Bde.22739$RP1.19555@.fe10.lga...
> Thanks for the link - now THAT is interesting!
>|||I found the architecture link the most interesting, although the Spidering
search was very informative as well. I found it very interesting how they
set up and configured their servers, as well as how they actually store
data. There could definitely be some lessons to learn in there...
(now if I can just get my boss to hire a couple hundred developers, we're on
our way! :) ) I'm particularly surprised that they don't use an RDBMS. I'm
still a little unclear on the method they use to store their data -- other
than physically separating the indexes from the data, which is pretty common
practice in SQL Server to improve performance. Do you have any more links
describing their architecture in greater detail, or do I have to wait for
the book? <g>
I had known a little bit about the page ranking formula they used, but
mostly just broad generalities. It was

and formulas, and how they relate to the overall search engine scheme.
I like the enterprise search idea, but I can imagine a decent enterprise
spider and search system would require some *serious* resources to
implement. It also brings up certain aspects of privacy and security, since
it would logically - if not physically - centralize access to corporate
information. After all, they might not want Joe Schmoe in the mailroom
spidering accounting and payroll information for the top execs in the
corporation -- and they definitely don't want John Schmoe in the
competitor's mailroom pulling up corporate docs at whim.
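One common way to address the security concern above is to store an access list with every indexed document and trim the results against the searching user's group memberships. The sketch below illustrates only that idea. The Document class, the group names, the substring matching and the deny-by-default rule for documents without any allowed group are all hypothetical choices for this example, not a description of any particular enterprise search product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    """An indexed document plus the groups allowed to see it (hypothetical)."""
    path: str
    text: str
    allowed_groups: frozenset = field(default_factory=frozenset)


def search(documents, term, user_groups):
    """Return paths of matching documents the user is allowed to see.

    A document with no allowed groups is visible to nobody.
    """
    needle = term.lower()
    groups = frozenset(user_groups)
    return [
        doc.path
        for doc in documents
        # Keep a hit only if the user shares at least one group with it.
        if needle in doc.text.lower() and doc.allowed_groups & groups
    ]
```

With a payroll spreadsheet restricted to an "hr" group and a handbook open to "hr" and "staff", a search for "payroll" by someone in "staff" returns only the handbook, and a user with no groups gets nothing back.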