Search Engine Creation and Optimisation using Microsoft Technologies

  • Question

  •  

     

    Search engines have become the lifeblood of the Internet. Few users can function effectively today without these marvellous pieces of AI and algorithmic engineering.

     

    Let us come together and pool our efforts towards creating search engines, or discuss their architectures and algorithms.

    Sunday, May 20, 2007 2:15 PM

Answers

  • The resources are nice.
    Monday, May 21, 2007 5:04 AM
  • You can try this shareware to create a basic search engine for your HTML pages:

    http://www.searchmakerpro.com/download.html

    and then implement the algorithms in ASP, Servlets or JSP.
    Wednesday, May 23, 2007 6:51 AM

  • Hey Friends:

    Check out this forum
    http://forums.searchenginewatch.com/

    While you are at it, check out the posts on:
    http://forums.microsoft.com/SamVaad/ShowPost.aspx?PostID=1562853&SiteID=43
    Thursday, May 24, 2007 9:01 AM
  • The Search Engine module will search entire pages, including dynamic pages, for matching keyword(s) or a phrase, count how many times the keyword(s) or phrase occur on each page, and display the results with the highest match counts first. The module searches all files with the extensions you list in the web.config file where indicated. Files or folders that you do not want searched can also be listed in web.config where indicated, so they are skipped. You can now also choose the encoding of your choice.

    To get the latest code click here.
    Thursday, May 24, 2007 9:03 AM
  • @Arijit: You do seem to have a lot of knowledge of search engines. Can you tell me which languages are currently being used to build search engines, and which of them would be best from the point of view of the learning curve as well as robustness and features?
        Actually, I am quite interested in this topic and want to create my own search engine.
    Thursday, May 24, 2007 12:29 PM
  • Arijit, you have posted some wonderful links, man....
    Thursday, May 24, 2007 1:08 PM
  • @anoop, maithilee & jagdeesh: Thanks for your appreciation, mates.

    @maithilee: Read the following whitepaper "A Comparison of Free Search Engine Software"

    http://www.searchtools.com/analysis/free-search-engine-comparison.html

    It will answer your queries about which software packages are being used, how they perform, what languages they are implemented in, etc., with links for downloading them too. And what's more, they are FREE!!

    Happy Coding!!
    Thursday, May 24, 2007 2:03 PM
  • That was quite a helpful link, Arijit. Please post more details, if possible.

    Thank you for the help.
    Friday, May 25, 2007 6:15 AM
  • Do you have a set of algorithms that I could use? Also, I am planning to use Java for coding; could you give me links to some APIs?
    Friday, May 25, 2007 6:18 AM
  • Hi Maithilee.
        Yes, I do have a set of algorithms. In fact, I have one that I developed myself and have just submitted for journal publication. I shall post the links to the APIs in a day or two. Till then, please go through these three links. They are about Google and its PageRank algorithm and are quite informative:

    http://en.wikipedia.org/wiki/PageRank
    http://pr.efactory.de/e-further-factors.shtml
    http://www.google-watch.org/pagerank.html


    Please reply if you like my posts.
    Friday, May 25, 2007 6:57 AM
  • Hi,

    First of all, you will need to process the search query. This is done by text simplification.
    Text simplification is an operation used in natural language processing to modify, enhance, classify or otherwise process an existing corpus of human-readable text in such a way that the grammar and structure of the prose are greatly simplified, while the underlying meaning and information remain the same. Text simplification is an important area of research, because natural human languages ordinarily contain complex compound constructions that are not easily processed through automation.
    A few techniques used here are stemming and removal of stop words (use the Porter Stemmer).
    You will also need a POS tagger; use Qtag, a probabilistic part-of-speech tagger that will help you identify whether words are nouns, verbs, etc.

    http://www.english.bham.ac.uk/staff/omason/software/qtag.html
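
    As a rough illustration of this preprocessing step, here is a minimal, self-contained Java sketch. It is only a toy: the stop-word list and suffix rules are hypothetical placeholders standing in for a complete stop-word list and the real Porter Stemmer.

    import java.util.*;

    // Toy query preprocessor: lowercases, tokenizes, drops stop words and
    // strips a few common suffixes. A real system would plug in the Porter
    // Stemmer and a complete stop-word list instead of these placeholders.
    public class QueryPreprocessor {

        private static final Set<String> STOP_WORDS = new HashSet<String>(
                Arrays.asList("a", "an", "the", "of", "for", "and", "or", "in", "to", "is"));

        public static List<String> preprocess(String query) {
            List<String> terms = new ArrayList<String>();
            for (String token : query.toLowerCase().split("[^a-z0-9]+")) {
                if (token.length() == 0 || STOP_WORDS.contains(token)) {
                    continue;                      // remove stop words
                }
                terms.add(crudeStem(token));       // stem the remaining terms
            }
            return terms;
        }

        // Very crude suffix stripping, standing in for a real stemmer.
        private static String crudeStem(String word) {
            if (word.endsWith("ing") && word.length() > 5) return word.substring(0, word.length() - 3);
            if (word.endsWith("es") && word.length() > 4) return word.substring(0, word.length() - 2);
            if (word.endsWith("s") && word.length() > 3) return word.substring(0, word.length() - 1);
            return word;
        }

        public static void main(String[] args) {
            // prints [creation, search, engin] -- note how the crude stemmer over-strips "engines"
            System.out.println(preprocess("The creation of Search Engines"));
        }
    }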

    Research Papers on Text Simplification are:

    The next thing you would be doing is document indexing (for faster retrieval of documents from a search).
    For this you would need to tokenize the documents. You can use different APIs that support both text and HTML parsing. You can use Lucene, which is a very powerful information retrieval library available for both .NET and Java: http://lucene.apache.org/java/docs/.
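
    For reference, here is a minimal indexing sketch. It assumes the Lucene 2.x-era Java API that was current around the time of this thread (classes and constants such as Field.Index.TOKENIZED and the no-argument StandardAnalyzer constructor were renamed or changed in later releases), so treat it as a version-dependent sketch rather than a drop-in snippet.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;

    // Lucene 2.x-era API; newer releases renamed several of these classes and constants.
    // Builds a tiny in-memory index with a title and a contents field per document.
    public class SimpleIndexer {
        public static void main(String[] args) throws Exception {
            Directory dir = new RAMDirectory();          // use FSDirectory for an on-disk index
            IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);

            Document doc = new Document();
            doc.add(new Field("title", "PageRank explained",
                              Field.Store.YES, Field.Index.TOKENIZED));
            doc.add(new Field("contents", "PageRank is a link analysis algorithm used by Google.",
                              Field.Store.YES, Field.Index.TOKENIZED));
            writer.addDocument(doc);

            writer.optimize();                           // merge segments for faster searching
            writer.close();
        }
    }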

    Terminology extraction

    Terminology extraction, term extraction, or glossary extraction is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus (a set of documents).
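
    A simple way to approximate this is to score candidate terms by TF-IDF, so that words frequent in one document but rare across the corpus float to the top. Below is a minimal pure-Java sketch over a made-up toy corpus; real term extractors add phrase detection and linguistic filtering on top of this.

    import java.util.*;

    // Scores single-word candidate terms by TF-IDF over a toy corpus.
    public class TermExtractor {

        public static Map<String, Double> tfidf(String doc, List<String> corpus) {
            Map<String, Integer> tf = termFrequencies(doc);
            Map<String, Double> scores = new HashMap<String, Double>();
            for (Map.Entry<String, Integer> e : tf.entrySet()) {
                int docsContaining = 0;
                for (String d : corpus) {
                    if (termFrequencies(d).containsKey(e.getKey())) docsContaining++;
                }
                // idf = log(N / df); df is at least 1 when the document itself belongs to the corpus
                double idf = Math.log((double) corpus.size() / Math.max(1, docsContaining));
                scores.put(e.getKey(), e.getValue() * idf);
            }
            return scores;
        }

        private static Map<String, Integer> termFrequencies(String text) {
            Map<String, Integer> tf = new HashMap<String, Integer>();
            for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
                if (t.length() == 0) continue;
                Integer old = tf.get(t);
                tf.put(t, old == null ? 1 : old + 1);
            }
            return tf;
        }

        public static void main(String[] args) {
            List<String> corpus = Arrays.asList(
                    "the cat sat on the mat",
                    "pagerank is a link analysis algorithm",
                    "the dog sat on the log");
            // "cat" and "mat" outscore words shared with other documents, such as "the" and "sat"
            System.out.println(tfidf(corpus.get(0), corpus));
        }
    }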



    You can use ontologies like WordNet, or other domain-specific ontologies, to base the comparison on the semantics (meaning) of a document instead of just its words.


    Comparison of the query and the documents can be done by applying cosine similarity.
    The theoretical explanations can be found at:

    http://www.miislita.com/information-retrieval-tutorial/cosine-similarity-tutorial.html
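
    In short, the query and each document are treated as term-frequency vectors and compared by the cosine of the angle between them, dot(q, d) / (|q| * |d|). Here is a minimal pure-Java sketch of that comparison (toy tokenization, raw term counts, no TF-IDF weighting):

    import java.util.*;

    // Cosine similarity between two bags of words represented as term-frequency maps.
    public class CosineSimilarity {

        public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
            double dot = 0.0;
            for (Map.Entry<String, Integer> e : a.entrySet()) {
                Integer other = b.get(e.getKey());
                if (other != null) dot += e.getValue() * other;
            }
            double denom = norm(a) * norm(b);
            return denom == 0 ? 0.0 : dot / denom;
        }

        private static double norm(Map<String, Integer> v) {
            double sum = 0.0;
            for (int count : v.values()) sum += (double) count * count;
            return Math.sqrt(sum);
        }

        public static Map<String, Integer> vectorize(String text) {
            Map<String, Integer> tf = new HashMap<String, Integer>();
            for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
                if (t.length() == 0) continue;
                Integer old = tf.get(t);
                tf.put(t, old == null ? 1 : old + 1);
            }
            return tf;
        }

        public static void main(String[] args) {
            Map<String, Integer> query = vectorize("page rank algorithm");
            Map<String, Integer> doc = vectorize("the pagerank algorithm ranks web pages by link structure");
            System.out.println(cosine(query, doc));  // higher values mean a closer match
        }
    }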

    Go through this paper to see how WordNet is used to get the semantics of words and to augment a document with related words (by finding hypernym density):
    http://ucrel.lancs.ac.uk/acl/W/W98/W98-0706.pdf

    You can also use PageRank formula, as stated in the previous posts.
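
    In its usual normalised form the formula is PR(p) = (1 - d)/N + d * sum over pages q linking to p of PR(q)/L(q), where d is the damping factor (commonly 0.85), N is the number of pages and L(q) is the number of outlinks of q. Here is a small Java sketch of that power iteration on a made-up three-page link graph (no handling of dangling pages, which a real crawler-scale implementation needs):

    import java.util.*;

    // Power-iteration PageRank over a tiny hand-built link graph.
    public class PageRank {
        public static void main(String[] args) {
            // page -> pages it links to (toy graph; every page has at least one outlink)
            Map<String, List<String>> links = new HashMap<String, List<String>>();
            links.put("A", Arrays.asList("B", "C"));
            links.put("B", Arrays.asList("C"));
            links.put("C", Arrays.asList("A"));

            double d = 0.85;                    // damping factor
            int n = links.size();
            Map<String, Double> pr = new HashMap<String, Double>();
            for (String page : links.keySet()) pr.put(page, 1.0 / n);

            for (int iter = 0; iter < 30; iter++) {
                Map<String, Double> next = new HashMap<String, Double>();
                for (String page : links.keySet()) next.put(page, (1 - d) / n);
                for (Map.Entry<String, List<String>> e : links.entrySet()) {
                    double share = pr.get(e.getKey()) / e.getValue().size();
                    for (String target : e.getValue()) {
                        next.put(target, next.get(target) + d * share);
                    }
                }
                pr = next;
            }
            System.out.println(pr);             // C collects the most rank in this toy graph
        }
    }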



    Monday, May 28, 2007 11:59 AM
  • Arijit, is this your topic of interest? I mean SEO? Asking because you have been posting good-quality stuff in this forum on this topic.
    Monday, May 28, 2007 2:26 PM
  • Hi Anoop,

     You are partially correct in your guess. My area of interest is actually NLP (Natural Language Processing), and I have four papers published in international conference proceedings on the subject.
    Tuesday, May 29, 2007 4:53 AM
  • Good info, mate.
    Tuesday, May 29, 2007 4:55 PM
  • Nice work. In future, if I need professional help on the topic, I know whom to contact.
    Wednesday, May 30, 2007 12:35 PM
  • Exactly, Varun.
    Wednesday, May 30, 2007 5:25 PM
  • Custom Search Engine

    Create a highly specialized Custom Search Engine that reflects your knowledge and interests. Place it on your website and, using our AdSense for Search program, make money from the resulting traffic.

    See examples of how a Custom Search Engine works.

    What you can do with a Custom Search Engine

    • Place a search box and search results on your website.
    • Specify or prioritize the sites you want to include in searches.
    • Customize the look and feel to match your website.
    • Invite your community to contribute to the search engine.
    Try it out here:

    http://www.google.com/coop/
    Thursday, May 31, 2007 9:08 AM
  • Advanced Google Search Operators

    Advanced Operators

    Google supports several advanced operators, which are query words that have special meaning to Google. Typically these operators modify the search in some way, or even tell Google to do a totally different type of search. For instance, "link:" is a special operator, and the query [link:www.google.com] doesn't do a normal search but instead finds all web pages that have links to www.google.com.

    Several of the more common operators use punctuation instead of words, or do not require a colon. Among these operators are OR, "" (the quote operator), - (the minus operator), and + (the plus operator). More information on these types of operators is available on the Basics of Search page. Many of these special operators are accessible from the Advanced Search page, but some are not. Below is a list of all the special operators Google supports.

    Alternate query types

    cache:  

    The query [cache:] will show the version of the web page that Google has in its cache. For instance, [cache:www.google.com] will show Google's cache of the Google homepage. Note there can be no space between the "cache:" and the web page url.

    If you include other words in the query, Google will highlight those words within the cached document. For instance, [cache:www.google.com web] will show the cached content with the word "web" highlighted.

    This functionality is also accessible by clicking on the "Cached" link on Google's main results page.

         
    link:  

    The query [link:] will list webpages that have links to the specified webpage. For instance, [link:www.google.com] will list webpages that have links pointing to the Google homepage. Note there can be no space between the "link:" and the web page url.

    This functionality is also accessible from the Advanced Search page, under Page Specific Search > Links.

         
    related:  

    The query [related:] will list web pages that are "similar" to a specified web page. For instance, [related:www.google.com] will list web pages that are similar to the Google homepage. Note there can be no space between the "related:" and the web page url.

    This functionality is also accessible by clicking on the "Similar Pages" link on Google's main results page, and from the Advanced Search page, under Page Specific Search > Similar.

         
    info:  

    The query [info:] will present some information that Google has about that web page. For instance, [info:www.google.com] will show information about the Google homepage. Note there can be no space between the "info:" and the web page url.

    This functionality is also accessible by typing the web page url directly into a Google search box.

    Other information needs

    define:  

    The query [define:] will provide a definition of the words you enter after it, gathered from various online sources. The definition will be for the entire phrase entered (i.e., it will include all the words in the exact order you typed them).

         
    stocks:  

    If you begin a query with the [stocks:] operator, Google will treat the rest of the query terms as stock ticker symbols, and will link to a page showing stock information for those symbols. For instance, [stocks: intc yhoo] will show information about Intel and Yahoo. (Note you must type the ticker symbols, not the company name.)

    This functionality is also available if you search just on the stock symbols (e.g. [ intc yhoo ]) and then click on the "Show stock quotes" link on the results page.

    Query modifiers

    site:  

    If you include [site:] in your query, Google will restrict the results to those websites in the given domain. For instance, [help site:www.google.com] will find pages about help within www.google.com. [help site:com] will find pages about help within .com urls. Note there can be no space between the "site:" and the domain.

    This functionality is also available through Advanced Search page, under Advanced Web Search > Domains.

         
    allintitle:    

    If you start a query with [allintitle:], Google will restrict the results to those with all of the query words in the title. For instance, [allintitle: google search] will return only documents that have both "google" and "search" in the title.

    This functionality is also available through Advanced Search page, under Advanced Web Search > Occurrences.

         
    intitle:  

    If you include [intitle:] in your query, Google will restrict the results to documents containing that word in the title. For instance, [intitle:google search] will return documents that mention the word "google" in their title, and mention the word "search" anywhere in the document (title or no). Note there can be no space between the "intitle:" and the following word.

    Putting [intitle:] in front of every word in your query is equivalent to putting [allintitle:] at the front of your query: [intitle:google intitle:search] is the same as [allintitle: google search].

         
    allinurl:  

    If you start a query with [allinurl:], Google will restrict the results to those with all of the query words in the url. For instance, [allinurl: google search] will return only documents that have both "google" and "search" in the url.

    Note that [allinurl:] works on words, not url components. In particular, it ignores punctuation. Thus, [allinurl: foo/bar] will restrict the results to page with the words "foo" and "bar" in the url, but won't require that they be separated by a slash within that url, that they be adjacent, or that they be in that particular word order. There is currently no way to enforce these constraints.

    This functionality is also available through Advanced Search page, under Advanced Web Search > Occurrences.

         
    inurl:  

    If you include [inurl:] in your query, Google will restrict the results to documents containing that word in the url. For instance, [inurl:google search] will return documents that mention the word "google" in their url, and mention the word "search" anywhere in the document (url or no). Note there can be no space between the "inurl:" and the following word.

    Putting "inurl:" in front of every word in your query is equivalent to putting "allinurl:" at the front of your query: [inurl:google inurlTongue Tiedearch] is the same as [allinurl: google search].

    Wednesday, June 6, 2007 6:47 AM
  • You can also perform calculations; all standard calculator functions plus scientific functions work from the same Google search bar.
    E.g.: search for 10+20/30

    You can also find the current currency exchange rate.
    E.g.: just try 1 INR = ? USD
    Wednesday, June 6, 2007 2:21 PM
  • Thanks for the info, mate.

    Also, check out the new line of search engines created using Web 2.0 technology.

    Category-wise Listings

    Mashups and Tagging:

    Many of the new search engines use the modular functionality of Web 2.0: mash together several services and add new features.

    1. Ajaxwhois.
      Doing a little domain name research? Ajaxwhois takes an existing protocol, WHOIS, and wraps it with a more responsive one. It's not a traditional search engine per se, but does make finding domain registration information faster. Start typing, and if you stop, it sends out a query. Add a few more characters to the domain name, and the query starts fresh. Results include links to hosting plans, the site (if it's registered), and Alexaholic, which is a mashup of Alexa, a Web traffic rankings service.
    2. FlickrStorm.
      FlickrStorm provides a nice mashup for flickr images. Enter a tag and it comes back with square thumbnails. Scroll through the array, click on images, and they'll be displayed larger. Add the ones you like to your own "tray", for later download. It's a simple but effective interface for consuming photos. An "advanced" feature filters images by license types, including Creative Commons.
    3. FundooWeb.
      FundooWeb is a multi-mashup, incorporating results from Yahoo!, Flickr, Yahoo! News, Yahoo! Answers, Amazon, and Yahoo! Maps images. If you search all sources, the results are presented in a couple of formats, including collapsible headlines and a Flickr photo strip, partitioned by source. There's obviously a heavy leaning to Yahoo, but it's not a bad way to conveniently compartmentalize several search result sets.
    4. Keotag.
      Keotag's initial face looks quite simple, with font sizes large enough for the dead to read. Type in a keyword or phrase and a line of favicons appears for Google, Technorati, and Bloglines, as well as over a dozen social bookmarking and community news sites. At far left is a Technorati chart showing the number of blog posts containing the key phrase over the past 30 days. Clicking on a particular favicon reveals result headlines for that source, which can be subscribed to through the resulting RSS feed.
    5. Whonu.   (MY FAVORITE !!!)
      Whonu is arguably one of the very first semantic Web search engines available. It offers over 300 search sources and a smart interface that contextualizes what you enter. For example, enter a US ZIP code and whonu presents a set of links to geocode tools including maps, weather maps, and even public events in Google Calendar. There are so many features that the demo screencast video is 26 minutes long. Information is double partitioned by file type and source. The variety of options might be a bit intimidating, but for power research, whonu looks like one of the most promising search tools available, with an effort made to present structured meaning. Killer feature — saved query history using a row of dots.
    6. Similicio.us.
      Similicio.us is a mashup of del.icio.us which tries to find sites related to a user-entered URL. In other words, "people who liked this site also liked". This recommendation engine idea is so basic in functionality that it's a surprise someone didn't think of it before. The creator of the site admits that similicio.us currently uses shallow searching on del.icio.us to keep queries to their service at a minimum. This is no doubt an engine that could prove useful in other mashups, were it to be extended in scope. (One possibility is to team up with del.icio.us and have access to their full database and engine.)

    Rich Internet Application Search Interfaces

    The "rich" in RIAs is a matter of personal definition, but engines in this category offer a little something extra in terms of the interface, sometimes employing AJAX.

    1. Huckabuck.
      On the surface, Huckabuck seems like any other text search engine, but click on the "search tuner" button and a neat little "equalizer" panel reveals itself. You can use it to give more weight to the different sources - Google, Yahoo, MSN, Technorati, Digg, del.icio.us - as well as color-code results, set the slider for results per page, turn auto-completion on or off, and more. Click on the Presets arrows to reveal predefined equalizations for Research, Shopping, Blog search, Metasearch, Technology Research, and Social Search. Not a bad start to partitioning search results meaningfully.
    2. Kartoo.
      Kartoo is yet another search engine that partitions results into several categories, some serious, some frivolous. The presentation, however, is quite different, displayed in little clusters using Flash and icons of a sheet of paper for each result. Some results are more relevant than others, and clicking on an icon takes you to a deeper level of results. This paradigm might be a little confusing at first, but hovering your mouse over a result produces a result summary at left, including a screencap of the result page.
    3. KwMap.
      KwMap touts itself as "a keyword map for the whole Internet". Type in a keyword or phrase, and an unusual interface appears. At right is an alphabetical list of related keyphrases. At left is a visual component showing two axes that resemble an insect's antennae, dotted with nodes representing related terms. Clicking on a term's node takes you to another layer of loosely-related terms. This is a new search paradigm, but it offers the opportunity to explore related concepts in small leaps. Thus, a search for the word "tree" could lead you to "tea tree oil" or to a study of ancestor worship (via "family tree"). Hyperlinking mimics hyper-thought.
    4. Mnemomap.
      Mnemomap uses multiple components to display search results. Topmost is a hierarchical graph with nodes branching off the search term. Non-clickable secondary nodes are "Token", "Tags", "Translations" and "Synonyms". Tertiary nodes are search results and can have either a tight relationship to the original search term or a tenuous relationship. Clicking on a tertiary node either adds it to a bar below for a refined search, or produces a new graph, depending on where you click. Below is a section displaying relevant results from Mnemo, Yahoo, flickr, and YouTube. Mnemomap, currently in Alpha 0.2, is a fascinating paradigm for searching, but more suited to power researchers than to the average search engine user.
    5. PreFound.
      PreFound, which is powered by Eurekster Swicki, is a simple search engine on the surface, but contains a little slider "equalizer" panel similar to the one in Huckabuck (above). PreFound's panel has settings for music, movies, TV, xBox, etc., instead of search engines. You do have to register to see and use the equalizer (which they oddly call a social search equalizer), but not to ask a question, view previous answers related to your search, or promote a search result.
    6. Quintura.
      Quintura, who recently received funding, presents text or image search results in a minimalist but graphic form resembling a freeform tag cloud. Holding your mouse cursor long enough over a term in the cloud causes new, related terms to appear in the vicinity of the cursor. While the no click interface is a bit disconcerting at first, you can start over by holding the cursor over the original search term, displayed in red text. Any term in focus (hovered over) generates search results in a scrollable panel below.
    7. Ujiko.
      Ujiko has an interface reminiscent of some sort of a video game, presenting results in both a central circle as well as in rows surrounding the circle. The setup allows you to drill down into the categories in the circle or click on actual results on either side, which can be marked as favorites. Ujiko makes a commendable attempt in presenting meaningful results in digestible bites, with a constantly updated interface.
    8. Tagnautica.
      Tagnautica starts off with a minimalist interface: a black background and a "CLICK HERE" message. Click and enter your search term, then wait for the strange revolving circle containing numerous spheres on the circumference, which undulate up and down in size. Talk about organic search results. Each result represents a related term, which can be drilled down into. Or you can click whatever term is in the center (initially the original search term) to get a page of flickr images. Tagnautica is a fascinating photo search paradigm that's lots of fun and definitely visually inspiring.
    9. Topix.
      Ever want to search for topical Web pages and wish you could easily narrow the search to a certain time period? Topix offers just that ability with a neat little interactive timeline map. Clicking on a particular day produces results ordered reverse chronologically from that day backwards. Definitely a handy tool for research, and would be killer mashed up with other functionality.

    Social Aspects: User Contribution, Recommendation, Social Networks

    Social networks are a hot Web application space, and now they are creeping into search engines.

    1. Clipfire.
      Michael Arrington of Techcrunch gave Clipfire his blessing, saying how much he likes this ecommerce deal-finding search engine. Sometimes all you need is a simple interface; it's the members that matter here. The idea is that members submit Web sites, Clipfire searches them, then presents later searchers with product and service deal info. Members are encouraged to use their own affiliate links so that they're motivated to find good deals and share them. This is a unique idea that's unlikely to remain so for much longer.
    2. Omgili.
      Omgili is a discussion-based engine. In addition to standard search results, a list of links to members is provided who have answered questions relating to a given search term. You can also ask a question, which another member might answer for you with relevant links. Recommendation engines such as omgili have their value in end applications, possibly those similar to the music recommendation site iLike (not to be confused with shopping engine, like).

    Visual Search

    Engines in this category allow you to search using images and similarity algorithms.

    1. Like.
      Like is a "visual shopping" engine that starts off with images of products. Click on an image to get an array of related product images. Use the interface to select a focus area of one image to find similar products by shape or color - say similar sunglasses. Like also lets you filter brands and price ranges. It's one of the more sophisticated ways to do affiliate marketing. Of course, while you don't have to enter any text at all to surf'n'shop, the option is there as well.
    2. Pixsy.
      Pixsy is a visual search engine for pictures or videos selected from several sources including Buzznet, flickr, iStockphoto, Fotolia, YouTube, and others. Clicking on an image takes you to the source page. For stock photo sites, this might provide copyright and license details. A handy tool for online publishers looking for suitable images to reprint.
    3. Retrievr.
      Retrievr is a visual search engine in the truest sense of the term, offering the choice of starting with an image (via URL or uploaded) or a sketch from the user, which can be customized by line thickness and color. Images are then retrieved from flickr. Brilliant concept. The honest truth is that very few of the images in the matrix of results have much resemblance to drawn sketches, but those that do are uncanny. An engine like this is only as good as its algorithms (though it uses brainiac wavelet transforms rather than the traditional neural network algorithms). Still, retrievr is an exciting early-generation advanced search engine offering.
    4. Riya.
      Riya visual search, who also offer Like, lets you search amongst people, objects, tags, and photos, as well as gives you a portal to Google, Yahoo, MSN, and flickr. You can browse broadly across the results or drill down through a specific photoset. Results can be emailed, embedded into Myspace or Blogger pages, or subscribed to via the dynamic RSS feed.
    5. Tiltomo.
      Tiltomo is yet another flickr mashup that offers a few search options. Enter a single flickr tag or ask for random images. Once you have an array of images, you can find similar images either by theme or by color/ texture. Tiltomo seems to produce slightly more relevant secondary results than some of the other visual search engines.
    6. Xcavator.
      Xcavator is another flickr-based engine in its early stages. Currently, it seems a bit limited, as there are only five tags from flickr that can be searched. Selecting one brings up an array of images. Dragging and dropping one of these to the xcavator search box and then selecting a point of interest produces a second, more refined image result set. While these sorts of engines have a ways to go before they're highly accurate, it's the promise of what's to come that's exciting.

    Audio/ Video Search

    Up until a few years ago, finding specific music or videos online was a difficult task. Then video search started appearing in traditional search engines. Now, it's creeping into engines with some advanced features.

    1. Liveplasma.
      Liveplasma is a music and video search and discovery engine tied to Amazon.com. Enter an artist, band, movie, director, or actor of interest, and up pops an unusual result set paradigm: floating spheres clustered in overlapping orbits. Each sphere represents information related to the search term. Clicking on a result produces an Amazon summary in the left panel, sometimes with CD/ DVD cover art. Clicking on the summary takes you to its Amazon page. Liveplasma is the type of affiliate marketing search engine that can wag the long tail.
    2. Vdoogle.
      Vdoogle is a video search engine that draws its sources from 14 video sharing sites such as YouTube and DailyMotion, as well as veteran sites such as iFilm. Vdoogle is based on Google's new roll-your-own custom search engine, which is similar to Rollyo. Its Web 2.0 pedigree is tenuous, though it does mashup other Web 2.0 user-contributed services. The accuracy of Vdoogle relies on the proper tagging of source videos, so the engine could do with its own tagging and recommendation engine as an additional layer.




    Thursday, June 7, 2007 6:10 AM
  • Awesome, Arijit, this really rocks :)

    Lots and lots of research and good stuff. Thanks for it, dude. You also rock.
    Thursday, June 7, 2007 6:38 AM
  • @Harshil 'ETERNALLY GRATEFUL TO YOU FOR YOUR SUPPORT, HELP & ENCOURAGEMENT !!!' Not only in this thread but also throughout the forum.


    Thursday, June 7, 2007 6:42 AM
  • Thanks, dude, for the appreciation. I just like helping friends and people. The main thing is to do your work ("karm karo") and not expect anything from anyone :) and you will get your fruit.
    Thursday, June 7, 2007 7:13 AM
  • Written by Emre Sokullu and edited by Richard MacManus, December 13, 2006

    You may feel relatively satisfied with the current search offerings of Google, Yahoo, Ask and MSN. Search today is undoubtedly much better than what it was in the second half of the 1990's. But Internet search is still in its infancy and there's much room for improvement. Moreover, the super high valuation of Google on NASDAQ pushes investors and researchers to find better search solutions - to be The Next Big Thing. And these wannabes are not only working on discovering better indexing techniques, they're exploring new horizons like vertical engines, meaning-based search, intent-driven search, new clustering methods, and much more. In this post, we look into the latest trends in the search industry.

    We have positioned the latest search trends into 3 main categories:

    • UI Enhancements
    • Technology Enhancements
    • Approach Enhancements (Vertical Engines)

    UI Enhancements

    Snap

    Snap promises a better interface for search, using the latest advancements in browsers and AJAX technology. Although there were earlier, similar implementations, preview powered search is perhaps the biggest innovation of Snap. With Snap's preview powered search, you don't necessarily need to visit the site to see if it satisfies your needs - you can see a dynamically loaded screenshot in the right side of your window. 

    According to a Microsoft study, users spend 11 minutes on a typical search - so potentially Snap can radically shorten this time. Another benefit is that it allows you to browse the search results with a few key strokes, which is another big usability enhancement. However it's worth noting that Snap is slow to process searches as a result, because there's too much Javascript and it's too heavy for most modern browsers and hardware. Also, from a technology point of view, Snap doesn't have much to offer - it uses Ask's existing technology. However they have introduced a power of masses approach with options for "This page is Junk" and "This page is Perfect".

    Snap's real time query recommendation is also a little similar to an idea once tested at Google Labs. All in all, Snap doesn't bring anything new to the table, but it's a good mashup of some of the innovative ideas in search that we've seen in the last few months.

    SearchMash

    SearchMash is actually a Google site, to test their latest search innovations. SearchMash follows the basic Google principle - it's cutting-edge, but still plain and simple. When you do a typical web search, you also see image, blog, video and Wikipedia results in the right side of the screen. And there's absolutely no noticeable speed loss, thanks to AJAX. Basically it is a shortcut to reach all the information you need. 

    The best innovation of SearchMash is perhaps the "More web results" bar. I strongly recommend Google find a way to implement it into their default engine immediately. It makes it much easier to browse the search results. When you need more information, simply click on "More web results" and new results appear at the bottom - enabling you to continue scrolling down on the same page, instead of opening a new page. SearchMash also allows you to give feedback about the results; this may be a sign of the introduction of power of masses into Google Search. 

    All in all, SearchMash shows that while Google continues to keep itself simple, it also has absolutely no intention of giving way on the innovation front to upcomers. All of the new features in SearchMash are discussed on their About page.

    Live.com

    Live.com, the new internet initiative of Microsoft, had many innovative ideas at the beginning. However as Vista's official release date gets closer, it has become a much more traditional search engine. Besides the technology advances in their algorithms, which Microsoft hopes will enable it to compete with Google, there are/were many UI enhancements as well. There used to be, for example, an infinite scrollbar in Live.com - but this seems to have been removed for the final public release. 

    Most innovations in the image search interface have been kept though - the tiered zooming feature is the most blatant one. Live's Image Search offers seamless user experience enhancements. The infinite scrollbar functionality fits very well and saves you from the hassle of clicking and waiting. And Scratchpad functionality allows you to pick your favourites and compare them smoothly. 

    Overall we can conclude that Live's interface, when compared to old MSN and Microsoft sites, has become simpler and more Google-like.

    Technology Advancements

    Search for Meaning by Hakia

    Hakia's motto is "Search for Meaning". Founded by seasoned nuclear scientist Riza Berkan, Hakia has raised more than $30M so far, mostly from European private investors. With Hakia you don't search keywords, instead you directly ask questions to the search engine. Hakia makes deep semantic analysis on the pages they crawl. It introduces a new mosaic-like indexing method called QDEX (Query Detection and Extraction). Despite all these nice promises, currently Hakia does not always return the correct results. However they're still in public alpha release and the company is set to debut its full operations in Jan, 2007. After this date, we will have a better chance to judge Hakia's capabilities. Note that Hakia works on top of Microsoft technologies.

    Also see Read/WriteWeb's recent post reviewing Hakia.

    Clustered Search of Vivisimo and Ask

    Neither Vivisimo nor Ask are new companies. Both offer clustered search, which means fragmenting the results of your query so that users can see related terms and go deeper or broader in their data mining. Vivisimo was the first to offer it and it's very useful in cases where you are researching a topic that you're completely new to. Ask's approach is less dense than Vivisimo's and is somehow similar to Live's related results feature. But as stated above, clustered search is probably not something you'll need all the time - it's more a side feature that may be helpful in some cases.

    Read/WriteWeb profiled Ask last month.

    Intent-Driven Search by Yahoo!

    This is a brilliant idea. Yahoo's research project Mindset brings you results according to your search purposes. For instance, when you enter "Rolex Watches" in the search box, you may be willing to buy a Rolex watch or do encyclopedic research on the company. Yahoo's intent-driven search allows you to specify your intent and get the most relevant results.

    Note that intent-driven search is still in a very early phase, but it's very promising for mainstream users.

    Google's Ori Alon

    In April this year, Google bought a patented technology that allows them to show related terms after your query. For example, if you search information on the War of Independence, this technology gives you a list of related words - like Etzel, Palmach, Ben-Gurion. The patent was filed by an Israeli PhD student studying in Australia. Google has not released this feature yet on Google or SearchMash, but it is expected to be shown soon. Also, it is rumored that Microsoft and Yahoo were also after this patent, but Google won the race.

    Del.icio.us and Power of Masses

    You may ask, what is del.icio.us doing in between all these search sites - isn't it just a bookmarking system? Well, the answer is both yes and no. While it's true that it's a bookmarking site, Yahoo probably didn't buy them just for bookmarking. Actually del.icio.us is also a great tool that empowers the search results of any search engine. Because when you bookmark a site, this indicates the site is a useful resource - so its "pagerank" should be increased. In other words, del.icio.us can actually be used as a search engine, fueled by the power of masses principle. And del.icio.us is not alone in this - Wink and Snap are also trying to use the power of masses in their search offerings. 

    Supposedly, Google also uses some sort of power of masses with their Personalized Search and Google Toolbar offerings.

    NLP (Natural Language Processing) powered Powerset

    While still in stealth mode, Powerset has already raised $12.5M in pre-money valuation from several venture capital companies and angel investors like Reid Hoffman, Luke Nosek and early Googlers Aydin Senkut and Zain Khan. The difference between Powerset and the traditional search engines is that while typical search engines like Google and Yahoo don't take into account stopwords (by, after, the, etc), stopwords are a very important part of the engine for Powerset. Why? Because Powerset relies on a semantic capability that can be triggered by using these stopwords. So while the "book by children" and "book for children" queries return exactly the same results in Google, Powerset evaluates them separately and somehow cares about your stopwords as well.

    Personalized Search

    Palo Alto based Collarity is a very new company entering into the personalized search area. The question that pushed them into this challenge is: "Why are your search results exactly the same as the next person's search results?" This is not a very new idea - Google (with its Kaltix acquisition in 2003) and others already offer this feature, albeit weakly. However Collarity seems very strong with their innovative interface (Collarity Slider), outsourced approach (Collarity Compass) and promising technology.

    Social Search

    Read/WriteWeb has covered the area of social search very thoroughly already in two articles in July by Ebrahim Ezzy. Two good examples are Eurekster's Swicki and Rollyo. Swicki is a community-driven search engine that allows users to create deep, focused searches on a specific niche - and 'learns' from its community. Rollyo allows users to create and publish their own personal search engines, based on websites they decide to include in their "SearchRoll".

    Image Search

    Image Search has been around for a very long time, but to be frank it's still very primitive. What most image search engines do is just look for text around images and examine the image tags. 

    Riya was the first to introduce advanced face recognition technologies in image search. This obviously requires a lot of computing power and just because of this, Riya's weekly burnrate is supposedly over $100K. Co-founded by web 1.0 veteran Munjal Shah and face recognition gurus Burak Gokturk and Azhar Khan, Riya is now entering a whole new space - "search by likeness" with like.com. This may come in very handy, for example when you try to find a watch that is similar to the one you have a digital photo of. That's why Riya is expected to make partnership deals with, or get acquired by, e-commerce companies like Amazon and eBay. It's worth noting that Riya was once in acquisition negotiations with Google, but this never happened - and Google ended up acquiring another face recognition company, Neven Vision. So we can conclude that Google is pursuing this technology very closely!

    Approach Enhancements (Vertical Search)

    Vertical search is a relatively new discipline in search. Basically, vertical engines look up a very limited subset of the internet - so they are more efficient than generic search engines. Because their search area is not so broad, they can adapt themselves for the specific needs and common points of their area of focus. We won't go into too much detail about vertical search engines, as they have already been covered in a recent article in Read/WriteWeb. But we can categorize the major vertical engines this way:

    • Jobs: SimplyHired.com, Indeed.com, Bixee.com (India), Eluta.ca (Canada), Recruit.net (Hong Kong)
    • Travel: Sidestep.com, Kayak.com, Mobissimo.com
    • Health: Amniota.com, CloserLookSearch.com, GenieKnows.com, Healia.com, Healthline.com, Kosmix.com, MammaHealth.com, Google Health
    • Classifieds: Edgeio.com, Oodle.com
    • Blogs: Technorati, Bloglines, Blogger Search, Sphere, Feedster
    • Source Code: Koders.com, Krugle, Google Code

    Conclusion

    The innovation in search does not stop and there's much to look forward to in the search space. What's more, Google and Yahoo search APIs and the open source Nutch and DMOZ projects allow anyone to try out new ideas. Nutch, supported by Yahoo and shielded under Apache Software Foundation, is providing a free global search engine. DMOZ gives you a very large open source web directory edited by volunteers. 

    Google will have a hard time competing not only with its big adversaries like Microsoft, Yahoo and Ask - but also with the ambitious startups that are opening new dimensions and bringing forth new approaches. We will probably hear of acquisitions in this space as well.

    We may not have covered all the promising new search offerings here, so please let us know your feedback in the comments below. Also let us know which of the above approaches sounds the most promising to you - and why.

    Saturday, June 9, 2007 8:38 AM
  • To know more about the upcoming Internet revolution through Web 3.0, here are a few articles:


    Also, here are two links that I believe every computer student should bookmark:

    Tim Berners-Lee's Homepage

    Tim Berners-Lee's Blog
    Saturday, June 9, 2007 8:58 AM
  • The links are useful!!
    Saturday, June 9, 2007 11:23 AM
  • More links:

    http://alistapart.com/articles/web3point0

    http://www.androidtech.com/knowledge-blog/2006/11/web-30-you-aint-seen-nothing-yet.html

    http://www.roughtype.com/archives/2006/11/welcome_web_30.php


    Hope you will like them.

    "Please say if there is some problem with my postings, or if you would like to read about something else or discuss another topic related to search engines."
    Monday, June 11, 2007 8:48 AM
  • Open Source Search Engines in Java

    Egothor

    Egothor is an Open Source, high-performance, full-featured text search engine written entirely in Java. It is technology suitable for nearly any application that requires full-text search, especially cross-platform. It can be configured as a standalone engine, metasearcher, peer-to-peer HUB, and, moreover, it can be used as a library for an application that needs full-text search.

    Go To Egothor

    Nutch

    Nutch is a nascent effort to implement an open-source web search engine. Nutch provides a transparent alternative to commercial web search engines.

    Go To Nutch

    Lucene

    Jakarta Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

    Go To Lucene
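
    To complement the indexing sketch earlier in this thread, here is how searching looks against the same Lucene 2.x-era API (the Hits class used here was deprecated and later removed, so treat this as a version-dependent sketch; the "contents" and "title" field names are just the ones assumed in that earlier example):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.store.Directory;

    // Lucene 2.x-era API; newer releases replaced Hits with TopDocs-based search methods.
    // Runs a parsed query against an existing index and prints scored results.
    public class SimpleSearcher {
        public static void search(Directory dir, String queryText) throws Exception {
            IndexSearcher searcher = new IndexSearcher(dir);
            QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
            Query query = parser.parse(queryText);   // e.g. "pagerank AND algorithm"

            Hits hits = searcher.search(query);
            for (int i = 0; i < hits.length(); i++) {
                Document doc = hits.doc(i);
                System.out.println(hits.score(i) + "  " + doc.get("title"));
            }
            searcher.close();
        }
    }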

    Oxyus

    Oxyus Search Engine is a Java-based application for indexing web documents for searching from an intranet or the Internet, similar to other proprietary search engines in the industry. Oxyus has a web module that presents search results to clients through web browsers, using Java Server Pages that access a JDBC repository through JavaBeans.

    Go To Oxyus

    BDDBot

    BDDBot is a web robot, search engine, and web server written entirely in Java. It was written as an example for a book chapter on how to write your own search engine, and as such it is very simplistic.

    Go To BDDBot

    Zilverline

    Zilverline is what you could call a 'Reverse Search Engine'. It indexes documents from your local disks (and UNC path style network disks), and allows you to search through them locally or, if you're away from your machine, through a webserver on your machine. Zilverline supports collections. A collection is a set of files and directories in a directory. PDF, Word, txt, java, CHM and HTML are supported, as well as zip and rar files. A collection can be indexed and searched. The results of the search can be retrieved from local disk or remotely, if you run a webserver on your machine. Files inside zip, rar and chm files are extracted, indexed and can be cached. The cache can be mapped to sit behind your webserver as well.

    Go To Zilverline

    YaCy

    YaCy is a distributed web crawler and also a caching HTTP proxy. Through its online interface you can configure your personal settings, proxy settings, access control and crawling properties. You can also use the interface to start crawls, send messages to other peers and monitor your index, cache status and crawling processes. Most important, you can use the search page to search either your own or the global index.

    Go To YaCy

    Compass

    The Compass Framework is a first class open source Java framework, enabling the power of Search Engine semantics to your application stack declaratively. Built on top of the amazing Lucene Search Engine, Compass integrates seamlessly to popular development frameworks like Hibernate and Spring. It provides search capability to your application data model and synchronises changes with the datasource. With Compass: write less code, find data quicker.

    Go To Compass

    Lius

    LIUS (Lucene Index Update and Search) is an indexing Java framework based on the Jakarta Lucene project. The LIUS framework adds to Lucene indexing functionality for many file formats, such as MS Word, MS Excel, MS PowerPoint, RTF, PDF, XML, HTML, TXT, Open Office suite documents and JavaBeans. LIUS is very easy to use: all the configuration of the indexing (types of files to be indexed, fields, etc.) as well as of the searching is defined in an XML file, so the user only has to write a few lines of code to carry out the indexing or searching. LIUS has been developed from a range of Java technologies and fully open source applications.

    Go To Lius

    Solr

    Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP APIs, caching, replication, and a web administration interface.

    Go To Solr

    regain

    'regain' is a fast search engine on top of Jakarta Lucene. It crawls through files or web pages using a plugin architecture of preparators for several file formats and data sources. Search requests are handled via a browser-based user interface using Java Server Pages. 'regain' is released under the LGPL and comes in two versions: 1. a standalone desktop search program including a crawler and HTTP server; 2. a server-based installation providing full-text search functionality for a website or intranet file server using XML configuration files.

    Go To regain

    MG4J

    MG4J (Managing Gigabytes for Java) is a collaborative effort aimed at providing a free Java implementation of inverted-index compression techniques; as a by-product, it offers several general-purpose optimised classes, including fast and compact mutable strings, bit-level I/O, fast unsynchronised buffered streams, (possibly signed) minimal perfect hashing, etc. MG4J functions as a full-fledged text-indexing system. It can analyze, index, and query consistently large document collections.

    Go To MG4J

    Thursday, June 28, 2007 10:54 AM
  • Open Source Crawlers in Java

    Heritrix

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

    Go To Heritrix

    WebSPHINX

    WebSPHINX ( Website-Specific Processors for HTML INformation eXtraction) is a Java class library and interactive development environment for Web crawlers that browse and process Web pages automatically.

    Go To WebSPHINX

    JSpider

    A highly configurable and customizable Web spider engine, developed under the LGPL open source license, in 100% pure Java.

    Go To JSpider

    WebEater

    A 100% pure Java program for web site retrieval and offline viewing.

    Go To WebEater

    Java Web Crawler

    Java Web Crawler is a simple Web crawling utility written in Java. It supports the robots exclusion standard.

    Go To Java Web Crawler

    WebLech

    WebLech is a fully featured web site download/mirror tool in Java, which supports many features required to download websites and emulate standard web-browser behaviour as much as possible. WebLech is multithreaded and will feature a GUI console.

    Go To WebLech

    Arachnid

    Arachnid is a Java-based web spider framework. It includes a simple HTML parser object that parses an input stream containing HTML content. Simple Web spiders can be created by sub-classing Arachnid and adding a few lines of code called after each page of a Web site is parsed.

    Go To Arachnid

    JoBo

    JoBo is a simple program to download complete websites to your local computer. Internally it is basically a web spider. The main advantage over other download tools is that it can automatically fill out forms (e.g. for automated login) and also use cookies for session handling. Compared to other products the GUI seems to be very simple, but it is the internal features that matter! Do you know any other download tool that can log in to a web server and download content if that server uses web forms for login and cookies for session handling? It also features very flexible rules to limit downloads by URL, size and/or MIME type.

    Go To JoBo

    Web-Harvest

    Web-Harvest is an open source web data extraction tool written in Java. It offers a way to collect desired Web pages and extract useful data from them. In order to do that, it leverages well-established techniques and technologies for text/XML manipulation such as XSLT, XQuery and regular expressions. Web-Harvest mainly focuses on HTML/XML-based web sites, which still make up the vast majority of Web content. On the other hand, it could easily be supplemented by custom Java libraries in order to augment its extraction capabilities.

    Go To Web-Harvest
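
    If you just want to see the core idea behind these crawlers, here is a bare-bones breadth-first fetcher in plain Java. It uses naive regex link extraction against an assumed seed URL and has none of the robots.txt handling, politeness delays or error recovery that the libraries above provide.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.*;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Bare-bones breadth-first crawler: fetch a page, extract absolute links, repeat.
    public class TinyCrawler {
        private static final Pattern LINK = Pattern.compile("href=\"(http[^\"]+)\"");

        public static void crawl(String seed, int maxPages) {
            Set<String> visited = new HashSet<String>();
            Queue<String> frontier = new LinkedList<String>();
            frontier.add(seed);

            while (!frontier.isEmpty() && visited.size() < maxPages) {
                String url = frontier.poll();
                if (!visited.add(url)) continue;      // skip already-visited pages
                try {
                    String html = fetch(url);
                    System.out.println("Fetched " + url + " (" + html.length() + " chars)");
                    Matcher m = LINK.matcher(html);
                    while (m.find()) frontier.add(m.group(1));
                } catch (Exception e) {
                    System.out.println("Failed " + url + ": " + e.getMessage());
                }
            }
        }

        private static String fetch(String url) throws Exception {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(new URL(url).openStream()));
            StringBuilder page = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) page.append(line).append('\n');
            in.close();
            return page.toString();
        }

        public static void main(String[] args) {
            crawl("http://lucene.apache.org/", 5);    // fetch a handful of pages from a seed URL
        }
    }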

    Thursday, June 28, 2007 10:55 AM
  •  

    Check Out www.tafiti.com
    Tuesday, September 11, 2007 5:34 AM


  • http://code.google.com/

    Introducing OpenSocial

    OpenSocial provides a common set of APIs for social applications across multiple websites. Using standard JavaScript and HTML, these APIs enable developers to create apps that access a social network's friends and update feeds.

    Common APIs mean you have less to learn to build for multiple websites. OpenSocial is being developed by Google in conjunction with members of the web community. The ultimate goal is for any social website to be able to implement the APIs and host 3rd party social applications. Watch the full announcement of OpenSocial at Campfire One.



    Developer Resources


    • APIs & Developer Tools
      Everything you need to start your project, including developer guides, forums, and tutorials.
    • Open Source Programs
      Find out about Google's Open Source programs, Summer of Code, and projects we've released.
    • Project Hosting
      Starting your own Open Source project? Let Google host the code and documentation for you, free.
    Wednesday, November 7, 2007 9:02 AM
  • Site Directory (code.google.com)

    Choose a category under Products or Programs to list Google products or the associated code.google.com resources, then select a product or resource to see a detailed description and links to more information. For example:

    Google AJAX Search API


    The Google AJAX Search API lets you use JavaScript to embed a simple, dynamic Google search box and display search results in your own web pages, or use search results programmatically in innovative ways. If you don't feel like coding, you can even use our code wizards to add custom AJAX search controls to your web page in just a few steps.

    For more information:
    Wednesday, November 7, 2007 9:05 AM

All replies

  • resources are nice.
    Monday, May 21, 2007 5:04 AM
  • You can try this shareware to create a basic search engine for ur HTML pages:

    http://www.searchmakerpro.com/download.html

    and then implement the algorithms in ASP, Servlets or JSP.
    Wednesday, May 23, 2007 6:51 AM

  • Hey Friends:

    Check out this forum
    http://forums.searchenginewatch.com/

    While u r at it check out the posts on:
    http://forums.microsoft.com/SamVaad/ShowPost.aspx?PostID=1562853&SiteID=43
    Thursday, May 24, 2007 9:01 AM
  • The Search Engine module will search an entire page also dynamic pages for matching keyword(s) or a phrase and will count how many times the keyword(s) or phrase are found on the page, and display the results with the highest matches first. The module will search all files with the extensions that you can easily place the extension name into the web.config file where indicated. Files or folders that you don't want searched can be placed in the web.config where indicated, so these files and folders are not searched. Also now you can choose the encoding of your choice.

    To get the latest code click here.
    Thursday, May 24, 2007 9:03 AM
  • @Arijit; U do seem to have a lot of knowledge on Search Engines.Can u tell me what are the languages currently being used for making search engines, and which one of them would be the best from the point of view of the Learning Curve as well as its robustness and features.
        Actually, I am quite interested in this topic and want to create my own search engine.
    Thursday, May 24, 2007 12:29 PM
  • arijit you have posted some woderful links man....
    Thursday, May 24, 2007 1:08 PM
  • @anoop, maithilee & jagdeesh. Thanks for ur appreciation m8.

    @maithilee: Read the following whitepaper "A Comparison of Free Search Engine Software"

    http://www.searchtools.com/analysis/free-search-engine-comparison.html

    It will answer ur queries regarding what are the S/Ws being used, and how do they perform, what language they are implemented in etc, with links for downloading them too. And what's more they are FREE !!

    Happy Coding!!
    Thursday, May 24, 2007 2:03 PM
  • That was quite a helpful link, Arijit. Please post more details, if possible.

    Thank u for the help.
    Friday, May 25, 2007 6:15 AM
  • Do you have a set of algorithms that I could use. Also, I am planning to use Java for coding, could you give me links to some APIs.
    Friday, May 25, 2007 6:18 AM
  • Hi Maithilee.
        Yes, I do have a set of Algorithms. In fact I have one that I have made myself and have just given it for Journal Publicatiom. I shall give the links to the APIs in 1 or 2 Days. Till that Time please go though these 3 links. They are about 'Google' and its Page Rank Algorithm and are quite informative:

    http://en.wikipedia.org/wiki/PageRank
    http://pr.efactory.de/e-further-factors.shtml
    http://www.google-watch.org/pagerank.html


    Please Reply, if you like my posts.
    Friday, May 25, 2007 6:57 AM
  • Hi,

    First of all you will need to perform processing of the search query. This is done byText simplification.
    Text simplification
    is an operation used in natural language processing to modify, enhance, classify or otherwise process an existing corpus of human-readable text in such a way that the grammar and structure of the prose is greatly simplified, while the underlying meaning and information remains the same. Text simplification is an important area of research, because natural human languages ordinarily contain complex compound constructions that are not easily processed through automation.
    A few techniques used here are Stemming and Removal of Stop Words. (use Porter Stemmer)
    U will also need a POS Tagger, use Qtag, it is probabilistic Part of Speech Tagger and will help u identify whether the words are nouns,verbs etc.

    http://www.english.bham.ac.uk/staff/omason/software/qtag.html

    Research Papers on Text Simplification are:

    The next thing you would do is document indexing (for faster retrieval of documents in a search).
    For this you will need to tokenize documents. You can use different APIs that support both text and HTML parsing. You can use Lucene, which is a very powerful information retrieval library available for both .NET and Java: http://lucene.apache.org/java/docs/.
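
    As a starting point for the indexing step, here is a minimal Lucene indexing sketch. I am assuming a reasonably recent Lucene release (5.x or later) with lucene-core on the classpath; in the releases contemporary with this thread the constructors took a Version argument and FSDirectory.open() took a File, so adjust to whatever version you actually use. The index path and the "title"/"content" field names are just examples.

        import java.nio.file.Paths;
        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.document.Document;
        import org.apache.lucene.document.Field;
        import org.apache.lucene.document.TextField;
        import org.apache.lucene.index.IndexWriter;
        import org.apache.lucene.index.IndexWriterConfig;
        import org.apache.lucene.store.Directory;
        import org.apache.lucene.store.FSDirectory;

        public class Indexer {
            public static void main(String[] args) throws Exception {
                Directory dir = FSDirectory.open(Paths.get("search-index"));
                IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
                try (IndexWriter writer = new IndexWriter(dir, config)) {
                    Document doc = new Document();
                    doc.add(new TextField("title", "Search Engine Basics", Field.Store.YES));
                    doc.add(new TextField("content",
                            "Stemming, stop words and inverted indexes explained.", Field.Store.YES));
                    writer.addDocument(doc); // the analyzer tokenizes and normalises the text fields
                }
            }
        }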

    Terminology extraction

    Terminology extraction, term extraction, or glossary extraction is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus (a set of documents).
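
    A rough-and-ready way to approximate terminology extraction is tf-idf scoring: terms that are frequent in one document but rare across the corpus are likely to be its "terminology". The sketch below is only that approximation; real term extractors add linguistic filtering, e.g. keeping noun phrases found by a POS tagger such as Qtag above.

        import java.util.*;

        public class TermExtractor {
            // Scores each term of `doc` by tf * log(N / df) over the whole corpus.
            public static Map<String, Double> tfidf(List<String> doc, List<List<String>> corpus) {
                Map<String, Integer> tf = new HashMap<>();
                for (String t : doc) tf.merge(t, 1, Integer::sum);
                Map<String, Double> scores = new HashMap<>();
                for (Map.Entry<String, Integer> e : tf.entrySet()) {
                    int df = 0;
                    for (List<String> d : corpus) if (d.contains(e.getKey())) df++;
                    scores.put(e.getKey(),
                            e.getValue() * Math.log((double) corpus.size() / Math.max(df, 1)));
                }
                return scores;
            }

            public static void main(String[] args) {
                List<List<String>> corpus = Arrays.asList(
                        Arrays.asList("search", "engine", "index", "crawler"),
                        Arrays.asList("search", "query", "ranking"),
                        Arrays.asList("cooking", "recipes"));
                // "engine", "index" and "crawler" score higher than the ubiquitous "search".
                tfidf(corpus.get(0), corpus).forEach((term, score) ->
                        System.out.printf("%-8s %.3f%n", term, score));
            }
        }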



    You can use ontologies like WordNet or other domain-specific ontologies to compare documents on their semantics (meaning) rather than just their words.


    The query and documents can then be compared by applying cosine similarity.
    A theoretical explanation can be found at:

    http://www.miislita.com/information-retrieval-tutorial/cosine-similarity-tutorial.html
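
    For a quick feel of how it works, here is a toy cosine-similarity computation over raw term counts; a production ranker would use tf-idf weights instead, as the tutorial above explains.

        import java.util.*;

        public class CosineSimilarity {
            // Build a simple term-frequency vector from preprocessed terms.
            static Map<String, Integer> termFrequencies(List<String> terms) {
                Map<String, Integer> tf = new HashMap<>();
                for (String t : terms) tf.merge(t, 1, Integer::sum);
                return tf;
            }

            // cos(q, d) = (q . d) / (|q| * |d|)
            static double cosine(Map<String, Integer> q, Map<String, Integer> d) {
                double dot = 0, normQ = 0, normD = 0;
                for (Map.Entry<String, Integer> e : q.entrySet()) {
                    dot   += e.getValue() * d.getOrDefault(e.getKey(), 0);
                    normQ += e.getValue() * e.getValue();
                }
                for (int v : d.values()) normD += v * v;
                return (normQ == 0 || normD == 0) ? 0 : dot / (Math.sqrt(normQ) * Math.sqrt(normD));
            }

            public static void main(String[] args) {
                Map<String, Integer> query = termFrequencies(Arrays.asList("search", "engine"));
                Map<String, Integer> doc   = termFrequencies(
                        Arrays.asList("search", "engine", "crawler", "index", "search"));
                System.out.printf("similarity = %.3f%n", cosine(query, doc));
            }
        }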

    Go through this paper to see how WordNet is used to get the semantics of words and augment the document with them (by finding hypernym density):
    http://ucrel.lancs.ac.uk/acl/W/W98/W98-0706.pdf

    You can also use the PageRank formula, as mentioned in the earlier posts.
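
    For reference, here is a stripped-down power-iteration sketch of the PageRank idea described in the links above. It is only a toy: real engines work on sparse link matrices with billions of nodes and handle dangling pages and personalisation properly.

        public class PageRankDemo {
            // links[i] lists the pages that page i links to.
            public static double[] pageRank(int[][] links, double damping, int iterations) {
                int n = links.length;
                double[] rank = new double[n];
                java.util.Arrays.fill(rank, 1.0 / n);
                for (int it = 0; it < iterations; it++) {
                    double[] next = new double[n];
                    java.util.Arrays.fill(next, (1.0 - damping) / n);
                    for (int page = 0; page < n; page++) {
                        if (links[page].length == 0) continue; // toy version: dangling rank is simply dropped
                        double share = damping * rank[page] / links[page].length;
                        for (int target : links[page]) next[target] += share;
                    }
                    rank = next;
                }
                return rank;
            }

            public static void main(String[] args) {
                // A tiny three-page web: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
                int[][] links = { {1, 2}, {2}, {0} };
                double[] r = pageRank(links, 0.85, 50);
                for (int i = 0; i < r.length; i++) System.out.printf("page %d: %.4f%n", i, r[i]);
            }
        }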



    Monday, May 28, 2007 11:59 AM
  • Arijit, is this your topic of interest? I mean SEO? Asking because you have been posting good-quality stuff in this forum on this topic.
    Monday, May 28, 2007 2:26 PM
  • Hi Anoop,

     You are partially correct in your guess. My area of interest is actually NLP (Natural Language Processing), and I have 4 papers published in international conference proceedings on the subject.
    Tuesday, May 29, 2007 4:53 AM
  • Good info, mate.
    Tuesday, May 29, 2007 4:55 PM
  • Nice work. If I ever need professional help on this topic in the future, I know whom to contact.
    Wednesday, May 30, 2007 12:35 PM
  • Exactly, Varun.
    Wednesday, May 30, 2007 5:25 PM
  • Custom Search Engine

    Create a highly specialized Custom Search Engine that reflects your knowledge and interests. Place it on your website and, using our AdSense for Search program, make money from the resulting traffic.

    See examples of how a Custom Search Engine works.

    What you can do with a Custom Search Engine

    • Place a search box and search results on your website.
    • Specify or prioritize the sites you want to include in searches.
    • Customize the look and feel to match your website.
    • Invite your community to contribute to the search engine.
    Try it out here:

    http://www.google.com/coop/
    Thursday, May 31, 2007 9:08 AM
  • Advanced Google Search Operators

    Advanced Operators

    Google supports several advanced operators, which are query words that have special meaning to Google. Typically these operators modify the search in some way, or even tell Google to do a totally different type of search. For instance, "link:" is a special operator, and the query [link:www.google.com] doesn't do a normal search but instead finds all web pages that have links to www.google.com.

    Several of the more common operators use punctuation instead of words, or do not require a colon. Among these operators are OR, "" (the quote operator), - (the minus operator), and + (the plus operator). More information on these types of operators is available on the Basics of Search page. Many of these special operators are accessible from the Advanced Search page, but some are not. Below is a list of all the special operators Google supports.

    Alternate query types

    cache:  

    The query [cache:] will show the version of the web page that Google has in its cache. For instance, [cache:www.google.com] will show Google's cache of the Google homepage. Note there can be no space between the "cache:" and the web page url.

    If you include other words in the query, Google will highlight those words within the cached document. For instance, [cache:www.google.com web] will show the cached content with the word "web" highlighted.

    This functionality is also accessible by clicking on the "Cached" link on Google's main results page.

         
    link:  

    The query [link:] will list webpages that have links to the specified webpage. For instance, [link:www.google.com] will list webpages that have links pointing to the Google homepage. Note there can be no space between the "link:" and the web page url.

    This functionality is also accessible from the Advanced Search page, under Page Specific Search > Links.

         
    related:  

    The query [related:] will list web pages that are "similar" to a specified web page. For instance, [related:www.google.com] will list web pages that are similar to the Google homepage. Note there can be no space between the "related:" and the web page url.

    This functionality is also accessible by clicking on the "Similar Pages" link on Google's main results page, and from the Advanced Search page, under Page Specific Search > Similar.

         
    info:  

    The query [info:] will present some information that Google has about that web page. For instance, [info:www.google.com] will show information about the Google homepage. Note there can be no space between the "info:" and the web page url.

    This functionality is also accessible by typing the web page url directly into a Google search box.

    Other information needs

    define:  

    The query [define:] will provide a definition of the words you enter after it, gathered from various online sources. The definition will be for the entire phrase entered (i.e., it will include all the words in the exact order you typed them).

         
    stocks:  

    If you begin a query with the [stocks:] operator, Google will treat the rest of the query terms as stock ticker symbols, and will link to a page showing stock information for those symbols. For instance, [stocks: intc yhoo] will show information about Intel and Yahoo. (Note you must type the ticker symbols, not the company name.)

    This functionality is also available if you search just on the stock symbols (e.g. [ intc yhoo ]) and then click on the "Show stock quotes" link on the results page.

    Query modifiers

    site:  

    If you include [site:] in your query, Google will restrict the results to those websites in the given domain. For instance, [help site:www.google.com] will find pages about help within www.google.com. [help site:com] will find pages about help within .com urls. Note there can be no space between the "site:" and the domain.

    This functionality is also available through Advanced Search page, under Advanced Web Search > Domains.

         
    allintitle:    

    If you start a query with [allintitle:], Google will restrict the results to those with all of the query words in the title. For instance, [allintitle: google search] will return only documents that have both "google" and "search" in the title.

    This functionality is also available through Advanced Search page, under Advanced Web Search > Occurrences.

         
    intitle:  

    If you include [intitle:] in your query, Google will restrict the results to documents containing that word in the title. For instance, [intitle:google search] will return documents that mention the word "google" in their title, and mention the word "search" anywhere in the document (title or no). Note there can be no space between the "intitle:" and the following word.

    Putting [intitle:] in front of every word in your query is equivalent to putting [allintitle:] at the front of your query: [intitle:google intitle:search] is the same as [allintitle: google search].

         
    allinurl:  

    If you start a query with [allinurl:], Google will restrict the results to those with all of the query words in the url. For instance, [allinurl: google search] will return only documents that have both "google" and "search" in the url.

    Note that [allinurl:] works on words, not url components. In particular, it ignores punctuation. Thus, [allinurl: foo/bar] will restrict the results to pages with the words "foo" and "bar" in the url, but won't require that they be separated by a slash within that url, that they be adjacent, or that they be in that particular word order. There is currently no way to enforce these constraints.

    This functionality is also available through Advanced Search page, under Advanced Web Search > Occurrences.

         
    inurl:  

    If you include [inurl:] in your query, Google will restrict the results to documents containing that word in the url. For instance, [inurl:google search] will return documents that mention the word "google" in their url, and mention the word "search" anywhere in the document (url or no). Note there can be no space between the "inurl:" and the following word.

    Putting "inurl:" in front of every word in your query is equivalent to putting "allinurl:" at the front of your query: [inurl:google inurlTongue Tiedearch] is the same as [allinurl: google search].

    Wednesday, June 6, 2007 6:47 AM
  • You can also perform calculations; all standard calculator functions plus scientific functions can be used from the same Google search bar.
    E.g.: search for 10+20/30

    You can also look up the current exchange rate between currencies.
    E.g.: just try 1 INR = ? USD
    Wednesday, June 6, 2007 2:21 PM
  • Thanks for the info, mate.

    Also, check out the new line of search engines created using Web 2.0 technology.

    Category wise Listings

    Mashups and Tagging:

    Many of the new search engines use the modular functionality of Web 2.0: mash together several services and add new features.

    1. Ajaxwhois.
      Doing a little domain name research? Ajaxwhois takes an existing protocol, WHOIS, and wraps it with a more responsive one. It's not a traditional search engine per se, but does make finding domain registration information faster. Start typing, and if you stop, it sends out a query. Add a few more characters to the domain name, and the query starts fresh. Results include links to hosting plans, the site (if it's registered), and Alexaholic, which is a mashup of Alexa, a Web traffic rankings service.
    2. FlickrStorm.
      FlickrStorm provides a nice mashup for flickr images. Enter a tag and it comes back with square thumbnails. Scroll through the array, click on images, and they'll be displayed larger. Add the ones you like to your own "tray", for later download. It's a simple but effective interface for consuming photos. An "advanced" feature filters images by license types, including Creative Commons.
    3. FundooWeb.
      FundooWeb is a multi-mashup, incorporating results from Yahoo!, Flickr, Yahoo! News, Yahoo! Answers, Amazon, and Yahoo! Maps images. If you search all sources, the results are presented in a couple of formats, including collapsible headlines and a Flickr photo strip, partitioned by source. There's obviously a heavy leaning to Yahoo, but it's not a bad way to conveniently compartmentalize several search result sets.
    4. Keotag.
      Keotag's initial face looks quite simple, with font sizes large enough for the dead to read. Type in a keyword or phrase and a line of favicons appear for Google, Technorati, and Bloglines, as well as over a dozen social bookmarking and community news sites. At far left is a Technorati chart showing the number of blog posts containing the key phrase over the past 30 days. Clicking on a particular favicon reveals result headlines for that source, which can be subscribed to through the resulting RSS feed.
    5. Whonu.   (MY FAVORITE !!!)
      Whonu is arguably one of the very first semantic Web search engines available. It offers over 300 search sources and a smart interface that contextualizes what you enter. For example, enter a US ZIP code and whonu presents a set of links to geocode tools including maps, weather maps, and even public events in Google Calendar. There are so many features that the demo screencast video is 26 minutes long. Information is double partitioned by file type and source. The variety of options might be a bit intimidating, but for power research, whonu looks like one of the most promising search tools available, with an effort made to present structured meaning. Killer feature — saved query history using a row of dots.
    6. Similicio.us.
      Similicio.us is mashup of del.icio.us which tries to find sites related to a user-entered URL. In other words, "people who liked this site also liked". This recommendation engine idea is so basic in functionality that it's a surprise someone didn't think of it before. The creator of the site admits that similicio.us currently uses shallow searching on del.icio.us to keep queries to their service at a minimum. This is no doubt an engine that could prove useful in other mashups, were it to be extended in scope. (One possibility is to team up with del.icio.us and have access to their full database and engine.)

    Rich Internet Application Search Interfaces

    The "rich" in RIAs is a matter of personal definition, but engines in this category offer a little something extra in terms of the interface, sometimes employing AJAX.

    1. Huckabuck.
      On the surface, Huckabuck seems like any other text search engine, but click on the "search tuner" button and a neat little "equalizer" panel reveals itself. You can use it to give more weight to the different sources — Google, Yahoo, MSN, Technorati, Digg, del.icio.us - as well as color-code results, set the slider for results per page, turn auto-completion on or off, and more. Click on the Presets arrows to reveal predefined equalizations for Research, Shopping, Blog search, Metasearch, Technology Research, and Social Search. Not a bad start to partitioning search results meaningfully.
    2. Kartoo.
      Kartoo is yet another search engine that partitions results into several categories, some serious, some frivolous. The presentation, however, is quite different, displayed in little clusters using Flash and icons of a sheet of paper for each result. Some results are more relevant than others, and clicking on an icon takes you to a deeper level of results. This paradigm might be a little confusing at first, but hovering your mouse over a result produces a result summary at left, including a screencap of the result page.
    3. KwMap.
      KwMap touts itself as "a keyword map for the whole Internet". Type in a keyword or phrase, and an unusual interface appears. At right is an alphabetical list of related keyphrases. At left is a visual component showing two axes that resemble an insect's antennae, dotted with nodes representing related terms. Clicking on a term's node takes you to another layer of loosely-related terms. This is a new search paradigm, but it offers the opportunity to explore related concepts in small leaps. Thus, a search for the word "tree" could lead you to "tea tree oil" or to a study of ancestor worship (via "family tree"). Hyperlinking mimics hyper-thought.
    4. Mnemomap.
      Mnemomap uses multiple components to display search results. Topmost is a hierarchical graph with nodes branching off the search term. Non-clickable secondary nodes are "Token", "Tags", "Translations" and "Synonyms". Tertiary nodes are search results and can have either a tight relationship to the original search term or a tenuous relationship. Clicking on a tertiary node either adds it to a bar below for a refined search, or produces a new graph, depending on where you click. Below is a section displaying relevant results from Mnemo, Yahoo, flickr, and YouTube. Mnemomap, currently in Alpha 0.2, is a fascinating paradigm for searching, but more suited to power researchers than to the average search engine user.
    5. PreFound.
      PreFound, which is powered by Eurekster Swicki, is a simple search engine on the surface, but contains a little slider "equalizer" panel similar to the one in Huckabuck (above). PreFound's panel has settings for music, movies, TV, xBox, etc., instead of search engines. You do have to register to see and use the equalizer (which they oddly call a social search equalizer) but you do not have to ask a question, view previous answers related to your search, or to promote up a search result.
    6. Quintura.
      Quintura, who recently received funding, presents text or image search results in a minimalist but graphic form resembling a freeform tag cloud. Holding your mouse cursor long enough over a term in the cloud causes new, related terms to appear in the vicinity of the cursor. While the no click interface is a bit disconcerting at first, you can start over by holding the cursor over the original search term, displayed in red text. Any term in focus (hovered over) generates search results in a scrollable panel below.
    7. Ujiko.
      Ujiko has an interface reminiscent of some sort of a video game, presenting results in both a central circle as well as in rows surrounding the circle. The setup allows you to drill down into the categories in the circle or click on actual results on either side, which can be marked as favorites. Ujiko makes a commendable attempt in presenting meaningful results in digestible bites, with a constantly updated interface.
    8. Tagnautica.
      Tagnautica starts off with a minimalist interface: a black background and a "CLICK HERE" message. Click and enter your search term, then wait for the strange revolving circle containing numerous spheres on the circumference, which undulate up and down in size. Talk about organic search results. Each result represents a related term, which can be drilled down into. Or you can click whatever term is in the center (initially the original search term) to get a page of flickr images. Tagnautica is a fascinating photo search paradigm that's lots of fun and definitely visually inspiring.
    9. Topix.
      Ever want to search for topical Web pages and wish you could easily narrow the search to a certain time period? Topix offers just that ability with a neat little interactive timeline map. Clicking on a particular day produces results ordered reverse chronologically from that day backwards. Definitely a handy tool for research, and would be killer mashed up with other functionality.

    Social Aspects: User Contribution, Recommendation, Social Networks

    Social networks are a hot Web application space, and now they're creeping into search engines.

    1. Clipfire.
      Michael Arrington of Techcrunch gave Clipfire his blessing, saying how much he likes this ecommerce deal-finding search engine. Sometimes all you need is a simple interface; it's the members that matter here. The idea is that members submit Web sites, Clipfire searches them, then presents later searchers with product and service deal info. Members are encouraged to use their own affiliate links so that they're motivated to find good deals and share them. This is a unique idea that's unlikely to remain so for much longer.
    2. Omgili.
      Omgili is a discussion-based engine. In addition to standard search results, a list of links to members is provided who have answered questions relating to a given search term. You can also ask a question, which another member might answer for you with relevant links. Recommendation engines such as omgili have their value in end applications, possibly those similar to the music recommendation site iLike (not to be confused with shopping engine, like).

    Visual Search

    Engines in this category allow you to search using images and similarity algorithms.

    1. Like.
      Like is a "visual shopping" engine that starts off with images of products. Click on an image to get an array of related product images. Use the interface to select a focus area of one image to find similar products by shape or color - say similar sunglasses. Like also lets you filter brands and price ranges. It's one of the more sophisticated ways to do affiliate marketing. Of course, while you don't have to enter any text at all to surf'n'shop, the option is there as well.
    2. Pixsy.
      Pixsy is a visual search engine for pictures or videos selected from several sources including Buzznet, flickr, iStockphoto, Fotolia, YouTube, and others. Clicking on an image takes you to the source page. For stock photo sites, this might provide copyright and license details. A handy tool for online publishers looking for suitable images to reprint.
    3. Retrievr.
      Retrievr is a visual search engine in the truest sense of the term, offering the choice of starting with an image (via URL or uploaded) or a sketch from the user, which can be customized by line thickness and color. Images are then retrieved from flickr. Brilliant concept. The honest truth is that very few of the images in the matrix of results have much resemblance to drawn sketches, but those that do are uncanny. An engine like this is only as good as its algorithms (though it uses brainiac wavelet transforms rather than the traditional neural network algorithms). Still, retrievr is an exciting early- generation advanced search engine offering.
    4. Riya.
      Riya visual search, who also offer Like, lets you search amongst people, objects, tags, and photos, as well as gives you a portal to Google, Yahoo, MSN, and flickr. You can browse broadly across the results or drill down through a specific photoset. Results can be emailed, embedded into Myspace or Blogger pages, or subscribed to via the dynamic RSS feed.
    5. Tiltomo.
      Tiltomo is yet another flickr mashup that offers a few search options. Enter a single flickr tag or ask for random images. Once you have an array of images, you can find similar images either by theme or by color/ texture. Tiltomo seems to produce slightly more relevant secondary results than some of the other visual search engines.
    6. Xcavator.
      Xcavator is another flickr-based engine in its early stages. Currently, it seems a bit limited, as there are only five tags from flickr that can be searched. Selecting one brings up an array of images. Dragging and dropping one of these to the xcavator search box and then selecting a point of interest produces a second, more refined image result set. While these sorts of engines have a ways to go before they're highly accurate, it's the promise of what's to come that's exciting.

    Audio/ Video Search

    Up until a few years ago, finding specific music or videos online was a difficult task. Then video search started appearing in traditional search engines. Now, it's creeping into engines with some advanced features.

    1. Liveplasma.
      Liveplasma is a music and video search and discovery engine tied to Amazon.com. Enter an artist, band, movie, director, or actor of interest, and up pops an unusual result set paradigm: floating spheres clustered in overlapping orbits. Each sphere represents information related to the search term. Clicking on a result produces an Amazon summary in the left panel, sometimes with CD/DVD cover art. Clicking on the summary takes you to its Amazon page. Liveplasma is the type of affiliate marketing search engine that can wag the long tail.
    2. Vdoogle.
      Vdoogle is a video search engine that draws its sources from 14 video sharing sites such as YouTube and DailyMotion, as well as veteran sites such as iFilm. Vdoogle is based on Google's new roll-your-own custom search engine, which is similar to Rollyo. Its Web 2.0 pedigree is tenuous, though it does mashup other Web 2.0 user-contributed services. The accuracy of Vdoogle relies on the proper tagging of source videos, so the engine could do with its own tagging and recommendation engine as an additional layer.




    Thursday, June 7, 2007 6:10 AM
  • Awesome, Arijit, this really rocks!

    Lots and lots of research and good stuff. Thanks for it, dude. You rock too.
    Thursday, June 7, 2007 6:38 AM
  • @Harshil 'ETERNALLY GRATEFUL TO YOU FOR YOUR SUPPORT, HELP & ENCOURAGEMENT !!!' Not only in this thread but also throughout the forum.


    Thursday, June 7, 2007 6:42 AM
  • Thanks, dude, for the appreciation. I just like helping friends and people. The main thing is to do your work (karm karo) and not expect anything from anyone, and you will get your reward.
    Thursday, June 7, 2007 7:13 AM
  • Written by Emre Sokullu and edited by Richard MacManus / December 13, 2006

    You may feel relatively satisfied with the current search offerings of Google, Yahoo, Ask and MSN. Search today is undoubtedly much better than what it was in the second half of the 1990's. But Internet search is still in its infancy and there's much room for improvement. Moreover, the super high valuation of Google on NASDAQ pushes investors and researchers to find better search solutions - to be The Next Big Thing. And these wannabes are not only working on discovering better indexing techniques, they're exploring new horizons like vertical engines, meaning-based search, intent-driven search, new clustering methods, and much more. In this post, we look into the latest trends in the search industry.

    We have positioned the latest search trends into 3 main categories:

    • UI Enhancements
    • Technology Enhancements
    • Approach Enhancements (Vertical Engines)

    UI Enhancements

    Snap

    Snap promises a better interface for search, using the latest advancements in browsers and AJAX technology. Although there were earlier, similar implementations, preview powered search is perhaps the biggest innovation of Snap. With Snap's preview powered search, you don't necessarily need to visit the site to see if it satisfies your needs - you can see a dynamically loaded screenshot in the right side of your window. 

    According to a Microsoft study, users spend 11 minutes on a typical search - so potentially Snap can radically shorten this time. Another benefit is that it allows you to browse the search results with a few keystrokes, which is another big usability enhancement. However, it's worth noting that Snap is slow to process searches as a result, because there's too much Javascript and it's too heavy for most modern browsers and hardware. Also, from a technology point of view, Snap doesn't have much to offer - it uses Ask's existing technology. However, they have introduced a power-of-masses approach with options for "This page is Junk" and "This page is Perfect".

    Snap's real time query recommendation is also a little similar to an idea once tested at Google Labs. All in all, Snap doesn't bring anything new to the table, but it's a good mashup of some of the innovative ideas in search that we've seen in the last few months.

    SearchMash

    SearchMash is actually a Google site, to test their latest search innovations. SearchMash follows the basic Google principle - it's cutting-edge, but still plain and simple. When you do a typical web search, you also see image, blog, video and Wikipedia results in the right side of the screen. And there's absolutely no noticeable speed loss, thanks to AJAX. Basically it is a shortcut to reach all the information you need. 

    The best innovation of SearchMash is perhaps the "More web results" bar. I strongly recommend Google find a way to implement it into their default engine immediately. It makes it much easier to browse the search results. When you need more information, simply click on "More web results" and new results appear at the bottom - enabling you to continue scrolling down on the same page, instead of opening a new page. SearchMash also allows you to give feedback about the results; this may be a sign of the introduction of power of masses into Google Search. 

    All in all, SearchMash shows that while Google continues to keep itself simple, it also has absolutely no intention of giving way on the innovation front to upcomers. All of the new features in SearchMash are discussed on their About page.

    Live.com

    Live.com, the new internet initiative of Microsoft, had many innovative ideas at the beginning. However as Vista's official release date gets closer, it has become a much more traditional search engine. Besides the technology advances in their algorithms, which Microsoft hopes will enable it to compete with Google, there are/were many UI enhancements as well. There used to be, for example, an infinite scrollbar in Live.com - but this seems to have been removed for the final public release. 

    Most innovations in the image search interface have been kept though - the tiered zooming feature is the most blatant one. Live's Image Search offers seamless user experience enhancements. The infinite scrollbar functionality fits very well and saves you from the hassle of clicking and waiting. And Scratchpad functionality allows you to pick your favourites and compare them smoothly. 

    Overall we can conclude that Live's interface, when compared to the old MSN and Microsoft sites, has become simpler and more Google-like.

    Technology Advancements

    Search for Meaning by Hakia

    Hakia's motto is "Search for Meaning". Founded by seasoned nuclear scientist Riza Berkan, Hakia has raised more than $30M so far, mostly from European private investors. With Hakia you don't search keywords; instead, you ask questions of the search engine directly. Hakia performs deep semantic analysis on the pages it crawls. It introduces a new mosaic-like indexing method called QDEX (Query Detection and Extraction). Despite all these nice promises, Hakia does not currently always return the correct results. However, it is still in public alpha release and the company is set to debut its full operations in January 2007. After this date, we will have a better chance to judge Hakia's capabilities. Note that Hakia works on top of Microsoft technologies.

    Also see Read/WriteWeb's recent post reviewing Hakia.

    Clustered Search of Vivisimo and Ask

    Neither Vivisimo nor Ask are new companies. Both offer clustered search, which means fragmenting the results of your query so that users can see related terms and go deeper or broader in their data mining. Vivisimo was the first to offer it and it's very useful in cases where you are researching a topic that you're completely new to. Ask's approach is less dense than Vivisimo's and is somehow similar to Live's related results feature. But as stated above, clustered search is probably not something you'll need all the time - it's more a side feature that may be helpful in some cases.

    Read/WriteWeb profiled Ask last month.

    Intent-Driven Search by Yahoo!

    This is a brilliant idea. Yahoo's research project Mindset brings you results according to your search purposes. For instance, when you enter "Rolex Watches" in the search box, you may be willing to buy a Rolex Watch or make an encyclopedic research about the company. Yahoo's intent-driven search allows you to specify your intent and get the most relevant results. 

    Note that intent-driven search is still in a very early phase, but it's very promising for mainstream users.

    Google's Ori Alon

    In April this year, Google bought a patented technology that allows them to show related terms after your query. For example, if you search for information on the War of Independence, this technology gives you a list of related words - like Etzel, Palmach, Ben-Gurion. The patent was held by an Israeli PhD student studying in Australia. Google has not released this feature yet on Google or SearchMash, but it is expected to be shown soon. Also, it is rumored that Microsoft and Yahoo were also after this patent, but Google won the race.

    Del.icio.us and Power of Masses

    You may ask, what is del.icio.us doing in between all these search sites - isn't it just a bookmarking system? Well, the answer is both yes and no. While it's true that it's a bookmarking site, Yahoo probably didn't buy them just for bookmarking. Actually del.icio.us is also a great tool that empowers the search results of any search engine. Because when you bookmark a site, this indicates the site is a useful resource - so its "pagerank" should be increased. In other words, del.icio.us can actually be used as a search engine, fueled by the power of masses principle. And del.icio.us is not alone in this - Wink and Snap are also trying to use the power of masses in their search offerings. 

    Supposedly, Google also uses some sort of power of masses with their Personalized Search and Google Toolbar offerings.

    NLP (Natural Language Processing) powered Powerset

    While still in stealth mode, Powerset has already raised $12.5M in pre-money valuation from several venture capital companies and angel investors like Reid Hoffman, Luke Nosek and early Googlers Aydin Senkut and Zain Khan. The difference between Powerset and the traditional search engines is that while typical search engines like Google and Yahoo don't take into account stopwords (by, after, the, etc), stopwords are a very important part of the engine for Powerset. Why? Because Powerset relies on a semantic capability that can be triggered by using these stopwords. So while the "book by children" and "book for children" queries return exactly the same results in Google, Powerset evaluates them separately and somehow cares about your stopwords as well.

    Personalized Search

    Palo Alto based Collarity is a very new company entering into the personalized search area. The question that pushed them into this challenge is: "Why are your search results exactly the same as the next person's search results?" This is not a very new idea - Google (with its Kaltix acquisition in 2003) and others already offer this feature, albeit weakly. However Collarity seems very strong with their innovative interface (Collarity Slider), outsourced approach (Collarity Compass) and promising technology.

    Social Search

    Read/WriteWeb has covered the area of social search very thoroughly already in two articles in July by Ebrahim Ezzy. Two good examples are Eurekster's Swicki and Rollyo. Swicki is a community-driven search engine that allows users to create deep, focused searches on a specific niche - and 'learns' from its community. Rollyo allows users to create and publish their own personal search engines, based on websites they decide to include in their "SearchRoll".

    Image Search

    Image Search has been around for a very long time, but to be frank it's still very primitive. What most image search engines do is just look for text around images and examine the image tags. 

    Riya was the first to introduce advanced face recognition technologies in image search. This obviously requires a lot of computing power and just because of this, Riya's weekly burnrate is supposedly over $100K. Co-founded by web 1.0 veteran Munjal Shah and face recognition gurus Burak Gokturk and Azhar Khan, Riya is now entering a whole new space - "search by likeness" with like.com. This may come in very handy, for example when you try to find a watch that is similar to the one you have a digital photo of. That's why Riya is expected to make partnership deals with, or get acquired by, e-commerce companies like Amazon and eBay. It's worth noting that Riya was once in acquisition negotiations with Google, but this never happened - and Google ended up acquiring another face recognition company, Neven Vision. So we can conclude that Google is pursuing this technology very closely!

    Approach Enhancements (Vertical Search)

    Vertical search is a relatively new discipline in search. Basically, vertical engines look up a very limited subset of the internet - so they are more efficient than generic search engines. Because their search area is not so broad, they can adapt themselves for the specific needs and common points of their area of focus. We won't go in too much detail about vertical search engines, as it has already been covered in a recent article in Read/WriteWeb. But we can categorize the major vertical engines this way:

    • Jobs: SimplyHired.com Indeed.com, Bixee.com (India), Eluta.ca (Canada), Recruit.net (Hong Kong)
    • Travel: Sidestep.com, Kayak.com, Mobissimo.com
    • Health: Amniota.com, CloserLookSearch.com, GenieKnows.com, Healia.com, Healthline.com, Kosmix.com, MammaHealth.com, Google Health
    • Classifieds: Edgeio.com, Oodle.com
    • Blogs: Technorati, Bloglines, Blogger Search, Sphere, Feedster
    • Source Code: Koders.com, Krugle, Google Code

    Conclusion

    The innovation in search does not stop and there's much to look forward to in the search space. What's more, Google and Yahoo search APIs and the open source Nutch and DMOZ projects allow anyone to try out new ideas. Nutch, supported by Yahoo and shielded under Apache Software Foundation, is providing a free global search engine. DMOZ gives you a very large open source web directory edited by volunteers. 

    Google will have a hard time competing not only with its big adversaries like Microsoft, Yahoo and Ask - but also with the ambitious startups that are opening new dimensions and bringing forth new approaches. We will probably hear of acquisitions in this space as well.

    We may not have covered all the promising new search offerings here, so please let us know your feedback in the comments below. Also let us know which of the above approaches sounds the most promising to you - and why.

    Saturday, June 9, 2007 8:38 AM
  • To know more about the upcoming Internet revolution through Web 3.0, here are a few articles:


    Also, here are 2 links that I believe every computer student should bookmark:

    Tim Berners-Lee's Homepage

    Tim Berners-Lee's Blog
    Saturday, June 9, 2007 8:58 AM
  • Useful links!!
    Saturday, June 9, 2007 11:23 AM
  • More links:

    http://alistapart.com/articles/web3point0

    http://www.androidtech.com/knowledge-blog/2006/11/web-30-you-aint-seen-nothing-yet.html

    http://www.roughtype.com/archives/2006/11/welcome_web_30.php


    Hope you like them.

    "Please say if there is some problem with my postings, or if you would like to read about something else or discuss another topic related to search engines."
    Monday, June 11, 2007 8:48 AM
  • Open Source Search Engines in Java

    Egothor

    Egothor is an Open Source, high-performance, full-featured text search engine written entirely in Java. It is technology suitable for nearly any application that requires full-text search, especially cross-platform. It can be configured as a standalone engine, metasearcher, peer-to-peer HUB, and, moreover, it can be used as a library for an application that needs full-text search.

    Go To Egothor

    Nutch

    Nutch is a nascent effort to implement an open-source web search engine. Nutch provides a transparent alternative to commercial web search engines.

    Go To Nutch

    Lucene

    Jakarta Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

    Go To Lucene
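
    To complement the indexing sketch earlier in this thread, here is the matching search side, again assuming a reasonably recent Lucene release (package names and a few calls differ between versions; for example, the query parser lives in the separate lucene-queryparser module) and the same "search-index" directory and field names as before.

        import java.nio.file.Paths;
        import org.apache.lucene.analysis.standard.StandardAnalyzer;
        import org.apache.lucene.document.Document;
        import org.apache.lucene.index.DirectoryReader;
        import org.apache.lucene.queryparser.classic.QueryParser;
        import org.apache.lucene.search.IndexSearcher;
        import org.apache.lucene.search.Query;
        import org.apache.lucene.search.ScoreDoc;
        import org.apache.lucene.store.FSDirectory;

        public class Searcher {
            public static void main(String[] args) throws Exception {
                try (DirectoryReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("search-index")))) {
                    IndexSearcher searcher = new IndexSearcher(reader);
                    Query query = new QueryParser("content", new StandardAnalyzer()).parse("stemming AND index");
                    for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                        Document doc = searcher.doc(hit.doc); // fetch the stored fields of the hit
                        System.out.printf("%.3f  %s%n", hit.score, doc.get("title"));
                    }
                }
            }
        }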

    Oxyus

    Oxyus Search Engine is a Java-based application for indexing web documents and searching them from an intranet or the Internet, similar to other proprietary search engines in the industry. Oxyus has a web module that presents search results to clients through web browsers using Java Server Pages that access a JDBC repository through JavaBeans.

    Go To Oxyus

    BDDBot

    BDDBot is a web robot, search engine, and web server written entirely in Java. It was written as an example for a chapter on how to write your own search engine, and as such it is very simplistic.

    Go To BDDBot

    Zilverline

    Zilverline is what you could call a 'reverse search engine'. It indexes documents from your local disks (and UNC-path-style network disks) and allows you to search through them locally or, if you're away from your machine, through a webserver on your machine. Zilverline supports collections. A collection is a set of files and directories in a directory. PDF, Word, txt, Java, CHM and HTML are supported, as well as zip and rar files. A collection can be indexed and searched. The results of the search can be retrieved from local disk or remotely, if you run a webserver on your machine. Files inside zip, rar and chm files are extracted, indexed and can be cached. The cache can be mapped to sit behind your webserver as well.

    Go To Zilverline

    YaCy

    YaCy is a distributed web crawler and also a caching HTTP proxy. Its online interface lets you configure your personal settings, proxy settings, access control and crawling properties. You can also use the interface to start crawls, send messages to other peers and monitor your index, cache status and crawling processes. Most importantly, you can use the search page to search either your own index or the global index.

    Go To YaCy

    Compass

    The Compass Framework is a first-class open source Java framework that brings the power of search-engine semantics to your application stack declaratively. Built on top of the amazing Lucene search engine, Compass integrates seamlessly with popular development frameworks like Hibernate and Spring. It provides search capability over your application data model and synchronises changes with the datasource. With Compass: write less code, find data quicker.

    Go To Compass

    Lius

    LIUS (Lucene Index Update and Search) is an indexing Java framework based on the Jakarta Lucene project. The LIUS framework adds many file-format indexing capabilities to Lucene, such as: MS Word, MS Excel, MS PowerPoint, RTF, PDF, XML, HTML, TXT, the Open Office suite and JavaBeans. LIUS is very easy to use: all of the indexing configuration (types of files to be indexed, fields, etc.) as well as searching is defined in an XML file, so the user only has to write a few lines of code to carry out indexing or searching. LIUS has been developed from a range of Java technologies and fully open source applications.

    Go To Lius

    Solr

    Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP APIs, caching, replication, and a web administration interface.

    Go To Solr
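
    Because Solr is queried over plain HTTP, you don't even need a client library to try it. The sketch below assumes a local Solr instance with a core named "articles"; the URL layout and default response format vary between Solr versions, so treat it as illustrative.

        import java.io.BufferedReader;
        import java.io.InputStreamReader;
        import java.net.HttpURLConnection;
        import java.net.URL;
        import java.net.URLEncoder;

        public class SolrQueryDemo {
            public static void main(String[] args) throws Exception {
                String q = URLEncoder.encode("content:lucene", "UTF-8");
                URL url = new URL("http://localhost:8983/solr/articles/select?q=" + q + "&rows=5&wt=json");
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                    String line;
                    while ((line = in.readLine()) != null) System.out.println(line); // raw JSON response
                }
            }
        }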

    regain

    regain is a fast search engine built on top of Jakarta Lucene. It crawls through files or webpages using a plugin architecture of preparators for several file formats and data sources. Search requests are handled via a browser-based user interface using Java Server Pages. regain is released under the LGPL and comes in two versions: 1. a standalone desktop search program including a crawler and HTTP server; 2. a server-based installation providing full-text search functionality for a website or intranet fileserver using XML configuration files.

    Go To regain

    MG4J

    MG4J (Managing Gigabytes for Java) is a collaborative effort aimed at providing a free Java implementation of inverted-index compression techniques; as a by-product, it offers several general-purpose optimised classes, including fast and compact mutable strings, bit-level I/O, fast unsynchronised buffered streams, (possibly signed) minimal perfect hashing, etc. MG4J functions as a full-fledged text-indexing system. It can analyze, index, and query consistently large document collections.

    Go To MG4J

    Thursday, June 28, 2007 10:54 AM
  • Open Source Crawlers in Java

    Heritrix

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

    Go To Heritrix

    WebSPHINX

    WebSPHINX ( Website-Specific Processors for HTML INformation eXtraction) is a Java class library and interactive development environment for Web crawlers that browse and process Web pages automatically.

    Go To WebSPHINX

    JSpider

    A highly configurable and customizable web spider engine, developed under the LGPL open source license, in 100% pure Java.

    Go To JSpider

    WebEater

    A 100% pure Java program for web site retrieval and offline viewing.

    Go To WebEater

    Java Web Crawler

    Java Web Crawler is a simple Web crawling utility written in Java. It supports the robots exclusion standard.

    Go To Java Web Crawler

    WebLech

    WebLech is a fully featured web site download/mirror tool in Java, which supports many features required to download websites and emulate standard web-browser behaviour as much as possible. WebLech is multithreaded and will feature a GUI console.

    Go To WebLech

    Arachnid

    Arachnid is a Java-based web spider framework. It includes a simple HTML parser object that parses an input stream containing HTML content. Simple Web spiders can be created by sub-classing Arachnid and adding a few lines of code called after each page of a Web site is parsed.

    Go To Arachnid

    JoBo

    JoBo is a simple program to download complete websites to your local computer. Internally it is basically a web spider. Its main advantage over other download tools is that it can automatically fill out forms (e.g. for automated login) and also use cookies for session handling. Compared to other products the GUI seems very simple, but it's the internal features that matter! Do you know of any other download tool that can log in to a web server and download content when that server uses web forms for login and cookies for session handling? It also features very flexible rules to limit downloads by URL, size and/or MIME type.

    Go To JoBo

    Web-Harvest

    Web-Harvest is an open source web data extraction tool written in Java. It offers a way to collect desired web pages and extract useful data from them. In order to do that, it leverages well-established techniques and technologies for text/XML manipulation such as XSLT, XQuery and regular expressions. Web-Harvest mainly focuses on HTML/XML-based websites, which still make up the vast majority of Web content. On the other hand, it can easily be supplemented by custom Java libraries to augment its extraction capabilities.

    Go To Web-Harvest
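
    Most of the crawlers above boil down to the same loop: take a URL off a queue, fetch the page, pull out its links, and enqueue the ones you have not seen yet. Here is a deliberately naive breadth-first version using only the JDK, with a crude regex for links; a real crawler would use a proper HTML parser, obey robots.txt, throttle requests and persist its frontier.

        import java.io.BufferedReader;
        import java.io.InputStreamReader;
        import java.net.URL;
        import java.util.*;
        import java.util.regex.*;

        public class ToyCrawler {
            private static final Pattern LINK =
                    Pattern.compile("href=[\"'](http[^\"'#]+)[\"']", Pattern.CASE_INSENSITIVE);

            public static void crawl(String seed, int maxPages) {
                Deque<String> queue = new ArrayDeque<>();
                Set<String> seen = new HashSet<>();
                queue.add(seed);
                seen.add(seed);
                int fetched = 0;
                while (!queue.isEmpty() && fetched < maxPages) {
                    String url = queue.poll();
                    fetched++;
                    System.out.println("fetching " + url);
                    StringBuilder page = new StringBuilder();
                    try (BufferedReader in = new BufferedReader(
                            new InputStreamReader(new URL(url).openStream()))) {
                        String line;
                        while ((line = in.readLine()) != null) page.append(line);
                    } catch (Exception e) {
                        continue; // skip pages that fail to download
                    }
                    Matcher m = LINK.matcher(page);
                    while (m.find()) {
                        String link = m.group(1);
                        if (seen.add(link)) queue.add(link); // enqueue only unseen URLs
                    }
                }
            }

            public static void main(String[] args) {
                crawl("http://example.com/", 5);
            }
        }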

    Thursday, June 28, 2007 10:55 AM
  •  

    Check Out www.tafiti.com
    Tuesday, September 11, 2007 5:34 AM


  • http://code.google.com/

    Introducing OpenSocial

    OpenSocial provides a common set of APIs for social applications across multiple websites. Using standard JavaScript and HTML, these APIs enable developers to create apps that access a social network's friends and update feeds.

    Common APIs mean you have less to learn to build for multiple websites. OpenSocial is being developed by Google in conjunction with members of the web community. The ultimate goal is for any social website to be able to implement the APIs and host 3rd party social applications. Watch the full announcement of OpenSocial at Campfire One.



    Developer Resources


    • APIs & Developer Tools
      Everything you need to start your project, including developer guides, forums, and tutorials.
    • Open Source Programs
      Find out about Google's Open Source programs, Summer of Code, and projects we've released.
    • Project Hosting
      Starting your own Open Source project? Let Google host the code and documentation for you, free.
    Wednesday, November 7, 2007 9:02 AM
  • Google AJAX Search API


    The Google AJAX Search API lets you use JavaScript to embed a simple, dynamic Google search box and display search results in your own web pages, or use search results programmatically in innovative ways. If you don't feel like coding, you can even use our code wizards to add custom AJAX search controls to your web page in just a few steps.

    For more information:
    Wednesday, November 7, 2007 9:05 AM
  • Hi!

    This is a mail that I just addressed to PubMed concerning their search engine.

    You may find it useful.


    Your searches use:

    -mainly nouns and adjectives, as keywords;

    -conjunctions,  as formal logical operators.

     

    The parts of speech in English display a distribution which becomes evident when you put one of the freely available web dictionaries in an Excel spreadsheet, like in the attached file.

    The most numerous words are “carriers of sense” (sémantèmes): they exist because of either the concepts of substance they correspond to (nouns, adjectives) or the concepts of action they correspond to (verbs).

    A certain number of words exist because the sémantèmes need to be put in relation. As a consequence, they express certain types of relation, valid no matter what the sémantèmes are.

    When sémantèmes undergo changes to allow relations with other words, they are submitted to “flexion”.   

    Dictionaries take into account only the “basal” form of a word: nominative for nouns, infinitive for verbs.

     “Flexion” is a changed form under which words appear; it takes in charge to make specific the relation of a given word to other words in a phrase, according to their sense.

    “Highly flexionary languages”, like Sanskrit, German, Finnish, Estonian, Hungarian and Italian (to cite only the classic ones and those modern ones with importance beyond their national borders), are in fact languages with a dominant enclitic flexion. Including all the words resulting from flexion in a dictionary would generate a huge list.

    This is no longer the case with many modern languages, including English, which have “delegated” most of the sense of enclitic flexion to auxiliaries, mainly prepositions and certain adjectives and conjunctions. Strangely, a term like “extraclitic” (or “exoclitic”?) flexion hasn’t been adopted!

    Flexion –enclitic, “exoclitic”- carries the logic of sense –the “semantic logic”. It is polyvalent.

    But a part of the language seems to apply an “existential judgment” to groups of words, taking or not into account their semantic value. This existential judgment uses what could be named a “metalanguage” composed of “to exist”, “yes”, ”no”, ”and”, ”or”, ”either/or”. These are known as “logical operators”.

    Search engines use this existential metalanguage. They retrieve items “existentially”: criteria exist together or exclude each other. They do not look for items where search criteria are related on semantic bases. Hence, they don’t use flexion.

    In English such relations need not enclitic flexion (which would lead to a huge increase in the words in a dictionary) because the flexion is “exoclitic”. So, a search engine could have an enormous “semantic gain” if it accepted at least the most common words of the exoclitic flexion, namely prepositions : 1) in, inside, within –for the latin locative; 2) from –for the latin ablative- and to –for the latin dative; 3) of the, “ ‘ “, “’s” –for the genitive.

    A fairly exhaustive list of the prepositional correspondences of the cases could probably be found in a Finnish dictionary. But I doubt they are of much use in medicine and biology. This seems to be the case for adverbs too. They will probably be useful for general-purpose search engines (assuming they stop giving tens of millions of answers in less than a second and stop giving priority to advertising sites).

    And, of course: if search engines (medical or not) worked in German or Finnish!   

     

    Searches in databases like yours would benefit from introducing them, at least partially. I suggest the following for a start:

    -in, within, inside;

    -from (for equivalence with the ablative in latin);

    -to (for equivalence with dative in latin);

    -',...'s for the genitive.

    As a quick example of their use, one merely has to compare the results of searches like: 1) rna cytoplasm / rna IN cytoplasm. With the latter, you get an answer which corresponds more closely to your interest.
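
    To make the suggestion concrete, here is a tiny, purely hypothetical sketch of what promoting a preposition from stop word to relation operator might look like on the query side; a real implementation would of course also need the index to store such relations.

        public class PrepositionQuery {
            // Hypothetical: turn "rna IN cytoplasm" into a (term, relation, term) triple
            // instead of discarding "in" as a stop word.
            public static String[] parse(String query) {
                String[] tokens = query.trim().split("\\s+");
                for (int i = 1; i < tokens.length - 1; i++) {
                    String t = tokens[i].toLowerCase();
                    if (t.equals("in") || t.equals("within") || t.equals("from") || t.equals("to")) {
                        return new String[] { tokens[i - 1], t, tokens[i + 1] };
                    }
                }
                return new String[] { query }; // no relational operator found
            }

            public static void main(String[] args) {
                // Prints [rna, in, cytoplasm]
                System.out.println(java.util.Arrays.toString(parse("rna IN cytoplasm")));
            }
        }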

    Saturday, February 2, 2008 12:08 PM

  • Valuable thread !!!!

    Saturday, February 2, 2008 3:25 PM