Search Engine Strategies: Digging Deeply - Greg Notess
searchengineshowdown.org; also author of the book Teaching Web Search Skills
snipr.com/il2007strategies - slides from this presentation (slideshare)
Overview:
Phrase search for a name (unique results): Ask.com - 5 results, Live - 4 results, Yahoo - 9 (or 20 unclustered) results, Google - 3 results.
Long tail of searches: when you are looking for something very specific that may not even exist on the web, the hard-to-find things on the fringes, the search engines' results do not overlap much. Few librarians use Microsoft Live Search, but Live Spaces had two results that did not appear in any other engine's results.
Search switching:
If you are only searching one place (Google), how good a search are you doing? Are you doing the same thing they just did?
One way to use search switching is the search box in IE7 or Firefox. Click the drop-down list of other search engines to run the same search again.
Ask.com is better at general, broad searches.
Another way is to use bookmarklets (available on the searchenginewatch site - right side of page).
Other alternatives: Zuula (beta), trovando.it, TurboScout
flashearth.com: choose the map you want to search (Google, Live, Yahoo, etc.), and switch between the sources. It keeps the same zoom level when switching.
booksearchx3 (kokogiak.com/booksearch): searches 3 separate book databases and shows the results in 3 columns: Amazon, Google, and Live Search. These databases do not overlap as much as the web searches.
Not exactly metasearch engines. Instead they take you directly to the search engine you want.
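A minimal sketch of the search-switching idea in Python: it builds a results URL for the same query on several engines, so each link goes straight to that engine rather than to a merged result list. The base URLs and the query parameter names (q, p) are assumptions for illustration, not taken from the talk, and may not match what each engine accepts today.

```python
from urllib.parse import urlencode

# Hypothetical engine table: base URL plus the name of the query parameter.
# These values are assumptions for illustration only.
ENGINES = {
    "google": ("https://www.google.com/search", "q"),
    "yahoo": ("https://search.yahoo.com/search", "p"),
    "ask": ("https://www.ask.com/web", "q"),
}


def switch_urls(query):
    """Return {engine name: search URL} for the same query on every engine."""
    return {
        name: base + "?" + urlencode({param: query})
        for name, (base, param) in ENGINES.items()
    }


if __name__ == "__main__":
    # Quotes in the query make it a phrase search, as in the name example.
    for name, url in switch_urls('"greg notess"').items():
        print(name, url)
```

Like the bookmarklets, this only rewrites the address; it does not fetch or combine results.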
All major search engines have a similar look and feel, with tabs and drop-down options, but most people just do a web search. That is why search engines are moving toward vertical search and offering options within the results.
Next generation challenges
Citations: why and what
There is confusion about what the original source is (e.g., when the same information appears in Wikipedia and the CIA World Factbook, which one was the original?). How complete are source citations (e.g., in Wikipedia) when the source is in a closed database, such as an online journal?
Wikipedia issues: which version of a Wikipedia article is going to be cited and used? Not just what day, but what day and what time (hint: check the revision history in Wikipedia; all prior versions are available, but not searchable). If you are going to cite Wikipedia, it is probably better to cite a specific version from the history, because that will not change. The current one may (and probably will). Look for the “history” tab at the top of the page.
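As a rough illustration of citing a fixed version, the sketch below builds a permanent link to one revision of an article. It relies on the index.php?title=...&oldid=... address form that Wikipedia uses for old revisions; the function name and the revision number 12345 are placeholders, and a real revision ID has to be copied from the article's history page.

```python
from urllib.parse import urlencode


def wikipedia_permalink(title, revision_id, lang="en"):
    """Build a link to one specific revision of a Wikipedia article.

    revision_id is the number shown as "oldid" in links on the history page.
    """
    query = urlencode({"title": title.replace(" ", "_"), "oldid": revision_id})
    return "https://" + lang + ".wikipedia.org/w/index.php?" + query


if __name__ == "__main__":
    # 12345 is a placeholder, not a real revision of this article.
    print(wikipedia_permalink("Search engine", 12345))
```

A citation built this way keeps pointing at the same text even after the live article changes.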
Facebook searching: privacy vs. search
Facebook is now a data source for looking for people. In general, you get a full profile if you are within the same network; outside the network less is available, maybe just a picture, name, and the networks they belong to. If you are trying to find information on a person, you can join a network they also belong to, and you will get more information without being their “friend”. The problem is that you can only join a limited number of regional networks, and can only switch regional networks twice every 60 days. To join a workplace or college network, you need an email address from that particular workplace or college. So there is a limit to how far you can dig into the network to find people.
Facebook has said it is going to make its information available to outside search engines, but the information available that way is still limited.
You can also look there to find facebook statistics (regions, interests, etc)
Other book searching (using book search to dig inside a book):
Microsoft’s Live Search also has some current book content.
Book search is a new information source for searchers: it is for searching, not reading, and access is limited.
Searching can be a real challenge because of poor optical character recognition (OCR).
Amazon search (search inside the book) actually does a better job. Even a one-page result could have the reference answer you want, but it is hit or miss.
Google Book Search has limited access, and may require logging in. Even books that should be out of copyright may only provide a snippet, or no preview at all. It may even give a reference that is not applicable at all.
Amazon will have some old books available for searching inside the book, depending on the publisher.
Publishers’ sites: NAP.org had 3700 books available online for free, with the ability to put a book on your own site (it gives you code to embed). These books are not available in the book search sites.
Cache mining: (most users do not use this)
The searchengineshowdown site has a list of archive sites where you can get old copies of a web page.
The Wayback Machine covers 1996 up to about 8-12 months ago (the lag is due to copyright issues); they have the more recent material indexed, but it is not up yet. In the interim, if you want to compare versions of a site, you can use the cache features on the search sites.
Most caching does not include images - it will be text version of the page only.
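A small sketch of how an archived copy can be requested by date: it builds a Wayback Machine address of the form web.archive.org/web/<timestamp>/<page address>. That pattern is my understanding of how the archive addresses captures, not something from the talk; the archive normally shows the capture closest to the requested date, if one exists. The function name and example address are made up for illustration.

```python
from datetime import date


def wayback_url(page_url, when):
    """Build a Wayback Machine address for page_url near the given date."""
    stamp = when.strftime("%Y%m%d")  # YYYYMMDD; a time of day is optional
    return "https://web.archive.org/web/" + stamp + "/" + page_url


if __name__ == "__main__":
    # Two dates for the same page, to compare versions side by side.
    for day in (date(2005, 10, 1), date(2006, 10, 1)):
        print(wayback_url("http://example.com/", day))
```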
Google number change:
The total number of results includes clustered results, which are not shown by default in the initial results. If you go to the end of the results and click the link to show all search results, the numbers will actually change, so don’t trust the numbers. Another way to uncluster is to go into the search string in the address bar and add “&filter=0”, but it only works for smaller numbers.
Supplemental Results (from the supplemental database) - only worked for 200-800 results. The label is now gone, but it is unknown whether the supplemental database is still there.
Google has added more choices to date searches.
“&num=100” (added to the search string in the address bar) will change the result limit to 100 instead of 10.
“&imgtype=face” (or news) will limit image results to faces (or news images).
If you are a frequent phrase searcher on Google, be careful about how long the phrase is. Try breaking a long phrase into two shorter phrases.
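One way to picture the phrase-splitting tip is the sketch below, which cuts a long phrase in the middle and quotes each half separately. The four-word threshold and the choice to split at the midpoint are assumptions for illustration; in practice you would split where the wording naturally breaks.

```python
def split_phrase(phrase, min_words=4):
    """Turn one long phrase into two shorter quoted phrases.

    Phrases shorter than min_words (an arbitrary threshold) are kept whole.
    """
    words = phrase.split()
    if len(words) < min_words:
        return '"' + " ".join(words) + '"'
    mid = len(words) // 2
    first = " ".join(words[:mid])
    second = " ".join(words[mid:])
    return '"' + first + '" "' + second + '"'


if __name__ == "__main__":
    print(split_phrase("to be or not to be that is the question"))
```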
Image searching issues - a lot of work has been done, and it still stinks. Unless someone puts words about the picture somewhere near the picture, people will not find the image. It all depends on what the image is named or the words near the image.
Link searching: Live used to be the best for this, but its link commands were pulled in March due to abuse. If you put “+” in front of the link search, it will still work.
“&pws=0” will turn off personalized searching in Google (if you are logged in, you will automatically get personalized results).
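The address-bar tweaks in this section (filter=0, num=100, pws=0) can also be applied with a short script rather than by hand. This is a minimal sketch using Python's standard urllib.parse; the starting address is an assumed example, the helper name is made up, and whether Google still honors each parameter is not something the code can tell you.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_params(search_url, **params):
    """Return search_url with the given query parameters added or replaced."""
    parts = urlsplit(search_url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update({key: str(value) for key, value in params.items()})
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Assumed example address; only the parameter names come from the notes.
    start = "https://www.google.com/search?q=notess"
    print(with_params(start, filter=0, num=100, pws=0))
```

Replacing an existing parameter instead of appending a second copy avoids ending up with two conflicting num values in the same address.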
Related searches:
Google has Google Suggest; Yahoo drops suggestions down below the search box; Ask.com puts them on the left.
technorati tags:IL2007