A. Web Developers FAQ
- What is a Google Search Appliance?
- How is the appliance different from the public Google search?
- Is every web page published at UQ indexed in the search appliance?
- How do I add a page to Google’s index? How can I remove pages?
- How many documents are in the UQ GSA license?
- Will the appliance crawler follow the URLs found within non-html documents?
- What file formats can the appliance index?
- How are ranking and relevancy determined?
- Can the GSA index documents in Microsoft Sharepoint?
- Where can I get more information on the Google Search Appliance?
1. How is the appliance different from the public Google search?
The Google Search Appliance (GSA) offers features that the public Google.com search does not: continuous crawling, management of Collections and Frontends, and customisation of the Format, KeyMatches, Synonyms, and Filters applied to the search results. The GSA can also be given credentials so that it can crawl and index private (intranet) web pages.
2. Can I find my website in the index?
Most sites in the .uq.edu.au domain should be in the index. Search for your site name on the search page and check whether it returns any results. Additionally, if you enter ‘site:example.com’ in the search field, the GSA will show you what it has indexed from your website. Remember to replace example.com with your own website address.
3. How can I add my pages into the index?
There is no need to submit your site for indexing. The search crawl should find any site that is linked from another site that has been already indexed. If the crawl has not found your site, or if you need it added right away, contact the Search Administrator and we can add it directly to the crawl list.
You should check that:
- The page's meta tags do not prevent search robots from indexing your site.
- The page can be reached from top-level pages in UQ.
- The page is set up in the correct UQ domain and is not excluded from the search collection.
4. How can I keep my pages out of the index?
Our crawler's user agent name is gsa-crawler. To exclude a path from the crawl, add the following to your robots.txt file:
User-Agent: gsa-crawler
Disallow: /path/to-be/excluded/
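Before publishing a robots.txt file, it can help to confirm that the rule matches the URLs you expect. The following is a minimal sketch using Python's standard urllib.robotparser module. The function name is_blocked_for_gsa and the example.com URLs are illustrative assumptions, and Python's matching rules only approximate what the GSA crawler itself does.

```python
from urllib.robotparser import RobotFileParser


def is_blocked_for_gsa(robots_txt: str, url: str, agent: str = "gsa-crawler") -> bool:
    """Return True if the robots.txt text disallows `url` for `agent`.

    Hypothetical helper: this uses Python's robots.txt parser, which may
    not match the GSA crawler's behaviour in every edge case.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# The rule from this FAQ, with a placeholder path.
ROBOTS_TXT = """\
User-Agent: gsa-crawler
Disallow: /path/to-be/excluded/
"""

# example.com is a placeholder; substitute your own site address.
print(is_blocked_for_gsa(ROBOTS_TXT, "https://example.com/path/to-be/excluded/page.html"))
print(is_blocked_for_gsa(ROBOTS_TXT, "https://example.com/public/page.html"))
```

The first call checks a URL under the excluded path and the second checks a URL outside it, so the two results should differ if the rule is written correctly.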
If you do not want a page to be indexed, insert this meta tag within the page's <head> tag:
<head><meta name="robots" content="noindex, nofollow" /></head>
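A robots meta tag keeps a page out of the index only when its content includes the noindex directive. The sketch below, which uses Python's standard html.parser module, lists the directives a page declares so you can check for noindex. The class and function names are illustrative assumptions, not part of any GSA tooling.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in `html`."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<head><meta name="robots" content="noindex, nofollow" /></head>'
print("noindex" in robots_directives(page))
```

If "noindex" is missing from the result, the page can still be indexed even though a robots meta tag is present.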
You can also prevent pages from being indexed by using a .htaccess file.
5. How can I remove a website from the index?
You can have a website removed by emailing the Search Administrator with your request and we will add it to our "Do Not Crawl" URL patterns list. The site will then be removed by the system the next time it begins a scheduled crawl. If the files should be removed right away please let us know. We can remove access to them even before they fall out of the index.
6. How can I hide a section of a web page from the index?
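You can do so using the googleoff/googleon tags, which are written as HTML comments around the content to hide. This is ideal for keeping navigational menus out of the index.
For example, you can prevent the words “Home | About | Contact” from being indexed:
<!--googleoff: index-->Home | About | Contact<!--googleon: index-->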
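The googleoff and googleon tags are HTML comments, written as <!--googleoff: index--> and <!--googleon: index-->. To preview roughly which text would remain indexable, the sketch below removes everything between a matched pair. It is a regular-expression approximation under the assumption that the tags are paired and not nested; it is not the appliance's own parser, and the function name strip_unindexed is hypothetical.

```python
import re

# Matches one googleoff/googleon pair and everything between them.
# Non-greedy, so each googleoff pairs with the nearest following googleon.
GOOGLEOFF_INDEX = re.compile(
    r"<!--\s*googleoff:\s*index\s*-->.*?<!--\s*googleon:\s*index\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def strip_unindexed(html: str) -> str:
    """Return `html` with googleoff/googleon index sections removed."""
    return GOOGLEOFF_INDEX.sub("", html)


page = (
    "<p>Welcome</p>"
    "<!--googleoff: index-->Home | About | Contact<!--googleon: index-->"
    "<p>Course list</p>"
)
print(strip_unindexed(page))
```

Running this on your own page source gives a quick view of whether the navigation text is wrapped correctly.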
7. How can I get my new or updated website indexed?
As the GSA is set to crawl continuously, it will find your site and update the index with the updated or new content. If you need to force the index to update your site, email the Search Administrator to request the update.
8. What is KeyMatch?
KeyMatches are administrator-defined keywords that promote specific web pages on a site. These keywords are associated with targeted URLs, so when search users type the keyword in the search box, they see the targeted URL displayed above the main set of search results.
9. What are Synonyms?
Synonyms suggest alternate queries to users. Use related queries to define words or phrases that should be treated as equivalent when users search for them.
10. How can I optimise my web pages for search visibility?
See Google's guide "Creating a Google-friendly site" for more information.


