First, install Python. See: http://www.python.org/download/
Python.org provides the files along with download and installation instructions. Install and test per their directions.
Second, get the Google Sitemap Generator code.
See: http://sourceforge.net/projects/goog-sitemapgen/
Windows users get the ZIP file. Mac users... well, pick the file type you are most familiar with.
Also, for more information see: http://code.google.com/projects.html There are other interesting projects there too.
Once you have the ZIP file, unzip it and find the files:
example_config.xml
sitemap_gen.py
Copy those to the web site's root folder on the local workstation.
Check the file sitemap_gen.py for a coding problem that may have been fixed by the time you read this. If it has not been corrected in your version, you need to fix it to generate correct sitemaps.
The '.py' file type is plain text, so any simple text editor will work, Notepad included. You can open the file with Dreamweaver if you add '.py' files to its list of editable files. Python also installs an editor, so you may be able to right-click the file and open it there.
Search for the word 'replace'. You should find a line with the text: middle.replace(os.sep, '/')
The correct code is: middle = middle.replace(os.sep, '/') The wrong code is just this: middle.replace(os.sep, '/')
Change the code if necessary and save the file. Use the corrected file for any copies you need for other sites.
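The fix matters because Python strings are immutable: replace() returns a new string and leaves the original untouched, so a call whose result is not assigned does nothing. A minimal illustration follows. The function names and the sample path are made up for this example, and SEP is hard-coded to a backslash in place of os.sep so the result is the same on any platform.

```python
# Illustration only: SEP stands in for os.sep as it would be on Windows.
SEP = "\\"


def to_url_path_buggy(middle):
    # The string returned by replace() is discarded, so middle is unchanged.
    middle.replace(SEP, "/")
    return middle


def to_url_path_fixed(middle):
    # Assigning the result back keeps the converted string.
    middle = middle.replace(SEP, "/")
    return middle


if __name__ == "__main__":
    sample = "products\\widgets\\index.htm"  # hypothetical relative path
    print(to_url_path_buggy(sample))  # still contains backslashes
    print(to_url_path_fixed(sample))  # backslashes converted to slashes
```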
Now the example_config.xml file needs to be edited. Save the file as a new file named config.xml.
In the config.xml file set up the site name and specify where the sitemap file is to be saved. This is done near line 30.
<site
base_url="http://www.cates-assoicates.net/"
store_into="sitemap.xml"
verbose="1"
>
The site name is the name Google will use to find the site. See Google's instructions for other details.
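Since config.xml is edited by hand, it can help to confirm that it is still well-formed XML and that the site settings are what you expect before running the generator. The sketch below is not part of the Google script. It assumes <site> is the root element of the config file, as the snippet above suggests, and read_site_settings is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def read_site_settings(config_text):
    """Return the attributes of the <site> element from config XML text."""
    # fromstring() raises ParseError if the hand-edited XML is malformed.
    root = ET.fromstring(config_text)
    if root.tag != "site":  # assumption: <site> is the root element
        raise ValueError("expected <site> root element, found <%s>" % root.tag)
    settings = {
        "base_url": root.get("base_url"),
        "store_into": root.get("store_into"),
        "verbose": root.get("verbose"),
    }
    missing = [k for k in ("base_url", "store_into") if not settings[k]]
    if missing:
        raise ValueError("missing attribute(s): " + ", ".join(missing))
    return settings
```

To check a real file, read it in binary mode and pass the contents, for example read_site_settings(open("config.xml", "rb").read()).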
There are various options you can choose for how to build the sitemap by following Google's instructions. We suggest you comment out all methods except the Directory Nodes method for this use.
<!-- ** MODIFY or DELETE **
"directory" nodes tell the script to walk the file system
and include all files and directories in the Sitemap.
Required attributes:
path - path to begin walking from
url - URL equivalent of that path
Optional attributes:
default_file - name of the index or default file for directory URLs
-->
<directory path="F:\Inet\site\www\" url="http://www.someDomain.com/" />
The path needs to be the literal path to the local folder containing the site you want to map. It needs to be the folder that contains your site's root files. These are the files that appear on the site with URLs like http://www.someDomain.com/pagename.htm.
The URL, of course, needs to be the site address, http://www.someDomain.com/.
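To see how the path and url attributes relate, here is a rough sketch of the mapping from a local file to its public URL. It is an illustration of the idea, not the code sitemap_gen.py actually uses. The function name file_to_url is hypothetical, and the sketch does not URL-encode special characters.

```python
from pathlib import PureWindowsPath


def file_to_url(file_path, root_path, root_url):
    """Map a local file under root_path to the matching URL under root_url."""
    # PureWindowsPath applies Windows path rules on any platform.
    # relative_to() raises ValueError if the file is outside root_path.
    relative = PureWindowsPath(file_path).relative_to(PureWindowsPath(root_path))
    # as_posix() turns backslashes into the forward slashes a URL needs.
    return root_url.rstrip("/") + "/" + relative.as_posix()


if __name__ == "__main__":
    print(file_to_url(
        "F:\\Inet\\site\\www\\pagename.htm",
        "F:\\Inet\\site\\www\\",
        "http://www.someDomain.com/",
    ))
```

If the path attribute pointed one folder too high or too low, every generated URL would be off by that folder, which is why the path must be the folder holding the site's root files.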
Next, within the config.xml file you need to create a FILTER. The filter section is toward the end of the file. Filters can prevent the program from mapping files that should not be submitted to Google. We suggest Dreamweaver users add, at least, these lines:
<filter action="drop" type="wildcard" pattern="*/_mmServerScripts/*" />
<filter action="drop" type="wildcard" pattern="*/_mm/*" />
<filter action="drop" type="wildcard" pattern="*/_notes/*" />
<filter action="drop" type="wildcard" pattern="*/Connections/*" />
<filter action="drop" type="wildcard" pattern="*/Library/*" />
<filter action="drop" type="wildcard" pattern="*/Templates/*" />
<filter action="drop" type="wildcard" pattern="*/*.LCK" />
<filter action="drop" type="wildcard" pattern="*/*.mno" />
<filter action="drop" type="wildcard" pattern="*/TMP*.asp" />
We also recommend you add these sitemap filters:
<filter action="drop" type="wildcard" pattern="*/*.css" />
<filter action="drop" type="wildcard" pattern="*/*.eot" />
<filter action="drop" type="wildcard" pattern="*/*.js" />
<filter action="drop" type="wildcard" pattern="*/*.ico" />
<filter action="drop" type="wildcard" pattern="*/*.txt" />
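If you want to try a pattern before running the generator, shell-style wildcard matching can be approximated with Python's fnmatch module. This is a sketch under assumptions: it treats matching as case-sensitive and applies the patterns to the full URL, which may not be exactly how the script behaves, and is_dropped and the sample URLs are made up for illustration.

```python
from fnmatch import fnmatchcase

# A few of the drop patterns from the filter lists above.
DROP_PATTERNS = [
    "*/_notes/*",
    "*/Templates/*",
    "*/*.LCK",
    "*/*.css",
    "*/*.js",
]


def is_dropped(url, patterns=DROP_PATTERNS):
    """Return True if the URL matches any drop pattern."""
    # In fnmatch, '*' matches any run of characters, including '/'.
    return any(fnmatchcase(url, pattern) for pattern in patterns)


if __name__ == "__main__":
    for url in (
        "http://www.someDomain.com/index.htm",
        "http://www.someDomain.com/styles/site.css",
        "http://www.someDomain.com/_notes/index.htm.mno",
    ):
        print(url, "dropped" if is_dropped(url) else "kept")
```

Note that under case-sensitive matching a pattern such as */*.LCK would not catch a file ending in .lck, so check the case of the file names your tools actually produce.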
Once your filters are done, save the file, run the script in Test Mode, and then check the content of the output file sitemap.xml. You should see directories, html, asp, aspx, and all the other files you want indexed. You may want to exclude images that are generated by Fireworks™. If so, we recommend you place them in a separate folder and filter that folder.
To run the script we often use a batch file with this code:
@echo off
echo test ONLY mode
"C:\Program Files\Python24\python" sitemap_gen.py --config=config.xml --testing
The path to the Python executable may need to change to match the conditions on your workstation. The --testing option puts the process into test mode.
We also use a more complex script that reads whether or not to use the test mode from the command line.
Test Mode Usage: runIndex.bat --testing
Omit '--testing' to run in live mode.
Live mode notifies Google that a new version of the sitemap has been generated. Be sure to upload the new file as soon as possible.
@echo off
if "%1" == "--testing" echo Running in TEST MODE
if "%1" == "" echo LIVE MODE - Use runIndex.bat --testing
"C:\Program Files\Python24\python" sitemap_gen.py --config=config.xml %1
Run the batch file or Python command from the site's root folder on the workstation, which contains the config.xml and sitemap_gen.py files. This should generate the sitemap.xml file. Inspect it and adjust your filters if needed. Once you have the file generated as you want, upload it to the site.
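Reading a long sitemap.xml by eye is tedious, so a quick summary by file extension can show at a glance whether a filter missed something. The sketch below is one possible way to do that and is not part of the Google script. It assumes the usual sitemap layout of <url> entries each holding a <loc> element, and the helper names are hypothetical.

```python
import os
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse


def sitemap_urls(xml_text):
    """Return the text of every <loc> element in the sitemap XML."""
    root = ET.fromstring(xml_text)
    # Compare local tag names so any sitemap namespace version is accepted.
    return [
        element.text.strip()
        for element in root.iter()
        if element.tag.rsplit("}", 1)[-1] == "loc" and element.text
    ]


def count_by_extension(urls):
    """Count URLs per file extension; folder URLs are counted as '(none)'."""
    counts = Counter()
    for url in urls:
        extension = os.path.splitext(urlparse(url).path)[1].lower()
        counts[extension or "(none)"] += 1
    return counts
```

For example, count_by_extension(sitemap_urls(open("sitemap.xml", "rb").read())) run from the site's root folder gives the totals. An unexpected entry such as '.css' or '.lck' points to a filter that needs adjusting.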
Be sure to visit Google Sitemaps and register/submit your sitemap.