User:Niyogi
From Wikipedia, the free encyclopedia
magazine
- downloaded raw content (bz2 format): 615 in total, 385 of them .com
Next steps:
- build feature lists using the new Wikipedia lexicon
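A minimal Python sketch of building a feature list for one downloaded bz2 file. It assumes the lexicon is a set of lower-cased terms (single spaces between words) and that a "feature" is any lexicon term found in the text. The function names and the 3-word n-gram limit are hypothetical choices, not the project's actual code.

```python
import bz2
import re
from collections import Counter

def features_from_text(text, lexicon, max_ngram=3):
    # Count every 1..max_ngram token sequence that is a lexicon entry.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    for n in range(1, max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in lexicon:
                counts[gram] += 1
    return counts

def features_from_bz2(path, lexicon, max_ngram=3):
    # Decompress one raw-content file as text, then count lexicon hits.
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return features_from_text(fh.read(), lexicon, max_ngram)
```

Overlapping matches are all counted ("ipod nano" also counts "ipod"), which keeps both the general and the specific term as features.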
category
- have Amazon and shopping data for lexicon.txt
Next steps:
- need eBay: figure out a SOAP/PHP interface to eBay and get
- rebuild category maps
dmoz
- have 120K of 174K front pages; 1link.csv now has "key features"
Next steps:
- build a corpus of key features for each category in 1link.csv
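A minimal sketch of grouping key features by category. The layout of 1link.csv is not given here, so the column names `category` and `key_features` and the ";" separator are assumptions; `key_feature_corpus` is a hypothetical name.

```python
import csv
from collections import defaultdict

def key_feature_corpus(lines, category_col="category",
                       features_col="key_features", sep=";"):
    # lines: any iterable of CSV text lines, e.g. an open file handle.
    corpus = defaultdict(list)
    for row in csv.DictReader(lines):
        cat = (row.get(category_col) or "").strip()
        if not cat:
            continue  # skip rows without a category
        raw = row.get(features_col) or ""
        feats = [f.strip().lower() for f in raw.split(sep) if f.strip()]
        corpus[cat].extend(feats)
    return dict(corpus)
```

Usage, assuming that layout: `with open("1link.csv", newline="", encoding="utf-8") as fh: corpus = key_feature_corpus(fh)`.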
ontok/ExtractAttributesfromText
- prototyped the code; seen it work for "thinkpad laptops"
Next steps:
- test search_by_product/brand on queries such as "600x ipod nano"
- write the search_by_model code
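One way to split a query such as "600x ipod nano" into attributes is a greedy longest match against product and brand lexicons, keeping leftover tokens as model candidates. This is a hypothetical sketch (`extract_attributes` is not the prototyped code); the lexicons are assumed to be sets of lower-cased terms.

```python
import re

def extract_attributes(query, products, brands, max_ngram=3):
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    found = {"product": [], "brand": [], "unmatched": []}
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at token i first.
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            gram = " ".join(tokens[i:i + n])
            if gram in products:
                found["product"].append(gram)
                i += n
                break
            if gram in brands:
                found["brand"].append(gram)
                i += n
                break
        else:
            # No lexicon match: keep the token as a possible model number.
            found["unmatched"].append(tokens[i])
            i += 1
    return found
```

The "unmatched" list is where a search_by_model step could start, for example by checking "600x" against known brand-model pairs.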
ontok/ExtractLocations
- use the new city/state features to quickly detect city/state combinations on "contact us" pages
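A quick detector can use one regular expression for "City, ST" and then confirm both parts against city and state lists. This sketch assumes US-style two-letter state codes and a set of lower-cased city names; `find_city_state` is a hypothetical name.

```python
import re

# One or more capitalized words, a comma, then a two-letter state code.
CITY_STATE = re.compile(r"\b((?:[A-Z][a-z]+ )*[A-Z][a-z]+),\s*([A-Z]{2})\b")

def find_city_state(text, cities, states):
    hits = []
    for m in CITY_STATE.finditer(text):
        words, state = m.group(1).split(), m.group(2)
        if state not in states:
            continue
        # The match may include leading words ("Contact Us At Palo Alto"),
        # so accept the longest trailing run of words that is a known city.
        for i in range(len(words)):
            city = " ".join(words[i:])
            if city.lower() in cities:
                hits.append((city, state))
                break
    return hits
```

The regex does the cheap scan and the set lookups reject false positives, so only a small fraction of each page needs a lexicon check.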
ontok/wikipedia/products
- have the Wikipedia and product lexicons merged
Next steps:
foreach ($titlearr as $title) {
    // expand the associations on:
    //   productbrand:   any product-brand combination that appears
    //   brandmodel:     anything that looks like a model (alphanumeric, "00", or short)
    //   productfeature: any product-feature combination that appears
    //   productunit:    any product-unit mapping
}
foreach ($brandarr as $brand) {
    // determine product associations
}
foreach ($brandmodel as $brand => $modelarr) {
    foreach ($modelarr as $model => $n) {
        // determine product associations
    }
}
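A minimal Python sketch of the productbrand and brandmodel part of the title loop. It assumes single-word product and brand lexicons held as lower-cased sets, and it reads "alphanumeric or 00 or short" as: mixes letters and digits, is all digits, or has at most 3 characters and is not a lexicon word. That reading, and the names `build_associations` and `looks_like_model`, are assumptions.

```python
import re
from collections import defaultdict

def looks_like_model(token, lexicon):
    # Assumed interpretation of "alphanumeric or 00 or short".
    if token in lexicon:
        return False
    has_digit = any(c.isdigit() for c in token)
    has_alpha = any(c.isalpha() for c in token)
    return (has_digit and has_alpha) or token.isdigit() or len(token) <= 3

def build_associations(titles, products, brands):
    productbrand = defaultdict(int)                     # (product, brand) -> count
    brandmodel = defaultdict(lambda: defaultdict(int))  # brand -> model -> count
    lexicon = products | brands
    for title in titles:
        tokens = re.findall(r"[a-z0-9]+", title.lower())
        found_products = [t for t in tokens if t in products]
        found_brands = [t for t in tokens if t in brands]
        for p in found_products:
            for b in found_brands:
                productbrand[(p, b)] += 1
        # A model candidate is the token right after a brand.
        for i, tok in enumerate(tokens[:-1]):
            if tok in brands and looks_like_model(tokens[i + 1], lexicon):
                brandmodel[tok][tokens[i + 1]] += 1
    return dict(productbrand), {b: dict(m) for b, m in brandmodel.items()}
```

The counts play the role of `$n` in the brandmodel loop, so rare pairs can be dropped before determining product associations.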
How to determine product associations:
- read in the productbrand table
- read the Yahoo! search response and the Google Suggest response
- detect "ma" features in the output
- for brand links, check the productbrand table
- for brand-model links, check the productbrand table
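The two table checks could look like the sketch below. It assumes the productbrand table is a dict keyed by (product, brand) pairs and that the search and suggest responses have already been fetched as plain strings; no Yahoo! or Google API is called, and the "ma" feature step is not modelled. Both function names are hypothetical.

```python
import re

def products_for_brand(brand, productbrand):
    # Brand link: products the table already pairs with this brand.
    return {p for (p, b) in productbrand if b == brand}

def products_for_model(brand, model, responses, productbrand):
    # Brand-model link: products that co-occur with "brand model" in a
    # response string and that the table accepts for this brand.
    allowed = products_for_brand(brand, productbrand)
    found = set()
    for text in responses:
        joined = " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "
        if f" {brand} {model} " not in joined:
            continue
        found.update(p for p in allowed if f" {p} " in joined)
    return found
```

Restricting candidates to `allowed` means a response like "thinkpad 600x battery" cannot create a new product association unless the productbrand table already supports it.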

