Now I'll carefully re-study the "marking" template for deletion....--[[User:Alex brollo|Alex brollo]] ([[User talk:Alex brollo|disc.]]) 18:31, 13 feb 2013 (CET)
:: The bot is grinding away ... it will take hours; I set an interval of 60 seconds. On Commons it is called [[:commons:User:Alex brolloBot|Alex brolloBot]] because Alebot was not available. If you get the chance, please comment on my flag request if you haven't already; a bot on Commons could come in handy (I intend to work only on topics related to it.source, of course). --[[User:Alex brollo|Alex brollo]] ([[User talk:Alex brollo|disc.]]) 21:13, 13 feb 2013 (CET)
 
== hOCR: possible use (diff) ==
I wrote a tool that does that, but it doesn't need hOCR: plain OCR text is sufficient, though hOCR could be used too. The tool focuses on word differences, not punctuation differences; focusing on punctuation produced too many false positives, since punctuation is often wrong in the OCR layer (especially in French, where quotes use « »). I used it to catch differences between a M&S done from a different edition of the text and the scan. Here is a diff from before the text was corrected: [http://fr.wikisource.org/w/index.php?title=Discussion_Livre:Nietzsche_-_Aurore.djvu/Diff%C3%A9rence&oldid=3815982]. Things get interesting from page 16, and page 18 shows one word substituted for another. The final diff is [[Discussion_Livre:Nietzsche_-_Aurore.djvu/Différence]]; it shows only OCR errors, except for the huge diff from page 425, which uses complicated templates to generate a table. The main problem with this tool is templates: you need to fix up template calls to their expanded text inside the tool, so the tool is very wiki-specific ;(. A tool getting the text from the HTML of Page: would have no trouble with templates, but I wanted to see the diff of a whole book on a single page. For now I'm working on the hOCR highlighter; I found a huge bug in the code I showed you that made the highlighting accuracy rather fuzzy. [[User:Phe|Phe]] ([[User talk:Phe|disc.]]) 15:44, 16 feb 2013 (CET)
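
To illustrate the idea of a word-level diff that ignores punctuation, here is a minimal sketch in Python. It is not the tool described above: the names `words` and `word_diff` are hypothetical, the tokenizer is a simple regular expression, and template expansion is not handled at all. It only assumes that both the OCR layer and the proofread text are available as plain strings.

```python
import difflib
import re

# Hypothetical tokenizer: keep runs of letters/digits, drop punctuation
# such as « » , . ; so that punctuation noise in the OCR layer is ignored.
WORD_RE = re.compile(r"\w+")


def words(text):
    """Return the list of word tokens in text, without punctuation."""
    return WORD_RE.findall(text)


def word_diff(ocr_text, proofread_text):
    """Return the non-equal opcodes between the two word sequences.

    Each item is (tag, ocr_words, proofread_words), where tag is one of
    'replace', 'delete' or 'insert' as defined by difflib.
    """
    a = words(ocr_text)
    b = words(proofread_text)
    # autojunk=False avoids difflib's heuristic that treats very frequent
    # items as junk on long sequences (common words in a whole book).
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, a[i1:i2], b[j1:j2]))
    return changes


if __name__ == "__main__":
    for change in word_diff("the quick brown fox", "the quick red fox"):
        print(change)
```

With this approach, two texts that differ only in quotation marks or commas produce an empty list, while a substituted word shows up as a single `replace` entry. Comparison here is case-sensitive; whether to fold case is a separate choice.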