Wednesday, 26 September 2012

Hidden content in web sites and more...

A few words about recent publications from our lab (just in case someone feels like doing a thesis on these topics). I'm doing a bit of cut-and-paste; I hope it is clear and interesting...

A Look at Hidden Web Pages in Italian Public Administrations
Preventing illegitimate modifications to web sites offering a public service is a fundamental requirement of any e-government initiative. Unfortunately, attacks on web sites resulting in the creation of fraudulent content by hackers are ubiquitous. In this work we attempted to assess the ability of Italian public administrations to be in full control of their respective web sites. We examined several thousand sites, including all local governments and universities, and found that approximately 1.16% of the analyzed sites serve content that admittedly is not supposed to be there.
This result is not very encouraging and somewhat surprising. To place it in perspective, we observe that a state-of-the-art system recently developed for efficiently searching for malicious web pages constructs a stream of URLs of which 1.34% identify malicious pages, and this system improves on earlier strategies by one order of magnitude (that is: finding junk in 1.16% of the sites I analyze is really a lot...)
...
Careful exploitation of these attacks may create a very odd scenario: pages hosted on a trusted site that serve content fully controlled by attackers, tailored to the navigation path followed by users, and visible only to certain users. An analogy with the physical world may illustrate the issue more clearly: when entering the building of a public administration, one would not expect to find offices that are not supposed to exist and are visible only to certain citizens, perhaps depending on where they come from. Unfortunately, this is exactly what could happen in web sites of public administrations.
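To make "tailored to the navigation path, visible only to certain users" a bit more concrete, here is a minimal Python sketch that requests the same URL under different apparent visitor profiles and reports which profiles receive a different body. It is only an illustration and not the procedure used in the paper: the profile names, the header values and the `differing_profiles` function are all assumptions made up for this example. Pages with legitimately dynamic content will also show differences, so a non-empty result is a hint to look closer, not a verdict.

```python
import hashlib
import urllib.request
from typing import Callable, Dict, List

# Hypothetical visitor profiles: same URL, different apparent origin.
PROFILES: Dict[str, Dict[str, str]] = {
    "direct": {"User-Agent": "Mozilla/5.0"},
    "from_search": {"User-Agent": "Mozilla/5.0",
                    "Referer": "https://www.google.com/"},
    "crawler": {"User-Agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"},
}


def http_fetch(url: str, headers: Dict[str, str]) -> bytes:
    """Download the body of `url`, sending the given request headers."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()


def differing_profiles(
    url: str,
    fetch: Callable[[str, Dict[str, str]], bytes] = http_fetch,
    profiles: Dict[str, Dict[str, str]] = PROFILES,
    baseline: str = "direct",
) -> List[str]:
    """Return the profiles whose response body differs from the baseline one."""
    digests = {
        name: hashlib.sha256(fetch(url, headers)).hexdigest()
        for name, headers in profiles.items()
    }
    return [name for name, d in digests.items() if d != digests[baseline]]
```

Passing the fetch function as a parameter keeps the comparison logic separate from the network, so it can be tried offline with a fake server function before pointing it at a real site.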
...
It is important to point out that HTTPS, the main and ubiquitous line of defense in sensitive web sites, does not provide any defense in this respect. The problem is that the server site is authenticated as a whole: any page coming from that site appears to be legitimate.

Brand-related Events Detection, Classification and Summarization on Twitter
The huge and ever increasing amount of text generated by Twitter users every day embeds a wealth of information, in particular about themes that suddenly become relevant to many users, as well as about the sentiment polarity that users tend to associate with these themes.
In this paper, we exploit both these opportunities and propose a method for: (i) detecting novel popular themes, i.e., events, (ii) summarizing these events by means of a concise yet meaningful representation, and (iii) assessing the prevalent sentiment polarity associated with each event, i.e., positive vs. negative.
Our method is fully automatic. We validate our proposal on a real corpus of about 8,000,000 tweets, by detecting, classifying and summarizing events related to three wide topics associated with tech-related brands: Apple, Google and Microsoft.
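The abstract does not say how the three steps are implemented, so the following Python sketch is only a toy stand-in to give a feel for steps (i) and (iii); it skips the summarization step (ii). It uses a simple frequency-burst heuristic between two time windows and a tiny hand-made word list for polarity. The function names, the thresholds and the lexicon are all assumptions made up for this example, not the method of the paper.

```python
from collections import Counter
from typing import List, Set

# Tiny hypothetical sentiment lexicon, for illustration only.
POSITIVE: Set[str] = {"great", "love", "fast"}
NEGATIVE: Set[str] = {"broken", "hate", "slow"}

PUNCT = ".,!?#@"


def tokenize(tweet: str) -> List[str]:
    """Lowercase whitespace tokens, with surrounding punctuation removed."""
    return [w.strip(PUNCT).lower() for w in tweet.split() if w.strip(PUNCT)]


def bursty_terms(previous: List[str], current: List[str],
                 min_count: int = 3, ratio: float = 3.0) -> List[str]:
    """Terms used in many more tweets of the current window than of the previous one."""
    prev = Counter(t for tw in previous for t in set(tokenize(tw)))
    curr = Counter(t for tw in current for t in set(tokenize(tw)))
    # The +1 smooths terms never seen in the previous window.
    return sorted(t for t, c in curr.items()
                  if c >= min_count and c >= ratio * (prev[t] + 1))


def polarity(tweets: List[str], term: str) -> str:
    """Prevalent polarity of the tweets mentioning `term`, by lexicon word counts."""
    score = 0
    for tw in tweets:
        toks = set(tokenize(tw))
        if term in toks:
            score += len(toks & POSITIVE) - len(toks & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

A real system would need much more than this (stop-word handling, grouping of related terms into one event, a trained sentiment classifier), but the sketch shows the basic idea of comparing a time window against the recent past.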


