Web Mining: Extracting valuable data for development

What we do:

We design crawlers, i.e. algorithms that collect data from the web. We use several methodologies, such as scraping, crawling and even handcrafted crawling.
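As an illustration of the crawling idea, here is a minimal breadth-first crawler sketch using only Python's standard library. The names `crawl`, `fetch_url` and `LinkExtractor`, the same-host restriction and the `max_pages` limit are illustrative assumptions, not a description of our production crawlers.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags on a single page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url):
    """Default fetcher (hypothetical helper): download a page as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed, max_pages=50, fetch=fetch_url):
    """Breadth-first crawl restricted to the seed URL's host.

    Returns a dict mapping each visited URL to its raw HTML.
    """
    host = urlparse(seed).netloc
    queue = deque([seed])
    seen = {seed}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # network errors (URLError, timeouts) skip the page
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

The `fetch` parameter is injectable so that caching, rate limiting or an offline test double can be plugged in without touching the traversal logic.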

We collect, index and clean heterogeneous kinds of data:

  • text and semantic data,
  • metadata,
  • geolocation data,
  • images,
  • videos, …
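The sketch below shows one possible way to split a crawled HTML page into several of these data kinds (visible text, `<meta>` metadata, image and video URLs), clean the text and index it. It assumes Python's standard library only; `PageScraper`, `scrape` and `build_index` are hypothetical names, and the simple word-level inverted index stands in for whatever indexing the real pipeline uses.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class PageScraper(HTMLParser):
    """Splits one HTML page into visible text, <meta> tags and media URLs."""

    SKIP = {"script", "style"}  # tags whose content is not visible text

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.metadata = {}
        self.images = []
        self.videos = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "meta":
            name, content = attrs.get("name"), attrs.get("content")
            if name and content is not None:
                self.metadata[name.lower()] = content
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "video" and attrs.get("src"):
            self.videos.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def scrape(url, html):
    """Turn raw HTML into a cleaned record of heterogeneous data."""
    parser = PageScraper()
    parser.feed(html)
    parser.close()
    # Cleaning step: join fragments and collapse runs of whitespace.
    text = re.sub(r"\s+", " ", " ".join(parser.text_parts)).strip()
    return {"url": url, "text": text, "metadata": parser.metadata,
            "images": parser.images, "videos": parser.videos}


def build_index(records):
    """Inverted index: lowercase word -> set of URLs containing it."""
    index = defaultdict(set)
    for record in records:
        for token in re.findall(r"\w+", record["text"].lower()):
            index[token].add(record["url"])
    return index
```

Keeping each page as a small record with separate fields lets text, metadata and media be cleaned and stored independently before indexing.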

We collect data from several web communities:

  • Science Community: we run crawlers on large datasets such as Thomson Reuters' Web of Knowledge, as well as on websites and blogs that discuss scientific issues.
  • Social Networks: we use crawling and scraping methods to capture what people say about issues on social networks.
  • News Media: we use crawlers that extract knowledge from news sources around the world.
  • Blogosphere: we crawl the contents of blogs.
  • Traditional Websites.