What we do:
We design crawlers, i.e. algorithms that collect data from the web. We use several methodologies, such as scraping, crawling and even handcrafted crawling.
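The sketch below shows what a crawler does in general. It is a minimal breadth-first crawler in Python, not our production system. The names `crawl` and `LinkExtractor` are hypothetical. Page retrieval is delegated to a caller-supplied `fetch(url)` function, which is an assumption that lets the example run without network access. Extracting links from a page corresponds to scraping, and following those links to new pages corresponds to crawling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags (the 'scraping' step)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl starting at `seed` (the 'crawling' step).

    `fetch(url)` is a hypothetical, caller-supplied function returning the
    page's HTML, or None if the page cannot be retrieved.
    Returns a dict mapping each visited URL to its raw HTML.
    """
    seen = {seed}
    queue = deque([seed])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so duplicates are caught.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# Usage on a tiny in-memory "web" standing in for real HTTP requests.
web = {
    "http://example.org/": '<a href="/a">A</a> <a href="b#top">B</a>',
    "http://example.org/a": '<a href="/">home</a>',
    "http://example.org/b": "no links",
}
print(sorted(crawl("http://example.org/", web.get)))
# → ['http://example.org/', 'http://example.org/a', 'http://example.org/b']
```

A real crawler would additionally respect robots.txt, rate-limit its requests and persist the collected pages before cleaning and indexing them.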
We collect, index and clean heterogeneous kinds of data:
- text and semantic data,
- videos, …
We collect data from several web communities:
- Science Community: we run crawlers on large datasets such as Thomson Reuters' Web of Knowledge, but also on websites and blogs that discuss scientific issues.
- Social Networks: we use crawling and scraping methods to capture data from people discussing issues on social networks.
- News Media: we use crawlers that extract knowledge from news around the world.
- Blogosphere: we crawl content from blogs.
- Traditional websites.