Google Office Hours 8-15-14

The following is a complete transcript of a Hangout on Air on August 15th with Google’s John Mueller.  It has been edited for readability.

Are Videos considered a Signal of Quality?

Does the Site:Search bring up ALL of the results for a site?

If there are pages excluded from the Site:search, does it mean these pages are penalized?

Are iframes a safe link building strategy?

When will the Structured Data Testing Tool be updated?

Will Google improve reporting on Schema.org data?

How should I treat multilingual pages for my domain?

How can we use canonical tags effectively for large catalog sites?

Does Google take into account which type of ssl cert is used?

Any trick to make Analytics filters for Europe?

Are Blogspot sites safer to promote for bloggers?

Gifs over Png’s or Jpg’s?

When can we expect the next Panda and Penguin updates?

Getting approved for Adsense

When will I see results from my Disavow File?

Recovering from a Spam Penalty

Do you recover gradually or immediately from a Panda 4 Penalty?

What’s more important – mobile or HTTPS?

Improving Adsense Revenue

Are Disavowed URL’s Taken into Consideration when you Submit a Reconsideration Request?

Images and Sitemap Pages in the Google Index

Duplicate Data in WebMaster Tools

Inconsistencies with Adwords and Webmaster Tools

Eliminating Links In Ranking Factor

Google Ranking for Price Comparison Sites

How Will PR Newswire’s Panda Hit Affect Sites that Feed from Them?

What to do if Your Website is Filtered by Safe Search

Domain Authority

Problems Matching Sites with  IP Locations

John Mueller: Ok. Welcome everyone to today’s Google Webmaster Central Office-hours hangout. My name is John Mueller, I’m a webmaster trends analyst at Google in Switzerland and I try to connect webmasters and publishers like you all with our engineers and make sure that information is flowing freely in both directions. There are a bunch of questions submitted already. As always, if one of you wants to take the stage first and ask the first question, feel free to go ahead.

Are Videos considered a Signal of Quality?

Speaker 1: Hey John, I have a quick question about videos in search results. A few months ago when you did a site search for my website and filtered by video results, it used to show all the pages that had Youtube embeds, but it doesn’t do that anymore.   I pasted the link in the Q&A.  It shows YouTube embeds on the blog which is in a different search domain, but it doesn’t show any of the pages that have embedded YouTube videos on the main website, so, is that because Google is not treating it as an embedded video? What happened?

John Mueller: I imagine that it’s just one of the usual changes that we make in search. Sometimes we change the way that the snippets are shown, the way that the embedded content is shown directly in search results, and for some sites you would see changes like that while other sites will see changes in the other direction. So it’s actually just a normal change in the way that we bubble up this information in the search results.

Usually you should still be able to see them if you go directly to the video search specifically, but essentially these types of snippets and other types of rich snippets are things that we don’t always show as a search result. Sometimes we’ll understand that they are there on these pages, but that doesn’t mean that we think it always makes sense to show them directly in the search results.

Speaker 1: Right. But my question is not about video snippets. This is specific, if you take a look at the link that I shared. This is specifically for video results.

John Mueller: Ok.

Speaker 1: And so what I’ve noticed is that certain websites used to show the fact that there’s a YouTube embed as a video on the page, so, is this like a quality signal for the site that you don’t think is good quality enough, which is why even if there is a  YouTube embed we don’t see it as a video?

John Mueller: Did you submit a sitemap for those pages?

Speaker 1: No but we never did and Google automatically picked it up.

John Mueller: Yeah. That’s something I might consider doing. If the video is the primary part of the content of these pages, I’d definitely consider setting up a video sitemap. It doesn’t have to be that the video is hosted on the site, it could be hosted on YouTube or Vimeo or wherever else. But it’s important to let us know about this connection and that the video is the primary content of those pages.
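As a rough illustration of what setting up a video sitemap could involve, the sketch below builds a minimal one in Python. The page URL, video details, and the `build_video_sitemap` helper are hypothetical placeholders rather than anything from the hangout, and only a handful of the video tags are shown; the official video sitemap documentation lists which tags are required.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"


def build_video_sitemap(entries):
    """Return sitemap XML with one <url> per page that embeds a video.

    Each entry is a dict with the hypothetical keys page_url,
    thumbnail_loc, title, description and player_loc.
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("video", VIDEO_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # The page on your own site that embeds the video.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry["page_url"]
        video = ET.SubElement(url, f"{{{VIDEO_NS}}}video")
        # player_loc points at the embeddable player, which may live on
        # another host such as YouTube or Vimeo.
        for tag in ("thumbnail_loc", "title", "description", "player_loc"):
            ET.SubElement(video, f"{{{VIDEO_NS}}}{tag}").text = entry[tag]
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_video_sitemap([{
        "page_url": "http://www.example.com/articles/demo",
        "thumbnail_loc": "http://www.example.com/thumbs/demo.jpg",
        "title": "Demo video",
        "description": "A short demo embedded in the article.",
        "player_loc": "http://www.youtube.com/embed/VIDEO_ID",
    }]))
```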

Speaker 1: Ok. It’s not the primary content. It’s just embedded because it’s relevant to the content on the page.

John Mueller: Yeah. I guess in those situations it’s something where our algorithms will have to make a decision about whether or not it makes sense to show that as a video search result. If people are looking for videos and this is, let’s say as an extreme case, a random video that’s included somewhere on the bottom of the page, then that doesn’t seem so relevant for the user. So I imagine our algorithms are just trying to figure out how relevant those pages are for people explicitly searching for videos.

Speaker 1: Ok.

John Mueller:   Let’s start with the Q&A.

Does the Site:Search bring up ALL of the results for a site?

Speaker 2: John. Sorry. Can I ask a follow-up to that? Because I think the reason he was asking is that he is using a site search. It should bring up absolutely everything with a video on that site, or supposedly. Does that mean that, given what he asked, if you use site colon it won’t bring up everything, only the important stuff? I thought that the operator was supposed to bring in everything.

John Mueller: Site queries have always been a bit of a special case. I don’t know specifically how the other channels such as news and image search handle that, but within web search we see this as a restrict; we don’t see it as a way of saying “you want everything”. Essentially what you’re saying with the site colon query in web search is: restrict to only the results from this site. That’s why in some cases when you do a site query you see almost completely arbitrary numbers on top, like 50 million pages when you know you only have 5,000 pages. That’s because we see this as a restrict, and we see the counts especially as something that we optimize more for speed than for accuracy. So I would use the site colon query for specific URLs, if you are looking for something specific to see if it’s indexed, but I wouldn’t use it as a way to determine which pages are actually indexed, because there are lots of reasons why we might filter them out for a site query or change the counts a little bit for site queries. It’s definitely not something I’d use in general for diagnostics.

Speaker 2: Right. Does that follow-up question help?

If there are pages excluded from the Site:search, does it mean these pages are penalized?

Speaker 1: No. It’s a good question. But the thing is, you know, there are hundreds of pages that used to show up in that query and it’s making me think that somehow Google has put this black mark on the site saying we are not going to show any videos on this domain because you maybe got hit by Panda or whatever it is. I’ve seen this on other sites that I think got hit by Panda. But other sites that did not get hit by Panda, like Wenger Video or whatever, have long articles that also have embedded video, and those videos show up. My site has long articles and embedded videos, but a specific site colon search for videos doesn’t show them.

John Mueller: I wouldn’t necessarily use that as a signal that you’re somehow demoted by one of our algorithms. I think this is really just our algorithms trying to figure out what is the most relevant there for a query like that and especially when you mention that these are videos also on the page together with a lot of text and I imagine that’s a hard situation for our algorithms to determine how specific these pages are for video search.

But I will take a look with the team to see if there’s something more specific that I can pass on there. But I don’t see anything, let’s see, technically broken in that regard.  It’s essentially the way that our algorithms are classifying those pages on your site and to some extent both variations could make sense.

Speaker 1: Ok. Thank you.

Are iframes a safe link building strategy?

John Mueller: Ok. Q: A lot of tube sites publishing adult or non-adult content are growing their link popularity thanks to the iframes they give out. Is this a correct link building strategy? They put a “powered by” link under the iframe. In the future are they at risk?

John Mueller. A: So I’d love to see some examples of this. But in general, if this is something that the webmaster has a choice about when they put this on their site, that’s usually fine. On the other hand, if this is something that is specifically tied to these embeds, then that could be a bit problematic in the sense that the webmaster might not have a choice to actually remove this type of link, or they might not even see it. But I definitely would love to see some more examples before really saying that this is okay or this is not okay, because I can see this going in both directions.

John Mueller: Yes.

Speaker 3: This was my question.  I asked  because I work in this industry also and it was one of my curiosities for a long time.  I’m not sure if this is a correct strategy or will be problematic in the future.

John Mueller: If you have some examples.  I’d love to take a look with the Website team and maybe we can give you a more specific answer next time.

Speaker 3:  It’s on all the big websites.  Everybody does it.

John Mueller: If you could send me some examples or pages that are embedding  this, I would love to pass this on directly.

When will the Structured Data Testing Tool be updated?

John Mueller: Sure. Ok. Q: There are many types and properties the Structured Data Testing Tool doesn’t report about or reports about falsely. Do you have any indication if and when the Structured Data Testing Tool will be updated?

John Mueller. A:  I know they’re working on updates there so I think at some point you’ll see some changes.  I don’t have any specific timeline for when those changes will be active.  What kind of problems are you specifically seeing there?

Speaker 4: Well, one of the problems with the structured data tool at the moment is that there are tons of properties it reports as not being part of a schema, yet when we go to schema.org they are part of certain entities. At the same time, for certain combinations of types and multi-type entities, the structured data testing tool skips the second entity, and it’s becoming close to impossible to go beyond anything with rich snippets. Schema.org is growing and growing and growing, but Google’s testing tool isn’t growing with it.

John Mueller: Ok. So the testing tool isn’t in sync with the changes that are happening on schema.org ?

Speaker 4:  Exactly.  There are a lot of types and properties that the structured data tool  doesn’t recognize.

John Mueller: Ok. I can see if we can do something there  to maybe do an update in the meantime, but I know the team is working on bigger changes there. So maybe they just want to focus on those bigger changes and make sure that when that’s updated it kind of keeps up automatically.  But I can definitely pass on that feedback to the teams so that they can see if there’s something shorter-term that they can do to help out.

Will Google improve reporting on Schema.org data?

Speaker 4: Last time I spoke to you, we spoke about the ROI of schema.org, but most of all about Google showing more information on what it does with the entities you mark up, and you said last time that you were taking it back to the team. Is there any chance of any movement there? Is there going to be more reporting about it?

John Mueller: It’s hard. I don’t know of anything short-term that’s coming there to have more reporting on the structured data side. I know the testing tool is one thing that they’re working on revamping, but I’m not aware of anything big that’s shortly before launch at the moment, so I don’t have anything interesting I can share with you there.

Speaker 4: Unfortunately.

John Mueller: Yeah, sorry. But I know that’s something that the teams here are very keen on, and they are also very keen on making it more interesting for the webmasters that provide this kind of information.

Speaker 4: Thank you.  I will keep waiting then.

How should I treat multilingual pages for my domain?

John Mueller: Yes. Sorry.  Let me see. Q: We have a multilingual page example.com and redirect recognizing browser settings to example.com/en or /de or /br … Should we use 301 or 302 redirects? Or shall we use main language for example.com and use redirects for the search pages?

John Mueller. A: We have this documented on the general information page for multilingual pages. We also recently did a blog post on how to handle home pages. Essentially, if you have a redirecting home page, so the normal example.com page, you can use a 301 or 302 redirect there; that’s not necessarily a problem on our side. What we’d like to see however is that the hreflang is marked up properly so that the home page is set up as the default and that the lower level pages like /br, /de are all accessible directly.

Essentially we need to be able to access the individual pages, but from the home page you can redirect if you think that you can do that in a smart way and just let us know that this is the default version of the page.

http://googlewebmastercentral.blogspot.com/2013/04/x-default-hreflang-for-international-pages.html
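To make the hreflang setup described above more concrete, here is a small Python sketch that generates the `<link rel="alternate">` annotations for a redirecting home page and its language versions. The `hreflang_links` helper and the example.com URLs are illustrative only. Mapping the /br path from the question to the `pt-br` code is an assumption made here, since hreflang values are language and region codes rather than URL paths.

```python
from html import escape


def hreflang_links(home_url, versions):
    """Return <link> tags marking home_url as x-default.

    versions maps an hreflang code (e.g. "en") to the URL of the
    directly accessible page for that language.
    """
    tags = [
        f'<link rel="alternate" hreflang="x-default" href="{escape(home_url)}" />'
    ]
    for code, url in sorted(versions.items()):
        tags.append(
            f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        )
    return tags


if __name__ == "__main__":
    # Hypothetical mapping: "pt-br" for the /br path is an assumption.
    for tag in hreflang_links(
        "http://example.com/",
        {
            "en": "http://example.com/en",
            "de": "http://example.com/de",
            "pt-br": "http://example.com/br",
        },
    ):
        print(tag)
```

The same set of tags would normally be repeated on every language version, so that each page points at all of its alternates and at the default.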

 

How can we use canonical tags effectively for large catalog sites?

John Mueller. Q: On a catalog that is paginated and with faceted navigation, we use rel previous and next on pages, but for filtered subsets, what’s the right canonicalization strategy for subset pages that are crawlable on unique URLs? For example the intersection of pagination and rel canonical? One should presumably not canonicalize page one of “red shirts” faceted navigation to page 1 of “all shirts” results. Canonicalization is not obvious here! This problem occurs on many large catalog sites.

John Mueller. A: So we have two fairly detailed blog posts on pagination and faceted navigation on the Webmaster Central blog.  I’d definitely check those out because there are a lot of subtle things there that you’d probably want to watch out for.

For example, to some extent faceted navigation is ok for us to crawl and index. But sometimes there are elements of faceted navigation that result in problems. So it could be that you end up on a page that is actually more like a 404 page. Like if you search for Red Shirts in the color blue, maybe that’s something that you could pick out in faceted navigation, but of course it would lead to no results. That’s something where you want to watch out for those kinds of issues. All of the details are in these blog posts and I’d definitely go through those, because there are things that are very specific to some sites that don’t make a lot of sense for other sites to watch out for, and it’s probably not that helpful if I go through all of that in general here.

http://googlewebmastercentral.blogspot.com/2014/02/faceted-navigation-best-and-5-of-worst.html

http://googlewebmastercentral.blogspot.com/2013/04/5-common-mistakes-with-relcanonical.html

http://googlewebmastercentral.blogspot.com/2011/09/pagination-with-relnext-and-relprev.html
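One possible reading of the question above, sketched in Python: each page of a filtered subset keeps its own facet parameters in its rel prev and next links and carries a canonical pointing at itself, rather than at the unfiltered "all shirts" listing. The self-referencing canonical is an assumption of this sketch, not something stated in the hangout, and the `pagination_links` helper, the `page` parameter name, and the URLs are hypothetical. The linked blog posts remain the reference for which facets should be indexable at all.

```python
from html import escape
from urllib.parse import urlencode


def pagination_links(base_url, facets, page, total_pages):
    """Return canonical/prev/next <link> tags for one page of a facet."""

    def page_url(n):
        # Sort facet parameters so the same filter always yields one URL.
        params = dict(sorted(facets.items()))
        if n > 1:
            params["page"] = n  # page 1 is served without a page parameter
        query = urlencode(params)
        return f"{base_url}?{query}" if query else base_url

    tags = [f'<link rel="canonical" href="{escape(page_url(page))}" />']
    if page > 1:
        tags.append(f'<link rel="prev" href="{escape(page_url(page - 1))}" />')
    if page < total_pages:
        tags.append(f'<link rel="next" href="{escape(page_url(page + 1))}" />')
    return tags


if __name__ == "__main__":
    for tag in pagination_links(
        "http://example.com/shirts", {"color": "red"}, page=2, total_pages=3
    ):
        print(tag)
```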

 

Does Google take into account which type of ssl cert is used?

John Mueller. Q: SSL: Does Google take into account which type of ssl cert is used? For example self signed, domain or organisation validation. What about free ssl certificates? Are they any good? Is there any weight given according to the new soft ranking signal?

John Mueller. A: We don’t take into account what type of SSL certificate it is, but it has to be a valid certificate that the browsers accept. If this is something that you self-sign, then that’s probably not that useful because browsers will flag that as something problematic. You can use a free certificate if you want; there are some providers of free certificates, for example for non-profits or open source software. One thing to watch out for with free certificates is that sometimes providers give the certificates for free and then charge for updates. If you need something that is fairly low cost, maybe you have to watch out for the recurring cost as well.

From our point of view it’s important that the certificate has 2048-bit or larger keys. But that’s not something that we are currently taking into account for the ranking signal. So for us at the moment, if it’s indexed as HTTPS and it has a valid certificate, then that’s enough for us to trigger this slight ranking signal that we have.

Any trick to make Analytics filters for Europe?

John Mueller. Q: Any trick to make Analytics filters for Europe? I’d like to make a property for european sales team and filter for the whole continent. But property filter neither has continent nor enough letters to add all the european countries.

John Mueller. A: That’s something you’d probably want to talk with the Analytics team about. One aspect that I could recommend is making sure that your site is split up into those sections that you want to track, to make it a little bit easier. So if you want to track this in Webmaster Tools for example, then one idea might be to use subdirectories or subdomains and to verify those separately in Webmaster Tools so that you’d have that information separately within Webmaster Tools as well. I imagine for Analytics you could do something similar. But maybe there are other tricks that you can use in the Analytics setup, so I’d definitely check with the Analytics team on that.

Are Blogspot sites safer to promote for bloggers?

John Mueller. Q: After all the changes that have been going on these last couple of years are most blog type websites better off using blogger? Do the same guidelines that apply to sites also apply to blogspot sites?

John Mueller. A: Yes. We treat blogspot sites the same as any other websites. It’s essentially a website. You don’t have to have a traditional blog where you always publish content, like daily snippets or whatever. You can use it for any traditional website. I’ve seen it used for restaurant websites where they published their menus. I’ve seen it used for all kinds of other websites as well. Using Blogger is fine, using another platform is also fine. It’s not that Blogger has any inherent advantage on our side when it comes to search. It’s just a platform that works as well as many others.

Gifs over Png’s or Jpg’s?

John Mueller. Q: Are GIF pictures good for website and SEO instead of PNG or JPG?

John Mueller. A: You can use any of these. If it’s a supported image format, we can pick it up for image search and use it there, so that’s essentially up to you.

When can we expect the next Panda and Penguin updates?

John Mueller: Q: When can we expect the next Panda and Penguin updates?

John Mueller: A: At the moment we don’t have anything to announce. I believe Panda is a lot more regular now.  So that’s probably happening fairly regularly. Penguin is one I know that the engineers are working on.  It’s been quite a while now so I imagine it’s not going to be that far away, but also it’s not happening this morning. Since we don’t pre-announce these kind of things it’s not something I can give you a date of.

John Mueller: Q: I have some keywords ranking in 1st position according to Webmaster Tools. Some keywords are in 3rd, 5th, 9th positions, but the problem is clickthrough is too low for them. How can I increase clickthrough rate?

John Mueller: A: That’s always something that I guess everyone wants to know. How can I get people to click on my site in the search results? There’s no real magical answer. There’s no technical solution for that essentially. I imagine there are two aspects there. On the one hand the pages that are ranking there should be relevant to the user. It should be something that the user finds matches their query, their intent, why they’re searching. So that’s something you can double check on your side: take those keywords that these pages are ranking for and think about whether or not this is relevant for the user. If they are not relevant for the user, then maybe the clickthrough rate isn’t something that you should be looking at there. If they are relevant for the user, then maybe the user is confused with regards to what kind of content is on these pages. So one thing you could think about there is whether or not you might want to try a different title. Maybe test different titles for these types of pages. Think about the meta description you have on these pages: whether it makes it really clear to the user in the snippet what this page is about, and whether it encourages users to click through if it is something that they care about. So those are the kinds of things you can look at there. There’s no technical solution to this question. This is essentially a matter of your site showing up in the search results and the user recognizing that this is actually what they were looking for.

Getting approved for Adsense

John Mueller: Q: Do you think there’s a way for Google adsense publishers to get approved quickly?

John Mueller: A: I don’t know what the process is on the Ads side of things. I’m sure that there are publishers who want to get approved quickly. But I’m also sure that our advertisers want to make sure that these publishers are vetted accordingly and the right ones are approved. I don’t really know how the Adsense side handles these kinds of things. So if you think that there are problems there, I’d check in the Adsense help forum and give them the information they need to look at your site and double check. Maybe there are other things that appear when they look at your site and they say “Oh, this is something that should be fixed before you apply for Adsense” or “this is something that makes your site look really bad”, and those are sometimes important things to hear even if you don’t want to hear them the first time. That’s the kind of feedback I’d try to get there.

When will I see results from my Disavow File?

John Mueller: Q: I’ve uploaded multiple disavow files but have seen no changes in the rankings. When will they kick in?

John Mueller: A: These disavow files, if they’re technically correct, essentially get processed automatically, and the next time we crawl the URLs that you have mentioned in the disavow file we will drop the links from those pages to your site. So that’s essentially a technical element that happens ongoing and automatically, and it doesn’t necessarily mean that you would see immediate changes in rankings even when they do kick in. Sometimes it can even be that a site was partially supported by problematic links, and if you disavow them, if you remove those links, then that support is also missing and maybe the site could even go down a little bit while things stabilize. So these disavow files are processed automatically, they’re processed ongoing and taken into account as we recrawl things, so you would generally see that take effect there, but it’s not something where you’ll see an immediate effect on your rankings.

Recovering from a Spam Penalty

John Mueller: Q: What needs to be done if my site is taken down for spam for no reason and if there was a real reason, how can I find out what to do?

John Mueller: A: I imagine this is in the Manual Actions section in Webmaster Tools, so that’s usually where you’d see this information. If it’s taken down for example for pure spam reasons, usually it’s pretty clear what was happening there. A lot of times we’ll see sites that just aggregate content, rewrite or spin content, and these are things that we tend to take down.

If there’s no value in this site being shown in the search results, and sometimes even in crawling or indexing this content, then that’s something we might take down. Those are the kinds of things I’d watch out for. If you really don’t see what you might have done that is so problematic, I’d definitely take it to the help forum and get some hard feedback from your peers to see what they find there. Maybe there is something that is really problematic that you weren’t thinking about when you created that website.

Do you recover gradually or immediately from a Panda 4 Penalty?

John Mueller: Q: Should I expect the rise in traffic for a site that made a partial recovery after the Panda 4 release to be one-off or is it normal to see the visitor count rise gradually/continuously ever since the Panda 4 release even though the site did not change?

John Mueller: A: Usually if it’s a good website you would see continuous changes like that. If you made significant changes and an algorithm update has happened in the meantime, then you will see on the one hand a step when the update is happening, so the site changing in rankings, changing in visibility in the search results when the update happens. But it can be normal that you also see a continuous or gradual rise over time as well. And that isn’t necessarily tied to the algorithm update; it might just be that things are working out for the site and users like the site better and things are just generally trending up.

Speaker 4: Right. In reality that site didn’t change. It had already been suffering from Panda for like two years and we were just getting into another round of investigations to see what we could improve on that site to see if we could escape from the Panda situation. Then the Panda 4 update came and, besides the big step up, we suddenly saw the site also start to grow by, let’s say, 10% per week. But we still hadn’t changed anything at that moment. So that’s where the question came from, because a step up I can understand, but a gradual incline, where did that come from? After being frozen, flatlined, for two years.

John Mueller: That’s weird. I’m not aware of any of our algorithms being such that they gradually ramp up like that. So I wouldn’t necessarily say that that’s something from Panda or something specific like that. If you want I can take a quick look at the site afterwards. But I think that doesn’t sound like anything specific to one of our algorithms. It’s not that they say “no, we will try 10% this week and 10% more next week and next week we will try another 10%.” That seems more like a natural progression in search. Maybe it’s also something that takes into account what else is happening for those search results. Maybe the other sites in the search results are doing worse than they were before. It’s really hard to tell. But I’m happy to take a quick look.

What’s more important – mobile or HTTPS?

John Mueller: Q: MOBILE or HTTPS, what should I work on first?

John Mueller: A: I think you’ll probably see the bigger impact if you make things mobile friendly first. That’s something that users at the moment notice very well, and there are lots of users that use mobile devices as their primary internet device. So if you have a website that doesn’t work well on mobile, you’d definitely see a fairly noticeable change in user activity at least if you do make it mobile friendly.

At the moment, if we notice that there are real problems with the website on mobile we will take that into account. In the future it might be that we also take into account how mobile friendly websites are in general. But at the moment there is an extreme change in how people are accessing the web and it’s really going towards mobile. There’s a really strong push from users who are trying to do as much as possible on their mobile phones, and if your site currently doesn’t work at all on mobile, then you will almost certainly see a visible change in how users are interacting with the site and how people are converting in the way that you want them to convert.

HTTPS is something that I imagine in the long run will become more and more important, but it’s not something where if you change to HTTPS users will automatically notice and say “Oh this is fantastic, this is just what I’ve been waiting for.” It’s a little bit different on the mobile side because that’s something where users are actively trying to reach these websites already.

So from my point of view if you have to make a decision between these two, I’d definitely work on mobile first, if you have a chance to revamp your website to work on mobile maybe you can at the same time include HTTPS  as a forward thinking idea in the sense that you’d take care of it now instead of having to take care of it later.  But I’d definitely focus on mobile first.

Improving Adsense Revenue

John Mueller: Q: What are your suggestions for optimizing site revenue from Adsense?

John Mueller: A: I don’t really have any suggestions there.  I can’t really speak for the Adsense side of things.  We keep things completely separate.  So I don’t have anything I can add there.

Are Disavowed URL’s Taken into Consideration when you Submit a Reconsideration Request?

John Mueller: Q:  In a recent reconsideration request for a manual action for links, the sample URL’s in my denial letter were URL’s I had listed in my disavow file. Any idea why this may have happened?

John Mueller: A: I probably would need the URLs to see what specifically happened there. There are two things that could have happened here. On the one hand maybe the disavow file was formatted in a way that we can’t really read completely, so maybe there’s something in the URLs that you have listed there that doesn’t quite 100% match the URLs that were included in the denial letter for the reconsideration request. That could be for example if you list individual URLs in your disavow file and there are other URLs on this website that almost look the same but are slightly different URLs. So in cases like that I’d just make sure that you are using the domain directive in the disavow file to make sure you are covering everything from the site.

Another thing that  could reasonably have happened is that maybe there is a timing issue there and when you submitted your disavow file someone was processing your reconsideration request already, theoretically that’s thinkable.

Finally it’s also possible that some mistake happened on our side that someone processed the reconsideration request and for whatever reason accidentally included the disavow links as well in there and didn’t realize that you had already submitted those.

So if you’re sure that technically your file is correct, if it was submitted with the domain directives to make sure that you’re catching all of the URLs that are there, and you’re sure that there couldn’t have been any kind of timing issue, then I’d definitely submit another reconsideration request following up on yours and say “hey these links were already disavowed, are you sure that this is still the problem?”
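The advice about using the domain directive can be sketched in a few lines of Python: instead of listing individual URLs, which can miss near-identical variants, each linking host is collapsed into one `domain:` line. The `to_domain_directives` helper and the sample URLs are hypothetical. The sketch keeps each hostname exactly as it appears in the link, which is a simplifying assumption; whether to list a subdomain or its parent domain is a decision for whoever maintains the file.

```python
from urllib.parse import urlparse


def to_domain_directives(urls):
    """Collapse a list of linking URLs into sorted 'domain:' lines."""
    domains = set()
    for url in urls:
        host = urlparse(url).hostname  # lowercased by urlparse, None if absent
        if host:
            domains.add(host)
    return [f"domain:{d}" for d in sorted(domains)]


if __name__ == "__main__":
    bad_links = [
        "http://spam.example/widgets?id=1",
        "http://spam.example/widgets?id=2",
        "http://other.example/links.html",
    ]
    # Lines starting with '#' are comments in a disavow file.
    lines = ["# Domains disavowed after link cleanup"]
    lines += to_domain_directives(bad_links)
    print("\n".join(lines))
```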

The other thing worth mentioning in here is usually the reconsideration request team, when they process these files, they don’t take into account just those three or five URLs, whatever is listed as a sample.  They really look at the overall picture for your site and if the overall picture for your site is still bad then it’s doesn’t necessarily mean that you need to focus on taking those individuals URLs out, it’s really a sign that you need to work on the overall picture first and clean all that data, not just this individual two or five URLs.  So that’s generally what I would recommend there.

Speaker 2: I thought before you’d said in response to a similar question that the sample URLs that you might receive back are exactly that: samples of the type of URLs that might be or could have caused the problem. They’re not actual URLs that you need to go away and fix. You probably should, but you should use those as a guide to “these are the type of sites or URLs we don’t like, go and fix all of them like that, not just these three, and that will stop the blockage.”

John Mueller: Exactly. Yeah. But at least the sample URLs that we specified should be relevant samples; they shouldn’t be things that you have taken care of already. From my point of view, if we send back URLs that we can see you’ve taken care of already, even if they’re representative of the type of problem, then that’s not really helpful. So we essentially try to avoid that situation if we can, but as we said, it is really more of a sign that there is a general problem that’s still out there, and the person who was processing the files maybe made bad choices for the sample URLs.

Images and Sitemap Pages in the Google Index

John Mueller: Q: Is Google planning to index lazy loaded images using data-url / data-src? That’s not working right now.

A: At the moment we don’t support that for image search. I know that’s something the team has been looking at, but I don’t know what the timeline is on that. So I don’t really have anything specific that I can add there.

Q: Why are only a small number of my submitted Sitemap Pages Indexed, yet most of my webpages show in the Google Index?

A: This is a fairly common question; it’s something that we see from time to time. What’s worth keeping in mind here is that for the sitemap index count, we take into account the exact URLs that you have specified in your sitemap file. So, for example, if you submit your sitemap file with the non-www version of your site and your site is generally indexed with the www version, then even though those URLs lead to the same content, we won’t count them as being indexed. So when you look at the sitemap index count, maybe you’ll see a really low number there, just because it doesn’t match one-to-one exactly what we have indexed for the website.

What I’d recommend doing there is making sure that you’re as consistent as possible within your website: that you have a clear preferred domain setting (www or non-www), that you use the exact same URL structure within your website so that when we crawl your website we find the exact URLs you have specified in your sitemap files, and that your sitemap files don’t include URLs that we don’t index like that.

Sometimes, for example, we’ll see a website linking internally with rewritten URLs while the sitemap file has parameterized URLs that are actually rewritten on the server when we recrawl them. So those are the kinds of things where, if the URL doesn’t match exactly what we have indexed for your website, we won’t count it as being indexed for that sitemap file. You will still see the actual index count in the Index Status feature in Webmaster Tools. It’s just within the Sitemaps feature that you won’t see this content as indexed. There we focus just on the URLs, not on the general content.
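The exact-match point above can be checked before submitting a sitemap. The following Python sketch is only an illustration under stated assumptions: the function name `mismatched_sitemap_urls` is hypothetical, the preferred scheme and host are supplied by the site owner, and treating any query string as a "parameterized" URL is a heuristic chosen for this example.

```python
from urllib.parse import urlsplit


def mismatched_sitemap_urls(sitemap_urls, preferred_host, preferred_scheme="http"):
    """Return sitemap URLs that differ from the preferred scheme/host
    or that carry a query string (hypothetical consistency check)."""
    mismatches = []
    for url in sitemap_urls:
        parts = urlsplit(url)
        wrong_scheme = parts.scheme != preferred_scheme
        wrong_host = parts.hostname != preferred_host
        parameterized = bool(parts.query)  # heuristic: the site links to rewritten URLs
        if wrong_scheme or wrong_host or parameterized:
            mismatches.append(url)
    return mismatches


urls = [
    "http://www.example.com/a",
    "http://example.com/b",            # non-www, site prefers www
    "http://www.example.com/c?id=3",   # parameterized form
]
for url in mismatched_sitemap_urls(urls, "www.example.com"):
    print("check:", url)
```

Any URL this flags is one that may lead to the same content but would not be counted as indexed for that sitemap file, per the answer above.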

Speaker 1: John, can I ask you another indexing question? We have a sitemap that has HTML pages which are kind of like Wikipedia image pages, so the URLs for those pages end in .jpg or .png, but these are actually HTML pages, and what I found is that Google was not able to index these pages. Is it because you expect a .jpg URL to be an image? Even though it was empty?

John Mueller: Yeah. So, to some extent, we can recognize URLs that end with .jpg, for example, but are actually HTML. But what probably happens is that we primarily crawl them with our image search crawler, and it will tell us “Hey, this isn’t an image, I can’t put it in image search,” and we won’t even really try it with our normal Googlebot crawler. So if you can avoid the misleading file type there, that probably makes it a lot easier for us to actually crawl those pages.

Speaker 1: It’s a MediaWiki setting, so Wikipedia has the same issue, but of course you have an exception for them.

John Mueller: I don’t think we have much of an exception for them, but maybe we’ve just learned it better for their website already because we have more information about how that site is crawled. If you can avoid doing that, I think that would make it a lot easier for us to actually crawl those pages.

Speaker: Would a trailing slash help, so the URL ends with .jpg and a trailing slash?

John Mueller: I’m pretty sure that would be fine.
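To audit a site for the misleading file type issue discussed above, one could compare each URL’s extension with the content type the server reports. This is a minimal sketch, assuming you already have (URL, Content-Type) pairs from your own crawl; the function name and the extension list are hypothetical choices for the example.

```python
import posixpath
from urllib.parse import urlsplit

# Assumption: a small set of extensions that usually indicate image files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def has_misleading_extension(url, content_type):
    """True when the URL path ends in an image extension but is served as HTML."""
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()
    media_type = content_type.split(";")[0].strip().lower()
    return extension in IMAGE_EXTENSIONS and media_type == "text/html"


print(has_misleading_extension("http://example.com/wiki/File:Photo.jpg", "text/html; charset=UTF-8"))
print(has_misleading_extension("http://example.com/wiki/File:Photo.jpg/", "text/html"))
```

With a trailing slash the path no longer ends in an image extension, so the second call is not flagged, which lines up with the trailing slash suggestion above.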

Duplicate Data in Webmaster Tools

John Mueller: Q: In Webmaster Tools there are pages on my site with duplicate meta descriptions. These are pages within the same category. For example: “mountain p=1” and “mountain p=2”. How do I overcome this problem? Also, hreflang link tags are on my bilingual sites. Why the duplicate title tags then?

John Mueller: A: Essentially we give this information in Webmaster Tools as a guide to potential issues on the website. We are not saying that this is causing any problems in crawling, indexing, or ranking, but during our normal crawling we notice that these pages have the same description or the same title.

So that’s why we bubble that up in Webmaster Tools. We don’t make any kind of judgment call on that, and we don’t take into account other aspects of those pages, so maybe they even have a rel=canonical or, like you mentioned, maybe they have an hreflang link tag to say that these are essentially different variations of the same content. We don’t take that into account for the Webmaster Tools HTML suggestions there. So, that’s on a fairly low level in Webmaster Tools. We bring that up as a suggestion there, but it’s not a sign that this is causing any problems.

 

Inconsistencies with Adwords and Webmaster Tools

John Mueller: Q: The Adwords team tells me I violated Webmaster Guidelines and asks me to file a reconsideration request. However Webmaster Tools says “No manual actions” and won’t let me? What can I do?

John Mueller: A: I passed this on to the team to take a look at, and I think that they resolved it already. But, in general, what might have happened in a case like this is that something really old got stuck somewhere and just needed someone to take a quick look at it. So, in a case like this, escalating back to us to take a quick look is always a possibility.

 

Eliminating Links as a Ranking Factor

John Mueller: Q: Is there a model of Google being put together that eliminates the use of links as a ranking factor? If so, do you have a projected date for that release?

John Mueller: A: We tend not to pre-announce these kinds of things. I wouldn’t have any date for that, and wouldn’t be able to really say that we are doing this or not. But I believe the ranking teams do take into account this kind of issue and think about what they can do to move away from links and move to the next bigger, more important ranking factors, and as we find those kinds of ranking factors and as we can double check to make sure that they actually work really well I’m sure the team will be looking into taking that step.  It’s not the case that we’re holding onto links arbitrarily.  It’s just that from our point of view they still make quite a lot of sense to use for some amount of ranking.

 

Google Ranking for Price Comparison Sites

John Mueller: Q: Are price comparison sites considered low quality sites by Google, and what’s your recommendation on improving keyword rankings for price comparison sites?

John Mueller: A: So, from our point of view there are definitely some variations of these sites that would be considered low quality, that would be considered aggregated content from various other sites that have no unique value or no unique information or content of their own.  So there is definitely an aspect there where if you’re just aggregating content from other sites and showing it on your pages that doesn’t necessarily make your site something useful and compelling on its own.

So, we really recommend, as with any other type of affiliate site, that you have something really unique and compelling of your own on your website, so that we can say, “If someone is looking for a specific product or a specific type of service, then this page has something unique that nobody else has.” If all you’re doing is aggregating feeds from other providers and showing them next to each other, then maybe that’s not really as compelling as it might look.

So, that’s something I’d try to take into account, be it a price comparison site, an affiliate-based site, or any other type of site: you really need to have something high quality of your own on your website that gives us reasons to send visitors to your site and not to any of the other sites that also process these feeds.

 

How Will PR Newswire’s Panda Hit Affect Sites that Feed from them?

John Mueller: Q: We are a new site and we get some of our content through syndication to Press Release Newswire’s site. PR Newswire seems to have been hit by Panda.  How will this affect sites that are displaying their news? We have an RSS feed from them.

John Mueller: A: Similar to the previous question. If you’re just aggregating content from feeds and not providing any value of your own, then that’s not really so compelling for our users. It’s definitely not so compelling for our algorithms. It’s not something where we say “there is something really unique that we’d like to show up here,” and when I talk to sites in the forum and bring those issues up with the engineers, our engineers generally come back and say “well, if they’re doing the same as all of these other sites, just aggregating the feeds and not providing any additional value, why should we show them at all in the search results? We already have enough other sites that are doing exactly the same, why should we even include them in the search results?”

So, that’s something where you’d want to take a step back and look at your website and think about what you can do that is significantly better than everything else out there.  And that’s the kind of content where, if we take that to the engineers and say “Hey look at this site.  They’re aggregating content from feeds but they’re doing this fantastic thing here on the side, that’s like nothing else we have in the search results, that provides lots of unique value to our users.  We should be doing better and show them in the search results.”  And generally our engineers take that feedback very seriously and think about what they can do for the long run for sites like that.

But if your primary content is just aggregated from somewhere else and there are lots of sites that are doing exactly the same thing, our engineers are going to say “well, we already have this content in our search results. We don’t need to add it again.”

 

What to do if Your Website is Filtered by Safe Search

John Mueller: Q: Hi John, My website is being filtered on Google Safe Search. The website is now clean for any Adult material. I think it was the adverts. How long does it take to be re-included?

John Mueller: A: We have a form in our Help Center for requesting a review of SafeSearch filtering, so that is where I’d submit it. If this is something that we pick up algorithmically, then generally what you need to do is let us recrawl and reprocess the pages, and depending on the type of site that can take anywhere from a few weeks to a few months or even longer, so it can take a bit of time. So I’d definitely make sure that you have everything covered there. If you’re saying that some of your advertisements were adult content, then I’d double-check their content as well to make sure that those advertisements are not targeting something specific on your site. So really make sure that your website as a whole isn’t something that might be considered adult in any way.

Domain Authority

John Mueller: Q: Hi John, do you believe in Domain Authority?

John Mueller: A: Hard to say. I mean, I don’t really know what specifically you’re looking at there, but we do have some algorithms that look at websites on a domain or site level and try to understand in general how good or how bad this site is as a whole. And, in a way, you can see that as domain authority. For example, our high quality sites algorithm looks at the website overall and tries to make a judgment call on how high or low quality your content is there. That helps us a lot when we see new pages from this website, because we can categorize them and say “hey, overall there was really great content on the site, so new content that we don’t really know so much about is probably going to be good as well.” So in that regard, that’s something that could be seen on a domain or site level.

Problems Matching Sites with  IP Locations

John Mueller: Ah, let’s see, lots of questions there. Let me try to take something more general. Q: I am showing deals on the basis of IP address, so they are dynamic, and when I use Fetch as Google it doesn’t show them because of different locations?

John Mueller: A: Essentially this is kind of like cloaking, in the sense that Googlebot sees something different than your users will see, so that’s something to watch out for. The other thing to watch out for is that Googlebot generally crawls from the US, and if you show US users specific content, then that’s what we’re going to index. So if you have dynamic content like that, you just have to take into account that we’ll be indexing one version of that content and not all the other ones. If you have something specific for individual locations, I’d use something like hreflang to let us know about that so we can crawl and index these variations separately.
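For the hreflang suggestion above, each location variant gets its own URL and each page lists all variants, itself included. The sketch below generates the `<link rel="alternate" hreflang="...">` elements in Python; the function name `hreflang_links`, the language-region codes, and the example URLs are assumptions for illustration only.

```python
from html import escape


def hreflang_links(variants):
    """Build <link rel="alternate" hreflang=...> tags from a dict mapping
    hreflang code -> absolute URL (hypothetical helper)."""
    tags = []
    for code, url in sorted(variants.items()):
        tags.append(
            '<link rel="alternate" hreflang="{}" href="{}" />'.format(
                escape(code, quote=True), escape(url, quote=True)
            )
        )
    return tags


variants = {
    "en-us": "http://example.com/us/deals",
    "en-gb": "http://example.com/uk/deals",
}
# The same full set of tags would go in the <head> of every variant page.
print("\n".join(hreflang_links(variants)))
```

Note that hreflang distinguishes languages and countries, so it fits the per-country case in this answer rather than regions within one country.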

Speaker 2: John. I have a question in regards to that.

John Mueller: Ok.

Speaker 2: Our site is a US site, but it also has activities from all over the US, so balloon rides in California and wine tasting in New York. And we used to deliver content based on IP, because if you land on a New York gifts page, you want to see activities in New York, and the same for California.

We found that we suffered from the same problem: the New York page was being spidered, but you were seeing California content for it. So if you’ve got different countries, I understand that being spidered from the US is a big problem for delivering IP-based content. What if you’ve got a US site and you are delivering IP-based content? Shouldn’t you guys be able to pick up that not everyone lives or is based in California?

John Mueller: It’s tricky because for the most part I believe our IP addresses are just based in California.

Speaker 2: Right.  That’s what we saw, because each page might have specific stuff but then it would have recommended or closest for you, but you would see everyone was close to California.  Then it would look like we are delivering the same results to everyone and having duplicate content issues.

John Mueller: Yeah. So then, if someone goes to your New York site, they’d see New York content regardless of where they’re located?

Speaker 2: For some of it, yeah. There would be other related products among the results that you look at. Let’s say you’re looking at a balloon ride: it would say “here’s other stuff for you” and it would show the closest stuff. Because of the way our site is broken down, they wouldn’t necessarily land on a New York gifts page, but they might land on a hot air ballooning page.

So, we show all the hot air balloons we’ve got across the country, 50 of them, but we’d deliver them via IP, so everyone would see something different, apart from Google, which would see all California. So our whole site looked like it was based in California, basically, and New York took the big hit.

John Mueller: Yeah. I don’t know what the best solution there is.  This is something where our IP addresses are primarily based in California and I think that’s regardless of which data center we crawl from.

And the other part is that we generally have one copy of the content for the URL in our index, so even if we saw the New York content for this specific URL, we wouldn’t necessarily be able to differentiate that if it’s the exact same URL. So that’s kind of a tricky thing there. What I’d watch out for is to serve general content on these pages as much as possible, and personalize on top of that. Personalizing is a great thing to do, but if the personalized content is not the primary piece of content on these pages, that makes it a lot easier, because then we can focus on that primary content and say “Ok, this is a general page about balloon rides. There’s a lot of balloon information here. There are various events on this page, but it’s not only focused on one location.” But it’s something-

Speaker 2: Because we were worried about cloaking. I don’t want to cloak, because I don’t want to get banned, but it’s surely better for my users to show a balloon ride that isn’t 3,000 miles away. So I want to show them that.

John Mueller: Yeah.

Speaker 2: So does it become a business decision rather than an algorithmic or content decision, indexing decision?

John Mueller: I see it more as a business decision because, like I said, from our point of view we take one URL and we assume that the copy of the content that we got through crawling is representative of this URL. We don’t assume that if we crawled from different locations we’d maybe see different content there. So especially within the same country, we wouldn’t even know that there might be New York-based content on this general page if we never see that when we recrawl from California.

Speaker 2: Right. Would it maybe be better to treat it in a similar way to a lot of tablet and mobile sites? I know they deliver the same page, but then it has a “show me your location” or “share your location with us” prompt, which would then deliver the content. You know, if you browse on a mobile or a tablet now, it will go “Will you share your location?”

John Mueller: Yeah. That might be a possibility, or might also make sense to split these up to pages per region, for example.

Speaker 2: AHref State. AHref Metro… introduce those.

John Mueller: Well, I mean, it depends a lot on your website and what kind of content you want to show there.

Speaker 2: I’d assume Yelp, and those sort of people have similar issues

John Mueller: Yeah. I believe Yelp and Craigslist regularly try different things, so this is something where I think there’s no one solution that fits everyone, so I’d see how they’re handling this. You can usually check by just looking at the cached page to see which version ends up being indexed from their site.

From our point of view, it’s not something where we’d say this is a web spam issue that you’re cloaking to us or being a spammer.  It’s essentially more of the user side issue, where if you’re looking for balloon rides in New York and all you find is balloon rides in California, because that’s what we indexed, then that can be confusing to the user.  That might not be optimal from our point of view or from your point of view, but that’s something that you can generally control by giving us separate URLs to crawl and index.

So, to some extent, there is the effect of splitting it up into separate URLs  or finding ways that you can generalize this content so that the general page makes sense for all users.

Speaker 2: Ok.

 

Closing Questions

John Mueller: Ok. I think we’re kind of out of time, so I’ll just open it up to you guys. I can take one or two more questions, then we should be all good.

Speaker 6: I have a question John.

John Mueller: Ok.

Speaker 6: Why is it not possible to move a subdirectory in Webmaster Tools, only www?

John Mueller: You can do that with a 301 redirect. But, yeah. So we use this feature primarily to recognize significant site moves, so if you’re moving from one domain to another and we need to kind of forward everything to that domain. And that wouldn’t work so well from a technical point of view on our side for subdirectories.

But the problem I have with that is, of course, that this is an internal decision on our side on how we handle this information, and that shouldn’t be something that the webmaster has to worry about, how Google internally coordinates their data.

So, I imagine at some point we will be able to improve that a bit so that either you have a way of more generally giving us information about site moves, or we are just able to focus on the signals that you give us through redirects or a rel=canonical and just say “OK, we can trust you on this, we can take your word and just process that directory.”

Speaker 6: How necessary is it to use the..?

John Mueller: It’s not necessary. It gives us an additional signal. In general, for site moves, if we start seeing a lot of 301 redirects, we’ll crawl all of it a little bit faster to just double-check that it’s really the whole site that’s moving over. And then we’ll process that. So we’ll generally pick that up automatically as well. We’ll pick it up maybe a little bit faster if you give us that information in Webmaster Tools as well.
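Since a subdirectory move relies on 301 redirects rather than the site move feature, the mapping from old paths to new paths has to be consistent for every URL under the directory. A minimal sketch of that mapping in Python follows; the function name `redirect_target` and the `/old/` and `/new/` prefixes are hypothetical, and how the 301 itself is issued depends on your server setup.

```python
def redirect_target(path, old_prefix, new_prefix):
    """Return the new path a request should be 301-redirected to,
    or None if the path is outside the moved directory (hypothetical helper)."""
    if path == old_prefix.rstrip("/") or path.startswith(old_prefix):
        # Keep everything after the old prefix so each page maps one-to-one.
        return new_prefix + path[len(old_prefix):]
    return None


for path in ["/old/page.html", "/old", "/other/x"]:
    print(path, "->", redirect_target(path, "/old/", "/new/"))
```

Mapping each old URL to its direct equivalent, instead of sending everything to one page, keeps the move readable as a whole-directory move.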

Speaker 6: Alright. Thank you.

John Mueller: Sure. Alright, one last question. Who wants to grab it?

Speaker 1: I have a website that’s outdated, and I changed the entire layout later and put some useful content on it instead. The site is 5 years old. Would that affect the site’s current rankings in Google?

John Mueller: Yes, it could. Any time you make significant changes on your site, with the layout, with the way that the pages are interlinked, that’s something that our algorithms have to learn first. So maybe you’ll see some fluctuations briefly, maybe you would even see some changes in the long run.

So, taking an extreme example: say you have a website that’s completely based on Flash. It’s one Flash file and it’s been like this for years, and Google indexes it more or less. If you change it into a nice, clean HTML-formatted site, then we will probably be able to pick that up a lot easier, crawl and index it more easily, and probably rank it better in the search results as well.

So even if it’s an old site if you do a revamp of the design or you do a revamp of the structure of the website, the way the pages are linked to each other, then that’s something that can, and generally does, have an effect on the ranking.

Speaker 1: Ok. Thank you very much.

Speaker 6: Would that mean changing the CSS files around?

John Mueller: If it just changes the CSS file, then that’s probably not something that we’ll recognize that quickly. There’s one exception there, in the sense that if you use the CSS file to make it more web-friendly, then of course that’s something we could take into account. But if you’re just tweaking things and changing the font colors, changing the font sizes, then those are the kind of things where we’d probably say “well, these things happen all the time, we don’t necessarily need to take that into account for rankings.” Alright.

Speaker 6: On Mobile. I have one more…

John Mueller: Alright. Go ahead.

Speaker 6: If you have, for example, a separate mobile subdomain alongside your normal site, and the normal site is not mobile-friendly because you have both sites, will you still be punished for not having a really mobile-compatible desktop version?

John Mueller: So what will generally happen there is, in the best case, we’ll recognize that these sites are related. For example, if you have the rel alternate link between those pages so that we recognize that a mobile site belongs to the desktop site, we’ll be able to focus on the mobile site for the mobile search results. If we can’t tell that they are related, we’ll treat them as separate sites. So what could theoretically happen there, in the worst case, is that the desktop site just ranks a little bit lower and the mobile site ranks a little bit higher in the mobile search results, and you see changes like that. But it’s not-

Speaker 6: Does the desktop site show lower in a desktop search then…

John Mueller: No, no.

Speaker 6: It doesn’t affect desktop search, only mobile search?

John Mueller: It’s only for mobile. Yes. And at the moment we only take action on issues that are serious for mobile users. So, if your desktop site is all Flash and we recognize it’s not working for mobile, then that’s something we either flag in the search results or demote in the search results for smartphone users, not for desktop users. It’s just for smartphone users.
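The rel alternate link mentioned a few turns earlier is what lets a separate mobile URL be recognized as belonging to its desktop page. The sketch below shows, in Python, the pair of annotations commonly used for separate mobile URLs: a `rel="alternate"` tag on the desktop page and a `rel="canonical"` tag on the mobile page pointing back. The canonical back-reference, the 640px media query, the function name, and the URLs are assumptions added for this example and were not spelled out in the hangout.

```python
def mobile_annotations(desktop_url, mobile_url, max_width=640):
    """Return (tag for the desktop page, tag for the mobile page).
    Hypothetical helper; max_width is an assumed breakpoint."""
    desktop_tag = (
        '<link rel="alternate" media="only screen and (max-width: {}px)" '
        'href="{}" />'.format(max_width, mobile_url)
    )
    mobile_tag = '<link rel="canonical" href="{}" />'.format(desktop_url)
    return desktop_tag, mobile_tag


on_desktop, on_mobile = mobile_annotations(
    "http://www.example.com/page-1", "http://m.example.com/page-1"
)
print(on_desktop)
print(on_mobile)
```

Each desktop page should point to its own mobile equivalent, not to the mobile homepage, so that the two are paired page by page.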

Speaker 6: And where do smartphones stop and tablets start?

John Mueller: Yeah. That’s always tricky, right. From our point of view, we tend to treat tablets sometimes as desktops, sometimes as mobile phones. I believe in search we will treat them more like smartphones, because the capabilities are more like smartphones there.

So, for instance, Flash is something that’s rarely available on smartphones or tablets. There is also a type of faulty redirect that we find websites use for smartphones and tablets, where you click on a desktop URL and, instead of taking you to that desktop page or the equivalent mobile page, it redirects you to the homepage of the mobile website, which is really frustrating.

Those are the kinds of things that tend to be similar for smartphones and tablets, so that’s why we treat them together. But it’s tricky because some tablets have a higher screen resolution than my laptop, and it’s not always exactly clear which version you should be showing to which users. So I imagine that in the future there’ll be a little more shuffling around and more refining of which elements go where.
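The faulty redirect described above, where a deep desktop URL sends smartphone users to the mobile homepage, can be spotted by comparing the requested URL with the URL the visitor finally lands on. This Python sketch assumes you have already recorded both URLs (for example from your own crawl with a smartphone user agent); the function name and the rule it applies are a simple heuristic for illustration, not Google’s detection logic.

```python
from urllib.parse import urlsplit


def is_faulty_redirect(requested_url, final_url):
    """Heuristic: a deep page on one host that ends up on another host's homepage."""
    requested = urlsplit(requested_url)
    final = urlsplit(final_url)
    deep_request = requested.path not in ("", "/")
    landed_on_homepage = final.path in ("", "/") and not final.query
    changed_host = requested.hostname != final.hostname
    return deep_request and landed_on_homepage and changed_host


print(is_faulty_redirect("http://www.example.com/products/42", "http://m.example.com/"))
print(is_faulty_redirect("http://www.example.com/products/42", "http://m.example.com/products/42"))
```

A redirect to the equivalent mobile page passes this check, while a redirect to the mobile homepage is flagged.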

Speaker 6: Ok.

John Mueller: Alright. Thank you all for all of your questions and all of the feedback. I’ll take a look at those URLs that you guys posted in the chat and see what we can do there, if there is something the team needs to work on, and I hope to see you guys in more of the future Hangouts.
