Monday, February 22, 2010

#Mysteryfollower - part 1: version beta 1

My script that checks for returning followers on Twitter is almost ready. Instead of getting an email from Twitter every time I get a new follower, I have scheduled a single email once a day with statistics on my followers. It will list the followers in three groups:
  • New
  • Left me
  • Returned

I’ll also see if I can include the key stats about each user: follower/following ratio, last message posted (and when), and the photo. What other information is relevant when you consider following back somebody who follows you?
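
As a small illustration of the follower/following ratio mentioned above, here is a Python sketch (not the ASP used for the script itself). The argument names mirror the `followers_count` and `friends_count` fields that I assume are present in each user entry; the handling of accounts that follow nobody is my own choice.

```python
def follow_ratio(followers_count, friends_count):
    """Return followers divided by following.

    Returns None when the account follows nobody, because the ratio
    is undefined in that case (assumed convention, not from the post).
    """
    if friends_count == 0:
        return None
    return followers_count / friends_count
```

An account with many followers that follows almost nobody gets a high ratio, while an account that follows thousands and has few followers gets a ratio close to zero, which is often a hint of a spammer.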

I see my follower number going up and down over time. Sometimes Twitter removes spammers and bots, which makes follower counts drop, but it will be interesting to get some more statistics. After having the script running for 8 hours (once every 60 minutes), I see that I need to fine-tune the SQL statements against the database, as the log is filling up with new entries. If you want the geeky details, here is the setup:

I have decided to use ASP 3.0 (what I prefer to program in) and a simple Access database. I could have chosen a more robust database, but as long as I am the only user, the database will only get connections from my script, and the number of entries will be as many as the number of followers. If the follower number goes into the thousands, I will probably have more problems with the Twitter API than with the database anyway.

I made a pre-beta version of the script some time ago, and when I picked this project up again, Twitter had changed their API. The biggest change is that they have included the version in the URL, but they have also changed how follower information is pulled: instead of requesting numbered pages, you now request with something they call a cursor. The cursor is a database reference, so you get the next 100 from a specific point in the database, and not the 100 entries after your last 100 entries. This is a smart change, but I had to change the script a bit.

The Twitter API returns about 100 followers per page, so I request the following URL, starting with the cursor set to -1:
api.twitter.com/1/statuses/followers/rygh.xml?cursor=-1
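
The paging loop around this URL can be sketched as follows, in Python rather than ASP. It is only a sketch under assumptions: `fetch_page` is a hypothetical callable that returns the XML text for a given cursor (in real use it would do the authenticated HTTP request), and I assume each response carries the next cursor in a `next_cursor` element directly under the root, with zero marking the last page.

```python
import xml.etree.ElementTree as ET


def collect_followers(fetch_page, max_pages=1000):
    """Follow the cursor chain and return every <user> element found.

    fetch_page(cursor) is a hypothetical helper returning one page of XML
    as text for the given cursor value.
    """
    users = []
    cursor = "-1"                       # -1 asks for the first page
    for _ in range(max_pages):          # guard against a cursor that never reaches zero
        root = ET.fromstring(fetch_page(cursor))
        users.extend(root.iter("user"))
        # Assumed element name; a missing value is treated as the end of the list.
        cursor = (root.findtext("next_cursor") or "0").strip()
        if cursor == "0":               # zero means there are no more followers
            break
    return users
```

Passing the fetch function in as an argument keeps the loop independent of how the request is made, so it can be tried out with canned XML before pointing it at the real API.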

I include my credentials in the request to make sure I also get users who have protected profiles. This gives me the first entries, and it also returns the cursor value for the next 100 followers. The next URL is the same, just with -1 replaced by the new value. The last page returns a cursor value of zero, and then I know there aren’t any more followers to be found. I merge the responses and make them available as one XML page. I parse the stored XML file with the following ASP code:
Sub atom(URL)
    ' Load the merged follower XML synchronously over server-side HTTP
    Set objXML = Server.CreateObject("Msxml2.DOMDocument.3.0")
    objXML.async = False
    objXML.setProperty "ServerHTTPRequest", True
    objXML.validateOnParse = True
    objXML.preserveWhiteSpace = False

    If Not objXML.Load(URL) Then
        ' Show why the load failed instead of a bare error
        Response.Write "ERROR: " & objXML.parseError.reason
    Else
        Set objNodeList = objXML.getElementsByTagName("user")
        For Each objNode In objNodeList
            ' Reset per user so values never carry over from the previous entry
            id = ""
            name = ""
            For Each objNode2 In objNode.childNodes
                Select Case objNode2.nodeName
                    Case "id"
                        id = objNode2.text
                    Case "screen_name"
                        name = objNode2.text
                End Select
            Next
            '// HERE I include the followercheck.asp file
        Next
    End If
End Sub
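
For readers who do not use ASP, the same parsing step can be sketched in Python. This is an illustrative equivalent, not the script I run: the function name and the `fields` parameter are made up for the example, and it takes the XML as a string instead of loading it from a URL.

```python
import xml.etree.ElementTree as ET


def parse_followers(xml_text, fields=("id", "screen_name")):
    """Return one dict per <user> element, keeping only the requested child tags."""
    root = ET.fromstring(xml_text)
    followers = []
    for user in root.iter("user"):
        # Start from empty values so nothing carries over from the previous user.
        row = {field: "" for field in fields}
        for child in user:
            if child.tag in row:
                row[child.tag] = (child.text or "").strip()
        followers.append(row)
    return followers
```

Extending the `fields` tuple is the Python counterpart of adding more `Case` branches to the `Select Case` block.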

The included file (followercheck.asp) is where I add new followers to my database; those listed in the database that aren’t present in the XML are marked in the database as “left me”. Whenever a new entry is added, or an existing one is updated with a different status, a log file is created for the user. This is how I will find the mystery followers who constantly follow and unfollow. The code I showed you only picks up the fields ID and NAME; from the XML I can also pull the last status message, the photo, location etc. Everything you see on the web about a user is present in that user’s entry in the XML.
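
The bookkeeping described above can be sketched in Python. This is not the content of followercheck.asp; it is a minimal in-memory model of the same idea, where a plain dict stands in for the Access table and the status names "following" and "left" are assumptions made for the example. An event is only produced when a status actually changes, which is what keeps the log from growing on every run.

```python
def classify(stored, current_ids):
    """Compare the stored statuses with the ids from the latest fetch.

    stored maps follower id -> "following" or "left" (assumed status names).
    Returns the updated status map and a list of (id, event) pairs,
    one pair per change and none for unchanged followers.
    """
    updated = dict(stored)
    events = []
    for fid in sorted(current_ids):
        previous = stored.get(fid)
        if previous is None:
            updated[fid] = "following"
            events.append((fid, "new"))
        elif previous == "left":
            updated[fid] = "following"
            events.append((fid, "returned"))
    for fid, status in stored.items():
        # Known followers missing from the latest fetch have left.
        if status == "following" and fid not in current_ids:
            updated[fid] = "left"
            events.append((fid, "left me"))
    return updated, events
```

A mystery follower then shows up as an id that keeps alternating between "left me" and "returned" events over several runs.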

The question is how frequently this script needs to be executed. I am currently fine-tuning, so it runs both on demand and every hour. But does it really need to be executed more than once a day? If somebody drops me only to follow me again instantly, I will probably be without that follower for just seconds, and running the script frequently enough to catch that is too time-consuming. I actually see some performance issues with my current setup, so I will tune some more before I post the updated followercheck.asp script.
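
To get a feel for the cost of running more often, here is a rough Python estimate of the number of API requests per day. It assumes exactly 100 followers per page (the API returns about 100) and at least one request per run; the follower counts you feed it are your own numbers.

```python
import math


def requests_per_day(follower_count, runs_per_day, page_size=100):
    """Estimate API requests per day for a given follower count and schedule."""
    # Even an empty follower list costs one request to find that out.
    pages_per_run = max(1, math.ceil(follower_count / page_size))
    return pages_per_run * runs_per_day
```

With a hypothetical 850 followers, an hourly schedule needs 24 times as many requests as a daily one, and every request also triggers database work for each follower on that page.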
